Telemetry
Introduction
TODO: metrics
crate, metric types.
All metrics are provided with the actor_group
and, optionally, actor_key
labels. The last one is added for actor groups with enabled system.telemetry.per_actor_key
option.
Read more information about metric types.
TODO: tips, prefer increment_gauge!
over gauge!
Configuration
Telemetry can be configured separately for each actor group. Possible options and their default values:
[some_group]
system.telemetry.per_actor_group = true
system.telemetry.per_actor_key = false
Note that using per_actor_key
can highly increase a number of metrics. Use it only for low cardinality groups.
TODO: elfo-telemeter
config.
Built-in metrics
elfo
is shipped with a lot of metrics. All of them start with the elfo_
prefix to avoid collisions with user defined metrics.
Statuses
-
Gauge
elfo_active_actors{status}
The number of active actors in the specified status.
-
Gauge
elfo_restarting_actors
The number of actors that will be restarted after some time.
-
Counter
elfo_actor_status_changes_total{status}
The number of transitions into the specified status.
Messages
-
Counter
elfo_sent_messages_total{message, protocol}
The number of sent messages.
-
Summary
elfo_message_handling_time_seconds{message, protocol}
Spent time on handling the message, measured between
(try_)recv()
calls. Used to detect slow handlers. -
Summary
elfo_message_waiting_time_seconds
Elapsed time between
send()
and correspondingrecv()
calls. Usually it represents a time that a message spends in a mailbox. Used to detect places that should be sharded to reduce a total latency. -
Summary
elfo_busy_time_seconds
Spent time on polling a task with an actor. More precisely, the time for which the task executor is blocked. Equals to CPU time if blocking IO isn't used.
Log events
-
Counter
elfo_emitted_events_total{level}
The number of emitted events per level (
Error
,Warn
,Info
,Debug
,Trace
). -
Counter
elfo_limited_events_total{level}
The number of events that haven't been emitted because the limit was reached.
-
Counter
elfo_lost_events_total{level}
The number of events that hasn't been emitted because the event storage is full.
Dump events
-
Counter
elfo_emitted_dumps_total
The number of emitted dumps.
-
Counter
elfo_limited_dumps_total
The number of dumps that haven't been emitted because the limit was reached.
-
Counter
elfo_lost_dumps_total
The number of dumps that hasn't been emitted because the dump storage is full.
Other metrics
TODO: specific to elfo-logger, elfo-dumper, elfo_telemeter
Derived metrics
Statuses
TODO
Incoming/outgoing rate
TODO
rate(elfo_message_handling_time_seconds_count{actor_group="${actor_group:raw}",actor_key=""}[$__rate_interval])
Waiting time
TODO
rate(elfo_message_waiting_time_seconds{actor_group="${actor_group:raw}",actor_key="",quantile=~"0.75|0.9|0.95"}[$__rate_interval])
Utilization
TODO
rate(elfo_message_handling_time_seconds_sum{actor_group="${actor_group:raw}",actor_key=""}[$__rate_interval])
Executor utilization (≈ CPU usage)
TODO
The time for which the task executor is blocked. Equals to CPU time if blocking IO isn't used.
rate(elfo_busy_time_seconds_sum[$__rate_interval])
Dashboards
TODO
Implementation details
TODO