Supervision
Startup
Once the system is started, the system.init
actor starts all actors marked as entrypoint()
in the topology. Usually, it's only the system.configurers
group. Entry points must have empty configs.
Then, system.configurers
loads the config file and sends UpdateConfig
to all actor groups. This message is usually used to start any remaining actors in the system. Thus, you need to define routing for UpdateConfig
with either Outcome::Unicast
or Outcome::Multicast
for actor keys you want to start on startup.
Note that the actor system terminates immediately if the config file is invalid at startup (elfo::start()
returns an error).
Reconfiguration
Reconfiguration can be caused in several cases:
- The configurer receives
ReloadConfigs
orTryRealodConfigs
. - The process receives
SIGHUP
, it's an equivalent to receivingReloadConfigs::default()
. - The process receives
SIGUSR2
, it's an equivalent to receivingReloadConfigs::default().with_force(true)
.
What's the difference between default behavior and with_force(true)
one? By default, group's up-to-date configs aren't sent across the system. In the force mode, all configs are updated, even unchanged ones.
The reconfiguration process consists of several stages:
- Config validation: the configurer sends the
ValidateConfig
request to all groups and waits for responses. If all groups respondOk(_)
or discard the request (which is the default behaviour), the config is considered valid and the configurer proceeds to the next step. - Config update: the configurer sends
UpdateConfig
to all groups. Note that it is sent as a regular message, rather than request, making config update asynchronous. System does not wait for all configs to apply before finishing this stage and it is possible for an actor to fail while applying the new config. Despite actor failures, any further actors spawned (or restarted) will have the new config available for them.
More information about configs and the reconfiguration process is available on the corresponding page.
Restart
Restart policies
Three restart policies are implemented now:
RestartPolicy::never
(by default) — actors are never restarted. However, they can be started again on incoming messages.RestartPolicy::on_failure
— actors are restarted only after failures (theexec
function returnedErr
or panicked).RestartPolicy::always
— actors are restarted after termination (theexec
function returnedOk
or()
) and failures.
If the actor is scheduled to be restarted, incoming messages cannot spawn another actor for his key.
The restart policy can be chosen while creating an ActorGroup
. This is the default restart policy for the group, which will be used if not overridden.
use elfo::{RestartPolicy, RestartParams, ActorGroup};
ActorGroup::new().restart_policy(RestartPolicy::always(RestartParams::new(...)))
ActorGroup::new().restart_policy(RestartPolicy::on_failure(RestartParams::new(...)))
ActorGroup::new().restart_policy(RestartPolicy::never())
The group restart policy can be overridden separately for each actor group in the configuration:
[group_a.system.restart_policy]
when = "Never" # Override group policy on `config_update` with `RestartPolicy::never()`.
[group_b.system.restart_policy]
# Override the group policy on `config_update` with `RestartPolicy::always(...)`.
when = "Always" # or "OnFailure".
# Configuration details for the RestartParams of the backoff strategy in the restart policy.
# The parameters are described further in the chapter.
min_backoff = "5s" # Required.
max_backoff = "30s" # Required.
factor = "2.0" # Optional.
max_retries = "20" # Optional.
auto_reset = "5s" # Optional.
[group_c]
# If the restart policy is not specified, the restart policy from the blueprint will be used.
Additionally, each actor can further override its restart policy through the actor's context:
// Override the group policy with the actor's policy, which is applicable only for this lifecycle.
ctx.set_restart_policy(RestartPolicy::...);
// Restore the group policy from the configuration (if specified), or use the default group policy.
ctx.set_restart_policy(None);
Repetitive restarts
Repetitive restarts are limited by an exponential backoff strategy. The delay for each subsequent retry is calculated as follows:
delay = min(min_backoff * pow(factor, power), max_backoff)
power = power + 1
So for example, when factor
is set to 2
, min_backoff
is set to 4
, and max_backoff
is set to 36
, and power
is set to 0
,
this will result in the following sequence of delays: [4, 8, 16, 32, 36]
.
The backoff strategy is configured by passing a RestartParams
to one of the restart policies, either RestartPolicy::on_failure
or RestartPolicy::always
, upon creation.
The required parameters for RestartParams
are as follows:
min_backoff
- Minimal restart time limit.max_backoff
- Maximum restart time limit.
The RestartParams
can be further configured with the following optional parameters:
factor
- The value to multiply the current delay with for each retry attempt. Default value is2.0
.max_retries
- The limit on retry attempts, after which the actor stops attempts to restart. The default value isNonZeroU64::MAX
, which effectively means unlimited retries.auto_reset
- The duration of an actor's lifecycle sufficient to deem the actor healthy. The default value ismin_backoff
. After this duration elapses, the backoff strategy automatically resets. The following restart will occur without delays and will be considered the first retry. Subsequent restarts will have delays calculated using the formula above, with thepower
starting from 0.
Termination
System termination is started by the system.init
actor in several cases:
- Received
SIGTERM
orSIGINT
signals, Unix only. - Received
CTRL_C_EVENT
orCTRL_BREAK_EVENT
events, Windows only. - Too high memory usage, Unix only. Now it's 90% (ratio of RSS to total memory) and not configurable (elfo#60).
The process consists of several stages:
system.init
sends to all user-defined groups "polite"Terminate::default()
message.- Groups' supervisors stop spawning new actors.
Terminate
is routed as a usual message withOutcome::Broadcast
by default.- For
TerminationPolicy::closing
(by default): the mailbox is closed instantly,ctx.recv()
returnsNone
after already stored messages. - For
TerminationPolicy::manually
: the mailbox isn't closed,ctx.recv()
returnsTerminate
in order to handle in on your own. Usectx.close()
to close the mailbox.
- If some groups haven't terminated after 30s,
system.init
sendsTerminate::closing()
message.Terminate
is routed as a usual message withOutcome::Broadcast
by default.- For any policy, the mailbox is closed instantly,
ctx.recv()
returnsNone
after already stored messages.
- If some groups haven't terminated after another 15s,
system.init
stops waiting for that groups. - Repeat the above stages for system groups.
Timeouts above aren't configurable for now (elfo#61).
The termination policy can be chosen while creating ActorGroup
:
use elfo::group::{TerminationPolicy, ActorGroup};
ActorGroup::new().termination_policy(TerminationPolicy::closing())
ActorGroup::new().termination_policy(TerminationPolicy::manually())