3 releases: 0.1.2 (new), 0.1.1, and 0.1.0, all published Jan 20, 2025
69KB, 1K SLoC
ractor-supervisor
A pure-Rust supervisor built atop the ractor framework. It provides OTP-style supervision trees so you can define how child actors should be restarted under different failure conditions, along with meltdown logic to prevent runaway restart loops.
Overview
ractor-supervisor is inspired by the way Erlang/Elixir OTP handles actor supervision. It helps you:
- Define how failing children are restarted by choosing a supervision strategy.
- Configure meltdown thresholds so that if too many restarts occur in a short period, the supervisor itself shuts down abnormally.
- Optionally add child-level backoff or meltdown reset intervals for even more granular control.
Installation
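Add the following to your Cargo.toml:
[dependencies]
ractor-supervisor = "0.1"
ractor = "0.14"
# Also needed by the usage example (tokio runtime, futures-util for .boxed()):
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"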
To get started, you should already be familiar with ractor. This crate builds on top of ractor's actor model.
SupervisorOptions
These options control the supervisor-wide meltdown logic and overall restart behavior:
- strategy: Defines which children get restarted when any one child fails:
  - OneForOne: Only the failing child is restarted.
  - OneForAll: If any child fails, all children are stopped and restarted.
  - RestForOne: The failing child and all subsequent children (in definition order) are stopped and restarted.
- max_restarts + max_seconds: The meltdown window. If max_restarts is exceeded within max_seconds, the supervisor triggers a meltdown and stops abnormally.
- restart_counter_reset_after: If the supervisor sees no restarts for this many seconds, it resets its meltdown counter back to zero. This prevents old failures from accumulating indefinitely.
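To make the meltdown window concrete, here is a small std-only model of how max_restarts, max_seconds, and restart_counter_reset_after could interact. This is a sketch under stated assumptions, not the crate's implementation:

- MeltdownTracker and record_restart are hypothetical names.
- Time is modeled as whole seconds.
- The meltdown check is assumed to count restarts from the last max_seconds.
- The quiet-period reset is assumed to clear the whole restart log.

```rust
/// Hypothetical, simplified model of supervisor-level meltdown tracking.
/// Times are whole seconds (u64) so the behavior is easy to follow and test.
struct MeltdownTracker {
    max_restarts: usize,      // restarts tolerated inside the window
    max_seconds: u64,         // length of the meltdown window
    reset_after: Option<u64>, // quiet period that clears the restart log
    log: Vec<u64>,            // timestamps of recorded restarts
}

impl MeltdownTracker {
    fn new(max_restarts: usize, max_seconds: u64, reset_after: Option<u64>) -> Self {
        Self { max_restarts, max_seconds, reset_after, log: Vec::new() }
    }

    /// Records a restart at time `now` (seconds); returns true on meltdown.
    fn record_restart(&mut self, now: u64) -> bool {
        // If there has been no restart for `reset_after` seconds, forget old failures.
        let last = self.log.last().copied();
        if let (Some(reset), Some(last)) = (self.reset_after, last) {
            if now.saturating_sub(last) >= reset {
                self.log.clear();
            }
        }
        self.log.push(now);
        // Count only the restarts that happened within the last `max_seconds`.
        let recent = self
            .log
            .iter()
            .filter(|&&t| now.saturating_sub(t) < self.max_seconds)
            .count();
        recent > self.max_restarts
    }
}

fn main() {
    // Same thresholds as the usage example: 5 restarts per 10 s, reset after 30 s of quiet.
    let mut tracker = MeltdownTracker::new(5, 10, Some(30));
    for t in 0..5 {
        println!("restart at t={}s -> meltdown: {}", t, tracker.record_restart(t));
    }
    println!("restart at t=5s -> meltdown: {}", tracker.record_restart(5));
}
```

With these settings the first five restarts are tolerated, and the sixth (at t=5s) reports a meltdown. Without a reset_after value, the log in this model would keep growing forever. That is the kind of accumulation the reset option is described as preventing.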
ChildSpec
These specs define how each child actor is spawned and restarted. You provide:
- id: A unique identifier for the child (used in logs, meltdown tracking, etc.).
- restart: One of Permanent (always restart), Transient (restart only if the child fails abnormally), or Temporary (never restart).
- spawn_fn: A user-provided function that spawns (and links) the child actor; it typically calls Actor::spawn_linked.
- backoff_fn (optional): A function returning an optional extra Duration delay before restarting this child (e.g., exponential backoff). Returning None restarts it immediately.
- restart_counter_reset_after (optional): If the child remains alive for that many seconds, its own restart count is reset the next time it fails.
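The restart policies and the child-level counter reset can be sketched as two pure functions. Everything here is a std-only illustration, not the crate's API:

- The Restart enum only mirrors the variants listed above.
- should_restart and next_restart_count are hypothetical helpers.
- Treating the first failure after a reset as count 1 is an assumption, not documented behavior.

```rust
/// Mirrors the three restart policies listed above (hypothetical stand-in type).
#[derive(Clone, Copy, Debug)]
enum Restart {
    Permanent, // always restart
    Transient, // restart only after an abnormal exit
    Temporary, // never restart
}

/// Decides whether a child that just exited should be restarted.
fn should_restart(policy: Restart, abnormal_exit: bool) -> bool {
    match policy {
        Restart::Permanent => true,
        Restart::Transient => abnormal_exit,
        Restart::Temporary => false,
    }
}

/// Hypothetical child-level counter update: if the child stayed up for at least
/// `reset_after` seconds before failing, its count starts over at 1.
fn next_restart_count(prev_count: usize, uptime_secs: u64, reset_after: Option<u64>) -> usize {
    match reset_after {
        Some(limit) if uptime_secs >= limit => 1,
        _ => prev_count + 1,
    }
}

fn main() {
    for policy in [Restart::Permanent, Restart::Transient, Restart::Temporary] {
        println!(
            "{:?}: normal exit -> {}, abnormal exit -> {}",
            policy,
            should_restart(policy, false),
            should_restart(policy, true)
        );
    }
    // With restart_counter_reset_after = Some(60), as in the usage example:
    println!("count after a crash at 90s uptime: {}", next_restart_count(3, 90, Some(60)));
    println!("count after a crash at 10s uptime: {}", next_restart_count(3, 10, Some(60)));
}
```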
Multi-Level Supervision Trees
Supervisors can manage other supervisors as children, forming a hierarchical tree structure. This way, different subsystems can each have their own meltdown thresholds and strategies. A meltdown in one subtree stops that sub-supervisor abnormally, and its parent handles this like any other child failure. The entire application goes down only if failures escalate to the top-level supervisor and trigger its meltdown.
For example, you might have:
- Root Supervisor (OneForOne)
  - Sub-supervisor A (OneForAll)
    - Child actor #1
    - Child actor #2
  - Sub-supervisor B (RestForOne)
    - Child actor #3
    - Child actor #4
With nested supervision, you can isolate failures and keep the rest of your system running.
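The strategies decide which siblings are restarted when one child fails. Applied to the tree above, a std-only sketch looks like the block below. The Strategy enum mirrors the listed variants; children_to_restart and the child names are hypothetical.

```rust
/// Mirrors the three strategies listed under SupervisorOptions (hypothetical stand-in type).
#[derive(Clone, Copy, Debug)]
enum Strategy {
    OneForOne,
    OneForAll,
    RestForOne,
}

/// Returns the children (in definition order) that are stopped and restarted
/// when the child at index `failed` fails. Panics if `failed` is out of bounds.
fn children_to_restart<'a>(strategy: Strategy, children: &[&'a str], failed: usize) -> Vec<&'a str> {
    match strategy {
        Strategy::OneForOne => vec![children[failed]],
        Strategy::OneForAll => children.to_vec(),
        Strategy::RestForOne => children[failed..].to_vec(),
    }
}

fn main() {
    let root = ["sub_a", "sub_b"]; // Root Supervisor (OneForOne)
    let group_a = ["child_1", "child_2"]; // Sub-supervisor A (OneForAll)
    let group_b = ["child_3", "child_4"]; // Sub-supervisor B (RestForOne)
    println!("root, sub_a fails: {:?}", children_to_restart(Strategy::OneForOne, &root, 0));
    println!("A, child_1 fails: {:?}", children_to_restart(Strategy::OneForAll, &group_a, 0));
    println!("B, child_3 fails: {:?}", children_to_restart(Strategy::RestForOne, &group_b, 0));
    println!("B, child_4 fails: {:?}", children_to_restart(Strategy::RestForOne, &group_b, 1));
}
```

In this model, a failure of child_3 under Sub-supervisor B (RestForOne) restarts both child_3 and child_4. A failure of child_4 restarts only child_4.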
Usage
Below is a full code snippet showing how to configure and spawn the supervisor. The child actor (MyWorker) is deliberately minimal; substitute your own. Notice how we pass in a custom spawn_my_worker function, define meltdown thresholds, and pick a specific restart strategy.
use ractor::Actor;
use ractor_supervisor::*; // Supervisor, ChildSpec, SupervisorOptions and the other types used below
use std::{sync::Arc, time::Duration};
use tokio::time::Instant;
use futures_util::FutureExt; // provides `.boxed()` for the spawn future

// A minimal child actor that simply does some work in `handle`.
struct MyWorker;

#[ractor::async_trait]
impl Actor for MyWorker {
    type Msg = ();
    type State = ();
    type Arguments = ();

    // Called before the actor fully starts. We can set up the actor's internal state here.
    async fn pre_start(
        &self,
        _myself: ractor::ActorRef<Self::Msg>,
        _args: Self::Arguments,
    ) -> Result<Self::State, ractor::ActorProcessingErr> {
        Ok(())
    }

    // The main message handler. This is where you implement your actor's behavior.
    async fn handle(
        &self,
        _myself: ractor::ActorRef<Self::Msg>,
        _msg: Self::Msg,
        _state: &mut Self::State,
    ) -> Result<(), ractor::ActorProcessingErr> {
        // do some work...
        Ok(())
    }
}

// A function to spawn the child actor. This will be used in ChildSpec::spawn_fn.
async fn spawn_my_worker(
    supervisor_cell: ractor::ActorCell,
    child_id: String,
) -> Result<ractor::ActorCell, ractor::SpawnErr> {
    // We name the child actor after its ChildSpec id (naming is optional).
    let (child_ref, _join) = MyWorker::spawn_linked(
        Some(child_id),  // actor name
        MyWorker,        // actor instance
        (),              // arguments
        supervisor_cell, // link to the supervisor
    )
    .await?;
    Ok(child_ref.get_cell())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A child-level backoff function implementing exponential backoff.
    // Return None to restart immediately, or Some(delay) to make the supervisor wait.
    let my_backoff: ChildBackoffFn = Arc::new(
        |_child_id: &str, restart_count: usize, _last_fail: Instant, _child_reset_after: Option<u64>| {
            if restart_count <= 1 {
                // Counts 0 and 1: restart immediately.
                None
            } else {
                // From count 2 on, wait 2^restart_count seconds (doubling each time).
                // The exponent is capped at 6 (64s) so the shift cannot overflow.
                Some(Duration::from_secs(1u64 << restart_count.min(6)))
            }
        },
    );

    // This specification describes exactly how to manage our single child actor.
    let child_spec = ChildSpec {
        id: "myworker".into(),       // Unique identifier for meltdown logs and debugging.
        restart: Restart::Transient, // Only restart if the child fails abnormally.
        spawn_fn: Box::new(|cell, id| spawn_my_worker(cell, id).boxed()),
        backoff_fn: Some(my_backoff), // Apply our custom exponential backoff on restarts.
        // If the child stays up for 60s, its individual restart counter resets the next time it fails.
        restart_counter_reset_after: Some(60),
    };

    // Supervisor-level meltdown configuration: more than 5 restarts within 10s triggers a meltdown.
    // After 30s with no restarts, the meltdown log is reset.
    let options = SupervisorOptions {
        strategy: Strategy::OneForOne,         // If one child fails, only that child is restarted.
        max_restarts: 5,                       // Permit up to 5 restarts in the meltdown window.
        max_seconds: 10,                       // The meltdown window (in seconds).
        restart_counter_reset_after: Some(30), // If no failures for 30s, the meltdown log is cleared.
    };

    // Group all child specs and meltdown options together.
    let args = SupervisorArguments {
        child_specs: vec![child_spec], // We only have one child in this example.
        options,
    };

    // Spawn the supervisor with our arguments.
    let (sup_ref, sup_handle) = Actor::spawn(
        None,       // no name for the supervisor
        Supervisor, // the Supervisor actor
        args,
    )
    .await?;

    // A real application would keep the supervisor running; here we stop it right away
    // and wait for it to exit.
    sup_ref.kill();
    let _ = sup_handle.await;
    Ok(())
}
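To see the delays the my_backoff closure produces, the rule can be pulled out into a plain function and evaluated for a few counts. This is a std-only sketch with two assumptions. First, backoff_delay is a hypothetical helper. Second, it caps the exponent at 6 (a 64s maximum delay) so the left shift cannot overflow for large counts.

```rust
use std::time::Duration;

/// Hypothetical helper mirroring the backoff rule: no delay for counts 0 and 1,
/// then 2^restart_count seconds, with the exponent capped at 6.
fn backoff_delay(restart_count: usize) -> Option<Duration> {
    if restart_count <= 1 {
        None // restart immediately
    } else {
        // Capping the shift amount keeps `1u64 << n` from overflowing.
        Some(Duration::from_secs(1u64 << restart_count.min(6)))
    }
}

fn main() {
    for count in 0..=7 {
        println!("restart_count = {}: {:?}", count, backoff_delay(count));
    }
}
```

Counts 0 and 1 yield None, so the child restarts immediately. Counts 2 through 5 yield 4s, 8s, 16s, and 32s, and any count from 6 up stays at 64s.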
License
This project is licensed under MIT. It is heavily inspired by Elixir/Erlang OTP patterns, but implemented in pure Rust for the ractor framework.
Dependencies: ~5–12MB, ~128K SLoC