Releases (4, breaking):

Version | Date
---|---
0.4.0 | Aug 28, 2024
0.3.0 | Apr 27, 2024
0.2.2 | Apr 27, 2024
0.1.0 | Apr 5, 2024
Used in fluxion
About
Slacktor is an extremely performant actor library. It supports `no_std`, has only one dependency, and is extremely simple and portable. In optimal conditions, it handles around 700 million messages/second sent to a single actor, as seen in the `simple` example, which has no other dependencies (it uses the `rand` crate only to keep the compiler from optimizing the code away). Because Slacktor does not handle synchronization, it is possible to achieve roughly 4.5 billion messages/second with Rayon on an i9-13900H laptop CPU, as seen in the `parallel` example.
Slacktor's overhead is so low that simply changing the `u64`s in `parallel` to `u32`s in `parallel_u32` raises throughput to roughly 9 billion messages/second, and narrowing them to `u8` in `parallel_u8` raises it to 21 billion messages/second. The `no_slacktor` example, the equivalent of `parallel_u8` without Slacktor, reaches 22 billion messages/second. By reducing allocations, using an iterator method like `sum` instead of `collect`, I have measured up to 80 billion messages/second.
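A rough, std-only illustration of the `sum` versus `collect` point (no Slacktor, no Rayon). `handle` is a hypothetical stand-in for an actor's handler, and all function names here are illustrative. Collecting keeps every reply in a buffer, while summing folds each reply into one accumulator. For scale, collecting 1 billion `u64` replies needs an 8 GB output buffer versus 1 GB for `u8`; less memory traffic is one plausible (unverified) contributor to the speedups from narrower types.

```rust
// Hypothetical stand-in for an actor's handler: XOR the payload with a key.
// (The real examples call `a.send(TestMessage(i))` instead.)
fn handle(key: u8, payload: u8) -> u8 {
    payload ^ key
}

/// Collect every reply: one buffer holding all `n` replies.
fn replies_collected(key: u8, n: u32) -> Vec<u8> {
    (0..n).map(|i| handle(key, i as u8)).collect()
}

/// Fold the replies into one accumulator instead; no buffer is allocated.
/// The total is a `u64` so that summing up to `u32::MAX` `u8` replies cannot overflow.
fn replies_summed(key: u8, n: u32) -> u64 {
    (0..n).map(|i| u64::from(handle(key, i as u8))).sum()
}

/// Bytes needed to hold `n` collected replies of type `T`.
fn output_bytes<T>(n: u64) -> u64 {
    n * std::mem::size_of::<T>() as u64
}

fn main() {
    let (key, n) = (0b1010_1010, 1_000);
    let via_vec: u64 = replies_collected(key, n).iter().map(|&r| u64::from(r)).sum();
    assert_eq!(via_vec, replies_summed(key, n));
    println!("u64 buffer for 1e9 replies: {} bytes", output_bytes::<u64>(1_000_000_000));
    println!("u8 buffer for 1e9 replies:  {} bytes", output_bytes::<u8>(1_000_000_000));
}
```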
Limitations and Disclaimers
Slacktor actors do not have what other actor frameworks would call "contexts": a window to the outside world that lets them interact with existing actors. It is up to the user to provide this, whether as an `Arc` wrapping a `RwLock`/`Mutex` around the Slacktor instance, or through message passing. Slacktor is focused on providing a simple and performant core for actor-based systems, with minimal dependencies.
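A minimal, std-only sketch of the `Arc` + `RwLock` option. It assumes a hypothetical `System` type in place of the Slacktor instance; its fields and methods are illustrative, not Slacktor's API. The `Forwarder` actor has no built-in context; instead it holds a shared handle to the system and takes a read lock only while it forwards a call.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for a Slacktor instance: a slab of `Counter` actors
// addressed by index. Only the ownership pattern is the point here.
struct System {
    actors: Vec<Counter>,
}

struct Counter {
    base: u64,
}

impl Counter {
    fn handle(&self, n: u64) -> u64 {
        self.base + n
    }
}

// An actor with no built-in "context" can still reach other actors
// when it is given a shared, lockable handle to the system.
struct Forwarder {
    system: Arc<RwLock<System>>,
    target: usize,
}

impl Forwarder {
    /// Forward `n` to the target actor; `None` if the target doesn't exist
    /// or the lock is poisoned.
    fn handle(&self, n: u64) -> Option<u64> {
        // The read guard lives only until this function returns.
        let system = self.system.read().ok()?;
        system.actors.get(self.target).map(|a| a.handle(n))
    }
}

fn main() {
    let system = Arc::new(RwLock::new(System {
        actors: vec![Counter { base: 100 }],
    }));
    let to_first = Forwarder { system: Arc::clone(&system), target: 0 };
    let to_second = Forwarder { system: Arc::clone(&system), target: 1 };
    println!("{:?}", to_first.handle(5)); // Some(105)
    println!("{:?}", to_second.handle(5)); // None: no second actor yet

    // Spawning an actor needs the write lock.
    system.write().unwrap().actors.push(Counter { base: 7 });
    println!("{:?}", to_second.handle(5)); // Some(12)
}
```

A `Mutex` works the same way; a `RwLock` lets many actors forward concurrently, since they only need read access.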
Slacktor doesn't actually use message passing. Instead, it emulates message passing on top of raw function calls. This allows for extremely high performance, maintains the advantages of other actor frameworks, and provides users with complete control over their project structure. You can use Slacktor as much or as little as you want.
How is Slacktor So Fast?
Slacktor doesn't try to handle any synchronization, concurrency, or message passing. Instead, Slacktor provides a simple abstraction over a slab of actors. Message passing is then emulated by calling the message handler as soon as `send` is called. This allows the compiler to optimize essentially all of Slacktor away, down to a few function calls, while still maintaining the abstraction of message passing.
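To make the "slab plus direct call" idea concrete, here is a hypothetical, std-only miniature. The trait names mirror those in the benchmark examples, but `MiniSlab`, the free `send` function, and every signature here are assumptions for illustration, not Slacktor's source (in Slacktor, `get` returns a reference you call `.send` on). Actors are stored type-erased in a `Vec`, an actor's id is its index, and "sending" simply calls the handler. In this sketch each `get` costs a bounds check and a type-id comparison, which hints at why hoisting the lookup out of a hot loop helps; Slacktor's actual lookup may differ.

```rust
use std::any::Any;

// Hypothetical traits named after the ones in the benchmark examples.
// This illustrates the idea; it is not Slacktor's actual source.
trait Message {
    type Result;
}

trait Actor: Any {}

trait Handler<M: Message>: Actor {
    fn handle_message(&self, m: M) -> M::Result;
}

/// A slab of type-erased actors; an actor's id is its index in `slots`.
#[derive(Default)]
struct MiniSlab {
    slots: Vec<Option<Box<dyn Any>>>,
}

impl MiniSlab {
    fn spawn<A: Actor>(&mut self, actor: A) -> usize {
        let boxed: Box<dyn Any> = Box::new(actor);
        self.slots.push(Some(boxed));
        self.slots.len() - 1
    }

    /// Index into the slab, then downcast to the concrete actor type.
    fn get<A: Actor>(&self, id: usize) -> Option<&A> {
        self.slots.get(id)?.as_deref()?.downcast_ref::<A>()
    }

    /// Free the slot; later lookups of `id` return `None`.
    fn kill(&mut self, id: usize) {
        if let Some(slot) = self.slots.get_mut(id) {
            *slot = None;
        }
    }
}

/// "Sending" calls the handler immediately: no queue, no thread hop, so the
/// optimizer can inline it like any other function call.
fn send<A: Handler<M>, M: Message>(actor: &A, m: M) -> M::Result {
    actor.handle_message(m)
}

struct TestMessage(u64);

impl Message for TestMessage {
    type Result = u64;
}

struct TestActor(u64);

impl Actor for TestActor {}

impl Handler<TestMessage> for TestActor {
    fn handle_message(&self, m: TestMessage) -> u64 {
        m.0 ^ self.0
    }
}

fn main() {
    let mut slab = MiniSlab::default();
    let id = slab.spawn(TestActor(0xFF));
    // Look the actor up once, outside the hot loop.
    let actor = slab.get::<TestActor>(id).unwrap();
    let total: u64 = (0..10).map(|i| send(actor, TestMessage(i))).sum();
    println!("{total}"); // 2505
    slab.kill(id);
    assert!(slab.get::<TestActor>(id).is_none());
}
```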
Benchmarks
On my laptop (i9-13900H, 32GB RAM), the following code outputs roughly 700,000,000 messages/second:
```rust
use std::time::Instant;
use slacktor::{
    actor::{Actor, Handler, Message},
    Slacktor,
};

struct TestMessage(pub u64);

impl Message for TestMessage {
    type Result = u64;
}

struct TestActor(pub u64);

impl Actor for TestActor {
    fn destroy(&self) {
        println!("destroying");
    }
}

impl Handler<TestMessage> for TestActor {
    fn handle_message(&self, m: TestMessage) -> u64 {
        m.0 ^ self.0
    }
}

fn main() {
    // Create a slacktor instance
    let mut system = Slacktor::new();
    // Create a new actor
    let actor_id = system.spawn(TestActor(rand::random::<u64>()));
    // Get a reference to the actor
    let a = system.get::<TestActor>(actor_id).unwrap();
    // Time 1 billion messages, appending each to a vector and doing some math to prevent the
    // code being completely optimized away.
    let num_messages = 1_000_000_000;
    let mut out = Vec::with_capacity(num_messages);
    let start = Instant::now();
    for i in 0..num_messages {
        // Send the message
        let v = a.send(TestMessage(i as u64));
        out.push(v);
    }
    let elapsed = start.elapsed();
    println!(
        "{:.2} messages/sec",
        num_messages as f64 / elapsed.as_secs_f64()
    );
    system.kill(actor_id);
}
```
Moving the `system.get` call into the loop drops it to roughly 400,000,000 messages/second:
```rust
// Same imports, TestMessage, and TestActor as the first example; only `main` changes.
fn main() {
    // Create a slacktor instance
    let mut system = Slacktor::new();
    // Create a new actor
    let actor_id = system.spawn(TestActor(rand::random::<u64>()));
    // Time 1 billion messages, appending each to a vector and doing some math to prevent the
    // code being completely optimized away.
    let num_messages = 1_000_000_000;
    let mut out = Vec::with_capacity(num_messages);
    let start = Instant::now();
    for i in 0..num_messages {
        // Retrieve the actor from the system and send a message
        let v = system.get::<TestActor>(actor_id).unwrap().send(TestMessage(i as u64));
        out.push(v);
    }
    let elapsed = start.elapsed();
    println!(
        "{:.2} messages/sec",
        num_messages as f64 / elapsed.as_secs_f64()
    );
    system.kill(actor_id);
}
```
If we stop pushing the values to the vector (and retrieve the actor reference outside of the loop), the Rust compiler is able to optimize away the loop completely, and the code finishes executing in about 100ns:
```rust
// Same imports, TestMessage, and TestActor as the first example; only `main` changes.
fn main() {
    // Create a slacktor instance
    let mut system = Slacktor::new();
    // Create a new actor
    let actor_id = system.spawn(TestActor(rand::random::<u64>()));
    // Get a reference to the actor
    let a = system.get::<TestActor>(actor_id).unwrap();
    // Time 1 billion messages. The replies are discarded, so the compiler is free to
    // optimize the whole loop away.
    let num_messages = 1_000_000_000;
    let start = Instant::now();
    for i in 0..num_messages {
        // Send the message and discard the reply
        a.send(TestMessage(i as u64));
    }
    let elapsed = start.elapsed();
    println!("{:?}", elapsed);
    system.kill(actor_id);
}
```
Retrieving the actor reference inside of the loop in this case gives us roughly 600,000,000 messages/second.
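Pushing every reply into a `Vec` is one way to keep the loop alive, but it also measures allocation and memory bandwidth. A possible alternative, not what these benchmarks use, is `std::hint::black_box` (stable since Rust 1.66), which asks the compiler to treat a value as used; the standard library documents it as a best-effort hint rather than a guarantee. The sketch uses a hypothetical `handle` function in place of `a.send(...)`, and `time_calls` is an illustrative name.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for `a.send(TestMessage(i))`: XOR the payload with a key.
fn handle(key: u64, payload: u64) -> u64 {
    payload ^ key
}

/// Time `n` handler calls without storing the replies. `black_box` is a
/// best-effort hint that each reply (and the key) is observed, which
/// discourages the optimizer from deleting or precomputing the loop.
fn time_calls(key: u64, n: u64) -> (u64, f64) {
    let start = Instant::now();
    let mut last = 0;
    for i in 0..n {
        last = black_box(handle(black_box(key), i));
    }
    (last, start.elapsed().as_secs_f64())
}

fn main() {
    let n = 100_000_000;
    let (last, secs) = time_calls(0xDEAD_BEEF, n);
    println!("last reply {last:#x}, {:.2} calls/sec", n as f64 / secs);
}
```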
The following equivalent code for the Actix framework handles roughly 400,000 messages/second, and the Rust compiler cannot optimize away the second loop. I have reduced the number of messages to 1 million, as 1 billion is too many for Actix to handle in a reasonable timeframe.
```rust
use std::time::Instant;
use actix::prelude::*;

#[derive(Message)]
#[rtype(u64)]
struct TestMessage(pub u64);

// Actor definition
struct TestActor(pub u64);

impl Actor for TestActor {
    type Context = Context<Self>;
}

// Implement `Handler` on `TestActor` for the `TestMessage` message.
impl Handler<TestMessage> for TestActor {
    type Result = u64; // <- Message response type

    fn handle(&mut self, msg: TestMessage, _ctx: &mut Context<Self>) -> Self::Result {
        msg.0 ^ self.0
    }
}

#[actix::main]
async fn main() {
    let actor = TestActor(rand::random::<u64>()).start();

    // First loop: store every reply in a vector.
    let num_messages = 1_000_000;
    let mut out = Vec::with_capacity(num_messages);
    let start = Instant::now();
    for i in 0..num_messages {
        let a = actor.send(TestMessage(i as u64)).await.unwrap();
        out.push(a);
    }
    let elapsed = start.elapsed();
    println!("{:.2} messages/sec", num_messages as f64 / elapsed.as_secs_f64());

    // Second loop: replies are discarded, but Actix won't be optimized away.
    let num_messages = 1_000_000;
    let start = Instant::now();
    for i in 0..num_messages {
        let _a = actor.send(TestMessage(i as u64)).await.unwrap();
    }
    let elapsed = start.elapsed();
    println!("{:.2} messages/sec", num_messages as f64 / elapsed.as_secs_f64());
}
```
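Part of the gap plausibly comes from the work real message passing does for every message. Below is a deliberately simplified, hypothetical model built on `std::sync::mpsc`; it is not how Actix is implemented (Actix runs actors on an async runtime), and `Request`, `spawn_mailbox_actor`, and `ask` are illustrative names. Each round trip allocates a reply channel, enqueues a request, wakes the actor's thread, and blocks for the answer, whereas the Slacktor-style call is a single function call.

```rust
use std::sync::mpsc;
use std::thread;

// A request carries its payload plus a channel for the reply.
struct Request {
    payload: u64,
    reply: mpsc::Sender<u64>,
}

/// Spawn a thread that owns the actor state (`key`) and answers requests
/// one at a time until every sender is dropped.
fn spawn_mailbox_actor(key: u64) -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        for req in rx {
            // Ignore send errors: the caller may have stopped waiting.
            let _ = req.reply.send(req.payload ^ key);
        }
    });
    tx
}

/// One round trip: create a reply channel, enqueue, then block for the answer.
fn ask(actor: &mpsc::Sender<Request>, payload: u64) -> Option<u64> {
    let (reply, answer) = mpsc::channel();
    actor.send(Request { payload, reply }).ok()?;
    answer.recv().ok()
}

fn main() {
    let actor = spawn_mailbox_actor(0xFF);
    let replies: Vec<u64> = (0..1_000).filter_map(|i| ask(&actor, i)).collect();
    println!("{} replies, first = {}", replies.len(), replies[0]);
}
```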
All of these tests were run with `cargo run --release`, Cargo 1.75.0, and rustc 1.75.0, with LTO enabled (to minimal effect).
In short, Slacktor introduces almost no overhead to the projects that use it.
Additionally, because Slacktor is entirely parallelizable, the following code using Rayon achieves roughly 4.5 billion messages per second:
```rust
// Uses the same imports, TestMessage, and TestActor as the first example, plus Rayon:
use rayon::prelude::*;

fn main() {
    // Create a slacktor instance
    let mut system = Slacktor::new();
    // Create a new actor
    let actor_id = system.spawn(TestActor(rand::random::<u64>()));
    // Get a reference to the actor
    let a = system.get::<TestActor>(actor_id).unwrap();
    // Time 1 billion messages, collecting the replies into a vector to prevent the
    // code being completely optimized away.
    let num_messages = 1_000_000_000;
    let start = Instant::now();
    let _v = (0..num_messages)
        .into_par_iter()
        .map(|i| {
            // Send the message
            a.send(TestMessage(i as u64))
        })
        .collect::<Vec<_>>();
    let elapsed = start.elapsed();
    println!(
        "{:.2} messages/sec",
        num_messages as f64 / elapsed.as_secs_f64()
    );
    system.kill(actor_id);
}
```
Retrieving the actor reference inside of the loop lowers this to roughly 3 billion messages/second. The `parallel_u8` example achieves roughly 21 billion messages per second.
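Rayon isn't essential to see why this parallelizes: the handler takes `&self` and nothing is synchronized, so a single shared reference can be called from many threads at once. The sketch below swaps Rayon's `into_par_iter` for `std::thread::scope` to stay dependency-free; `TestActor` is a simplified, hypothetical stand-in (not the Slacktor type) and `parallel_sum` is an illustrative name. Replies are combined with a wrapping sum rather than collected, in the spirit of the `sum` versus `collect` note above.

```rust
use std::thread;

// Hypothetical stand-in for the examples' `TestActor`. The handler takes `&self`
// and mutates nothing, so one shared reference can serve many threads at once.
struct TestActor(u64);

impl TestActor {
    fn handle_message(&self, payload: u64) -> u64 {
        payload ^ self.0
    }
}

/// Split `0..n` into `threads` contiguous chunks, call the handler on every value
/// in parallel, and combine the replies with a wrapping sum (no output buffer).
fn parallel_sum(actor: &TestActor, n: u64, threads: u64) -> u64 {
    let threads = threads.max(1);
    let chunk = (n + threads - 1) / threads; // ceiling division
    thread::scope(|s| {
        // Spawn every worker first, so the chunks run concurrently...
        let handles: Vec<_> = (0..threads)
            .map(|t| {
                let start = (t * chunk).min(n);
                let end = (start + chunk).min(n);
                s.spawn(move || {
                    (start..end)
                        .map(|i| actor.handle_message(i))
                        .fold(0u64, u64::wrapping_add)
                })
            })
            .collect();
        // ...then join them and combine the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(0u64, u64::wrapping_add)
    })
}

fn main() {
    let actor = TestActor(0xFF);
    let threads = thread::available_parallelism().map_or(4, |p| p.get() as u64);
    let total = parallel_sum(&actor, 10_000_000, threads);
    println!("wrapping sum of replies: {total}");
}
```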