
"Application tail latency is critical for services to meet their latency expectations. We have shown that the thread-per-core approach can reduce application tail latency of a key-value store by up to 71% compared to baseline Memcached running on commodity hardware and Linux."[^1]

[^1]: The Impact of Thread-Per-Core Architecture on Application Tail Latency


This library is mainly made for io-uring and monoio. There is no dependency on the runtime, so you should be able to use it with other runtimes, and also without io-uring.

The purpose of this library is to provide a performant way to send data between threads that follow a thread-per-core architecture. Even though the aim is performance, remember that this is core-to-core (or thread-to-thread) passing, which is inherently slow.
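The shape of the problem can be sketched with plain `std` channels. This is not sharded-thread's API (its mesh is built on `sharded_queue`, not `std::sync::mpsc`); it is just a minimal illustration of a mesh of peers where each thread can send directly to any other. The `mesh_ring` name is made up for this example:

```rust
use std::sync::mpsc;
use std::thread;

// Build a full mesh of channels between `n` peers: `txs[j]` lets any
// peer send directly to peer `j`. Here each peer sends its id to the
// next peer in a ring and returns whatever it receives.
fn mesh_ring(n: usize) -> Vec<usize> {
    let mut senders = Vec::with_capacity(n);
    let mut receivers = Vec::with_capacity(n);
    for _ in 0..n {
        let (tx, rx) = mpsc::channel::<usize>();
        senders.push(tx);
        receivers.push(rx);
    }

    let mut handles = Vec::new();
    for (i, rx) in receivers.into_iter().enumerate() {
        let txs = senders.clone();
        handles.push(thread::spawn(move || {
            // Send own id to the next peer in the ring...
            txs[(i + 1) % n].send(i).unwrap();
            // ...and receive one message from the predecessor.
            rx.recv().unwrap()
        }));
    }
    drop(senders);

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Peer i receives the id of peer (i + n - 1) % n.
    println!("{:?}", mesh_ring(4)); // [3, 0, 1, 2]
}
```

A real thread-per-core setup would additionally pin each thread to a core; the crate's point is to make the per-message cost of such a mesh as low as possible.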

Thanks to Glommio for the inspiration.


Originally, the library was made for the case where multiple threads listen on the same TcpStream and, depending on what is sent through the stream, you might want to change which thread handles it.
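That handoff works because `TcpStream` is `Send`, so ownership of an accepted connection can move between threads. Below is a hedged sketch of the pattern using `std::net` and a plain `std::sync::mpsc` channel in place of the crate's mesh; `handoff_demo` is a name invented for this example:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc;
use std::thread;

// Accept a TcpStream on one thread and hand it off to a worker thread.
// With sharded-thread, the channel would be the inter-thread mesh.
fn handoff_demo() -> String {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    let (tx, rx) = mpsc::channel::<TcpStream>();

    // Worker thread: receives ownership of the stream and handles it.
    let worker = thread::spawn(move || {
        let mut stream = rx.recv().unwrap();
        let mut buf = String::new();
        stream.read_to_string(&mut buf).unwrap();
        buf
    });

    // Client: writes one message, then closes the connection (EOF).
    let client = thread::spawn(move || {
        let mut s = TcpStream::connect(addr).unwrap();
        s.write_all(b"migrate me").unwrap();
    });

    // Acceptor: accepts the connection and forwards it to the worker.
    let (stream, _) = listener.accept().unwrap();
    tx.send(stream).unwrap();

    client.join().unwrap();
    worker.join().unwrap()
}

fn main() {
    println!("{}", handoff_demo());
}
```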

You can check some examples in the tests.


These benchmarks are only indicative: they run in GitHub Actions. You should run your own on the targeted hardware.

They show that sharded-thread, with its mesh built on sharded_queue, is about 6% faster than the same mesh built on flume.

Flume vs Sharded-thread for sharded-thread - Bencher



Licensed under either of

