#rayon #tls #threads


thread local contexts for rayon loops

4 releases

0.2.0 May 24, 2021
0.1.2 May 24, 2021
0.1.1 May 21, 2021
0.1.0 May 21, 2021

#234 in Concurrency

Used in 2 crates (via scanflow)

MIT license

154 lines


Crates.io API Docs Build and test MIT licensed

Thread local variables for Rayon thread pools

This crate provides a simple ThreadLocalCtx struct that allows to store efficient thread-local state that gets built by a lambda.

It is incredibly useful in multithreaded processing, where a context needs to be used that is expensive to clone. In the end, there will be no more clones occuring than number of threads in a rayon thread pool.


Clone only when it's necessary

This library provides an efficient way to clone values in a rayon thread pool, but usually just once per thread. It cuts down on computation time for potentially expensive cloning operations.

Additional clones can rarely occur when rayon schedules execution of another instance of the same job, recursively. But in the end, there should not be more than 2N clones, for N threads.


use rayon_tlsctx::ThreadLocalCtx;
use rayon::iter::*;

const NUM_COPIES: usize = 16;

let mut buf: Vec<u16> = (0..!0).collect();

// Create a thread local context with value 0.
let ctx = ThreadLocalCtx::new(|| {
    // Simulate expensive operation.
    // Since we are building unlocked context,
    // the sleeps will occur concurrently.

let pool = rayon::ThreadPoolBuilder::new().num_threads(64).build().unwrap();

// Run inside a custom thread pool.
pool.install(|| {
    // Sum the buffer `NUM_COPIES` times and accumulate the results
    // into the threaded pool of counts. Note that the counts may be
    // Unevenly distributed.
        .flat_map(|_| buf.par_iter())
        .for_each(|i| {
            let mut cnt = unsafe { ctx.get() };
            *cnt += *i as usize;

let buf_sum = buf.into_iter().fold(0, |acc, i| acc + i as usize);

// What matters is that the final sum matches the expected value.
assert_eq!(ctx.into_iter().sum::<usize>(), buf_sum * NUM_COPIES);


~26K SLoC