2 releases

0.1.1 Oct 28, 2022
0.1.0 Sep 7, 2022

#2570 in Rust patterns


Used in 2 crates (via fenris)

MIT/Apache

13KB
83 lines

davenport

davenport is a Rust microcrate that provides ergonomic thread-local workspaces for intermediate data.

use davenport::{define_thread_local_workspace, with_thread_local_workspace};

#[derive(Default)]
pub struct MyWorkspace {
    index_buffer: Vec<usize>
}

define_thread_local_workspace!(WORKSPACE);

fn median_floor(indices: &[usize]) -> Option<usize> {
    with_thread_local_workspace(&WORKSPACE, |workspace: &mut MyWorkspace| {
        // Re-use buffer from previous call to this function
        let buffer = &mut workspace.index_buffer;
        buffer.clear();
        buffer.extend_from_slice(indices);
        buffer.sort_unstable();
        buffer.get(indices.len() / 2).copied()
    })
}

See the documentation for an in-depth explanation of the crate.

License

Licensed under the terms of either MIT or Apache 2.0, at your option. See LICENSE-MIT and LICENSE-APACHE for the detailed license text.


lib.rs:

Ergonomic thread-local workspaces for intermediate data.

davenport is a microcrate with a simple API for working with thread-local data, like buffers for intermediate data. Here's a brief example of the davenport API:

use davenport::{define_thread_local_workspace, with_thread_local_workspace};

#[derive(Default)]
pub struct MyWorkspace {
    index_buffer: Vec<usize>
}

define_thread_local_workspace!(WORKSPACE);

fn median_floor(indices: &[usize]) -> Option<usize> {
    with_thread_local_workspace(&WORKSPACE, |workspace: &mut MyWorkspace| {
        // Re-use buffer from previous call to this function
        let buffer = &mut workspace.index_buffer;
        buffer.clear();
        buffer.extend_from_slice(indices);
        buffer.sort_unstable();
        buffer.get(indices.len() / 2).copied()
    })
}

Thread-local storage should be used with care. In the above example, if indices is large, then a large buffer may be allocated and not freed for as long as the thread lives. Stand-alone functions that use thread-local storage rarely have enough information to know whether the buffer should be kept alive, so this can easily lead to unnecessary memory use.

Try to find other solutions before reaching for thread-local data!
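To make the memory concern concrete, here is a minimal sketch that uses only the standard library rather than davenport. The names SCRATCH, sort_in_scratch and release_scratch are hypothetical and exist only for this illustration. The sketch shows that a cleared Vec keeps its capacity, so a single large call keeps the allocation alive until it is released explicitly.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical scratch buffer, used only for this illustration.
    static SCRATCH: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

/// Copies `indices` into the thread-local buffer, sorts the copy, and
/// returns the capacity of the buffer afterwards.
fn sort_in_scratch(indices: &[usize]) -> usize {
    SCRATCH.with(|cell| {
        let mut buffer = cell.borrow_mut();
        // `clear` drops the elements but keeps the allocation.
        buffer.clear();
        buffer.extend_from_slice(indices);
        buffer.sort_unstable();
        buffer.capacity()
    })
}

/// Explicitly gives the memory held by the thread-local buffer back.
fn release_scratch() {
    SCRATCH.with(|cell| {
        let mut buffer = cell.borrow_mut();
        buffer.clear();
        buffer.shrink_to_fit();
    });
}
```

After one call with a large slice, later calls with small slices still report the large capacity. Only a caller that knows the buffer is no longer needed can decide to call something like release_scratch, which is exactly the information a stand-alone function usually lacks.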

Motivating example

Let's say we have to compute the sum of a series of elements that are produced in variable-sized "chunks" by a Producer. For a fixed element type like u32, our code might look like this:

pub trait Producer {
    fn num_elements(&self) -> usize;
    fn populate_buffer(&self, buffer: &mut [u32]);
}

fn compute_sum(producer: &dyn Producer) -> u32 {
    let mut buffer = vec![u32::MAX; producer.num_elements()];
    producer.populate_buffer(&mut buffer);
    buffer.iter().sum()
}

If we call this function repeatedly, it may be prudent to avoid re-allocating the vector on every call. Ideally we would store a persistent buffer in one of the function arguments, or make compute_sum a method on an object with an internal buffer. However, we do not always have this luxury, for example when we are constrained to fit into an existing API that does not allow buffers to be passed in. An alternative is then to store the buffer in thread-local storage. Using thread-local storage, the above compute_sum function might look like this:

fn compute_sum(producer: &dyn Producer) -> u32 {
    thread_local! { static BUFFER: std::cell::RefCell<Vec<u32>> = Default::default(); }
    BUFFER.with(|rc| {
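        let mut buffer = rc.borrow_mut();
        // Resize so that populate_buffer receives a slice of the expected length
        buffer.resize(producer.num_elements(), u32::MAX);
        producer.populate_buffer(&mut *buffer);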
        buffer.iter().sum()
    })
}
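For comparison, the preferred alternative mentioned earlier, where compute_sum is a method on an object with an internal buffer, could be sketched as follows. The Summer type and the Ones producer are hypothetical names introduced for this illustration only; they are not part of davenport.

```rust
pub trait Producer {
    fn num_elements(&self) -> usize;
    fn populate_buffer(&self, buffer: &mut [u32]);
}

/// Hypothetical helper that owns its scratch buffer, so the caller controls
/// how long the allocation lives.
#[derive(Default)]
pub struct Summer {
    buffer: Vec<u32>,
}

impl Summer {
    pub fn compute_sum(&mut self, producer: &dyn Producer) -> u32 {
        // Re-use the allocation from previous calls.
        self.buffer.clear();
        self.buffer.resize(producer.num_elements(), 0);
        producer.populate_buffer(&mut self.buffer);
        self.buffer.iter().sum()
    }
}

/// Hypothetical producer that fills the buffer with ones.
struct Ones(usize);

impl Producer for Ones {
    fn num_elements(&self) -> usize {
        self.0
    }

    fn populate_buffer(&self, buffer: &mut [u32]) {
        buffer.fill(1);
    }
}
```

Dropping the Summer frees the buffer, which is the control that the thread-local variant gives up.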

Now, let's imagine that we wanted our function to work with a more generic set of types, rather than u32 alone. We generalize the Producer trait, but quickly realize that we cannot create a thread_local! buffer in the same way.

pub trait Producer<T> {
    fn num_elements(&self) -> usize;
    fn populate_buffer(&self, buffer: &mut [T]);
}

fn compute_sum<T>(producer: &dyn Producer<T>) -> T
where
    T: 'static + Default + Copy + std::iter::Sum
{
    // Does not compile!
    //  error[E0401]: can't use generic parameters from outer function
    thread_local! { static BUFFER: std::cell::RefCell<Vec<T>> = Default::default(); }
    BUFFER.with(|rc| {
        let mut buffer = rc.borrow_mut();
        buffer.resize(producer.num_elements(), T::default());
        producer.populate_buffer(&mut *buffer);
        buffer.iter()
              .copied()
              .sum()
    })
}

It turns out that it is generally difficult to construct a thread-local workspace that is generic in its type. However, we can do this with davenport:

use davenport::{define_thread_local_workspace, with_thread_local_workspace};

// The `Producer<T>` trait from the previous example is assumed to be in scope.
fn compute_sum<T>(producer: &dyn Producer<T>) -> T
where
    T: 'static + Default + Copy + std::iter::Sum
{
    define_thread_local_workspace!(WORKSPACE);
    with_thread_local_workspace(&WORKSPACE, |buffer: &mut Vec<T>| {
        buffer.resize(producer.num_elements(), T::default());
        producer.populate_buffer(buffer);
        buffer.iter()
              .copied()
              .sum()
    })
}

davenport gets around the aforementioned restrictions because the actual thread-local variable is an instance of Workspace, which is a container for type-erased workspaces. Thus, what really happens in the example above is that a thread-local Workspace instance is constructed, from which we request a mutable reference to a Vec<T>. If the buffer does not yet exist, it is default-constructed. Otherwise we obtain the previously used instance.
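The idea of a type-erased container can be sketched with the standard library alone. The following is not davenport's actual implementation; ErasedWorkspace, get_or_default and sum_with_buffer are hypothetical names, and the sketch assumes that a map keyed by TypeId with Box<dyn Any> values is an acceptable stand-in for the real Workspace type.

```rust
use std::any::{Any, TypeId};
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a type-erased container (not davenport's `Workspace`).
#[derive(Default)]
struct ErasedWorkspace {
    slots: HashMap<TypeId, Box<dyn Any>>,
}

impl ErasedWorkspace {
    /// Returns the stored `W`, default-constructing it on first access.
    fn get_or_default<W: 'static + Default>(&mut self) -> &mut W {
        self.slots
            .entry(TypeId::of::<W>())
            .or_insert_with(|| Box::new(W::default()) as Box<dyn Any>)
            .downcast_mut::<W>()
            .expect("slot is keyed by TypeId, so the downcast cannot fail")
    }
}

thread_local! {
    // The static itself is not generic, so `thread_local!` accepts it.
    static WORKSPACE: RefCell<ErasedWorkspace> = RefCell::new(ErasedWorkspace::default());
}

/// Generic function that re-uses a per-type `Vec<T>` stored in the workspace.
fn sum_with_buffer<T>(values: &[T]) -> T
where
    T: 'static + Copy + std::iter::Sum,
{
    WORKSPACE.with(|cell| {
        let mut workspace = cell.borrow_mut();
        let buffer: &mut Vec<T> = workspace.get_or_default();
        buffer.clear();
        buffer.extend_from_slice(values);
        buffer.iter().copied().sum()
    })
}
```

Because the generic parameter only appears in the call to get_or_default and never in the static's type, the restriction that thread_local! cannot use generic parameters from the outer function no longer applies.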

Limitations

Currently, trying to access the same workspace variable (WORKSPACE in the above examples) recursively with with_thread_local_workspace will panic, as it relies on mutably borrowing through RefCell. While this restriction could technically be lifted at the cost of increased complexity in davenport, it rarely arises in practice when using sufficiently local workspaces, as opposed to sharing a single workspace variable across entire crates.
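The underlying RefCell behavior can be observed with the standard library alone. The sketch below does not use davenport; COUNTER and the two helper functions are hypothetical names. It uses try_borrow_mut to report the conflict instead of panicking, whereas borrow_mut would panic in the nested case.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical thread-local value, used only for this illustration.
    static COUNTER: RefCell<u32> = RefCell::new(0);
}

/// Returns true if a nested mutable borrow succeeds while an outer
/// mutable borrow of the same cell is still active.
fn nested_borrow_succeeds() -> bool {
    COUNTER.with(|cell| {
        let _outer = cell.borrow_mut();
        // `borrow_mut` would panic here; `try_borrow_mut` returns an error instead.
        COUNTER.with(|inner| inner.try_borrow_mut().is_ok())
    })
}

/// Returns true if two mutable borrows taken one after the other both succeed.
fn sequential_borrows_succeed() -> bool {
    COUNTER.with(|cell| {
        // Each temporary borrow is released at the end of its statement.
        let first = cell.try_borrow_mut().is_ok();
        let second = cell.try_borrow_mut().is_ok();
        first && second
    })
}
```

The nested borrow fails while sequential borrows succeed, which is why keeping each workspace variable local to a single function tends to avoid the problem.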

No runtime deps