#parallelism #thread #thread-pool #indexed #structures #slice #computation

paralight

A lightweight parallelism library for indexed structures

2 releases

0.0.2 Oct 1, 2024
0.0.1 Sep 17, 2024

#104 in Concurrency

Download history 152/week @ 2024-09-15 18/week @ 2024-09-22 171/week @ 2024-09-29 11/week @ 2024-10-06

352 downloads per month

MIT/Apache

105KB
2K SLoC

Paralight: a lightweight parallelism library for indexed structures

Crate Documentation Minimum Rust 1.75.0 Dependencies Lines of Code Build Status Test Status

This library allows you to distribute computation over slices among multiple threads. Each thread processes a subset of the items, and a final step reduces the outputs from all threads into a single result.

use paralight::{RangeStrategy, ThreadPoolBuilder};
use std::num::NonZeroUsize;

// Define thread pool parameters.
let pool_builder = ThreadPoolBuilder {
    num_threads: NonZeroUsize::try_from(4).unwrap(),
    range_strategy: RangeStrategy::WorkStealing,
};

// Create a scoped thread pool.
let sum = pool_builder.scope(
    |mut thread_pool| {
        // Compute the sum of a slice.
        let input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
        thread_pool.pipeline(
            &input,
            /* init */ || 0u64,
            /* process_item */ |acc, _, x| *acc += *x,
            /* finalize */ |acc| acc,
            /* reduce */ |a, b| a + b,
        )
    },
);
assert_eq!(sum, 5 * 11);

Note: In principle, Paralight could be extended to support other inputs than slices as long as they are indexed, but for now only slices are supported. Come back to check when future versions are published!

Work-stealing strategy

Paralight offers two strategies to distribute computation among threads:

  • RangeStrategy::Fixed splits the input evenly and hands out a fixed sequential range of items to each thread,
  • RangeStrategy::WorkStealing starts with the fixed distribution, but lets each thread steal items from others once it is done computing its items.

Note: In work-stealing mode, each thread processes an arbitrary subset of items in arbitrary order, meaning that the reduction operation must be both commutative and associative to yield a deterministic result (in contrast with the standard library's Iterator trait that processes items in order). Fortunately, a lot of common operations are commutative and associative, but be mindful of this.

Debugging

Two optional features are available if you want to debug performance.

  • log, based on the log crate prints basic information about inter-thread synchronization: thread creation/shutdown, when each thread starts/finishes a computation, etc.
  • log_parallelism prints detailed traces about which items are processed by which thread, and work-stealing statistics (e.g. how many times work was stolen among threads).

Note that in any case neither the input items nor the resulting computation are logged. Only the indices of the items in the input may be present in the logs. If you're concerned that these indices leak too much information about your data, you need to make sure that you depend on Paralight with the log and log_parallelism features disabled.

Disclaimer

This is not an officially supported Google product.

Contributing

See CONTRIBUTING.md for details.

License

This software is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE for details.

Dependencies

~0–410KB