rayon-scan

A parallel prefix scan function for ParallelIterator

2 releases

0.1.1 Feb 2, 2024
0.1.0 Jan 21, 2024



This crate provides a parallel version of the Iterator scan method, on Rayon's ParallelIterator.

Scan is a higher-order function which is similar to fold, but accumulates the intermediate results at each step. Specifically, the nth element of the scan iterator is the result of reducing the first n elements of the input with the given operation.
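As a point of reference for the definition above, here is a minimal sequential sketch using only the standard library. The helper name `inclusive_scan` is hypothetical and is not part of this crate; it only illustrates that each output element is the reduction of the inputs seen so far.

```rust
/// Sequential inclusive scan (hypothetical helper, not part of rayon-scan).
/// The n-th output element is the reduction of the first n inputs under `op`,
/// starting from `identity`.
fn inclusive_scan<T: Clone>(input: &[T], identity: T, op: impl Fn(&T, &T) -> T) -> Vec<T> {
    input
        .iter()
        .scan(identity, |acc, x| {
            // Fold the next input into the running accumulator...
            *acc = op(&*acc, x);
            // ...and emit the intermediate result instead of discarding it.
            Some(acc.clone())
        })
        .collect()
}
```

Calling `inclusive_scan(&[1, 2, 3, 4, 5], 0, |a, b| *a + *b)` yields the partial sums `[1, 3, 6, 10, 15]`, whereas a fold would return only the final `15`.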

The main difference from a sequential scan is that the operator must be associative. A sequential scan applies the operation left-to-right over the input, whereas a parallel scan applies it in an unspecified order, so only an associative operator is guaranteed to give the same result.
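To see why associativity matters, the following sketch compares a left-to-right reduction with one that splits the input in half and combines the two halves. Both helpers (`reduce_left`, `reduce_split`) are hypothetical illustrations written against the standard library only; the halving strategy is just one possible grouping, not a description of how this crate splits its input.

```rust
/// Left-to-right reduction: ((x0 op x1) op x2) op ...
/// Hypothetical helper for illustration; panics on empty input.
fn reduce_left<F: Fn(i64, i64) -> i64>(xs: &[i64], op: &F) -> i64 {
    assert!(!xs.is_empty(), "input must not be empty");
    xs[1..].iter().fold(xs[0], |acc, &x| op(acc, x))
}

/// Reduction that splits the input in half and combines the halves,
/// i.e. one of the many groupings a parallel algorithm might choose.
/// Hypothetical helper for illustration; panics on empty input.
fn reduce_split<F: Fn(i64, i64) -> i64>(xs: &[i64], op: &F) -> i64 {
    assert!(!xs.is_empty(), "input must not be empty");
    if xs.len() == 1 {
        return xs[0];
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    op(reduce_split(left, op), reduce_split(right, op))
}
```

With addition, both groupings of `[1, 2, 3, 4]` give `10`. With subtraction, which is not associative, the left-to-right grouping gives `((1 - 2) - 3) - 4 = -8` while the split grouping gives `(1 - 2) - (3 - 4) = 0`, so the result would depend on how the work happened to be divided.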

Usage

// Iterate over a sequence of numbers `x0, ..., xN`
// and use scan to compute the partial sums
use rayon::prelude::*;
use rayon_scan::ScanParallelIterator;

let partial_sums = [1, 2, 3, 4, 5]
                    .into_par_iter()       // iterating over i32
                    .scan(|a, b| *a + *b,  // add (&i32, &i32) -> i32
                          0)               // identity
                    .collect::<Vec<i32>>();
assert_eq!(partial_sums, vec![1, 3, 6, 10, 15]);

Performance

For a regular prefix sum or product on integers, the parallel overhead is too high for the parallel version to show any improvement. However, sufficiently complex operations, such as large matrix multiplications, can see large performance benefits.

To maximize performance, it is a good idea to limit the amount of splitting, for example by using .with_min_len(). Parallel scan has a sequential section whose running time is linear in the number of splits, so fewer, larger pieces keep that section short.
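The cost of splitting can be illustrated with a generic three-phase chunked prefix sum built on `std::thread::scope`. This is a sketch under stated assumptions, not this crate's implementation (see the linked pull request for that): the function name `chunked_prefix_sum` and its `chunk_len` parameter are hypothetical, and spawning one thread per chunk is done only to keep the example short.

```rust
use std::thread;

/// Chunked inclusive prefix sum (hypothetical sketch, not rayon-scan's code).
/// `chunk_len` plays a role similar to a minimum split length.
fn chunked_prefix_sum(input: &[u64], chunk_len: usize) -> Vec<u64> {
    assert!(chunk_len > 0, "chunk_len must be positive");

    // Phase 1 (parallel): scan every chunk independently.
    let mut chunks: Vec<Vec<u64>> = thread::scope(|s| {
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    let mut acc = 0u64;
                    chunk
                        .iter()
                        .map(|x| {
                            acc += x;
                            acc
                        })
                        .collect::<Vec<u64>>()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Phase 2 (sequential): one step per chunk to find each chunk's offset.
    // This loop is the part whose cost grows with the number of splits.
    let mut offsets = Vec::with_capacity(chunks.len());
    let mut running = 0u64;
    for chunk in &chunks {
        offsets.push(running);
        running += chunk.last().copied().unwrap_or(0);
    }

    // Phase 3 (parallel): shift each chunk by the total of all chunks before it.
    thread::scope(|s| {
        for (chunk, &offset) in chunks.iter_mut().zip(&offsets) {
            s.spawn(move || {
                for value in chunk.iter_mut() {
                    *value += offset;
                }
            });
        }
    });

    chunks.concat()
}
```

In this sketch, a smaller `chunk_len` produces more chunks and therefore more iterations of the sequential phase 2 loop, which is the trade-off the paragraph above describes.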

See https://github.com/rayon-rs/rayon/pull/1036/ for more details on implementation and performance.

Tests and Benchmarks

To run tests:

cargo test

To run benchmarks:

cargo +nightly test --features "bench"

License

Licensed under Apache 2.0 and MIT.

Note: https://github.com/rayon-rs/rayon/pull/1036/ is open to merge this feature into Rayon, and this crate should become obsolete if it merges.