### 11 stable releases

1.11.0 | Sep 9, 2024 |
---|---|

1.10.0 | Sep 5, 2024 |

1.7.0 | Aug 29, 2024 |

1.3.0 | Jul 25, 2024 |

#**220** in Concurrency

**63** downloads per month

**MIT**license

225KB

4K
SLoC

# orx-parallel

A performant and configurable parallel computing library for computations defined as compositions of iterator methods.

## Parallel Computation by Iterators

Parallel computation is achieved conveniently by the parallel iterator trait

. This allows for changing sequential code that is defined as a composition of functions through iterators into its parallel counterpart by adding one word: `Par`

or `par`

.`into_par`

`use` `orx_parallel``::``prelude``::``*``;`
`struct` `Input``(`String`)``;`
`struct` `Output``(``usize``)``;`
`let` `compute` `=` `|``input``:` Input`|` `Output``(`input`.``0.``len``(``)``)``;`
`let` `select` `=` `|``output``:` `&`Output`|` `output``.``0.``is_power_of_two``(``)``;`
`let` `inputs` `=` `|``|` `(``0``..``1024``)``.``map``(``|``x``|` Input`(`x`.``to_string``(``)``)``)``.``collect``::``<``Vec``<``_``>``>``(``)``;`
`//` sequential computation with regular iterator
`let` seq_result`:` `usize` `=` `inputs``(``)`
`.``into_iter``(``)`
`.``map``(`compute`)`
`.``filter``(`select`)`
`.``map``(``|``x``|` `x``.``0``)`
`.``sum``(``)``;`
`assert_eq!``(`seq_result`,` `286``)``;`
`//` parallel computation with Par
`let` par_result `=` `inputs``(``)`
`.``into_par``(``)` `//` parallelize with default settings
`.``map``(`compute`)`
`.``filter``(`select`)`
`.``map``(``|``x``|` `x``.``0``)`
`.``sum``(``)``;`
`assert_eq!``(`par_result`,` `286``)``;`

Below code block includes some basic examples demonstrating different sources providing references or values as inputs of the parallel computation.

`use` `orx_parallel``::``prelude``::``*``;`
`use` `std``::``collections``::``*``;`
`fn` `test``<`P`:` Par`<`Item = `usize``>``>``(``iter``:` P`)`` ``{`
`let` result `=` iter`.``filter``(``|``x``|` `x ``%` `2` `==` `1``)``.``map``(``|``x``|` `x ``+` `1``)``.``sum``(``)``;`
`assert_eq!``(``6``,` result`)``;`
`}`
`let` range `=` `1``..``4``;`
`test``(`range`.``par``(``)``)``;`
`let` vec `=` `vec!``[``1``,` `2``,` `3``]``;`
`test``(`vec`.``par``(``)``.``copied``(``)``)``;` `//` use a ref to vec
`test``(`vec`.``into_par``(``)``)``;` `//` consume vec
`//` other collections can be used similarly
`let` set`:` `HashSet``<``_``>` `=` `[``1``,` `2``,` `3``]``.``into_iter``(``)``.``collect``(``)``;`
`test``(`set`.``par``(``)``.``copied``(``)``)``;`
`test``(`set`.``into_par``(``)``)``;`
`let` bmap`:` `BTreeMap``<``_`, `_``>` `=` `[``(``'`a`'``,` `1``)``,` `(``'`b`'``,` `2``)``,` `(``'`c`'``,` `3``)``]``.``into_iter``(``)``.``collect``(``)``;`
`test``(`bmap`.``par``(``)``.``map``(``|``x``|` `x``.``1``)``.``copied``(``)``)``;`
`test``(`bmap`.``into_par``(``)``.``map``(``|``x``|` `x``.``1``)``)``;`
`//` any regular/sequential iterator can be parallelized
`let` iter `=` `[``"``"``,` `"`a`"``,` `"`bb`"``,` `"`ccc`"``,` `"`dddd`"``]`
`.``iter``(``)`
`.``skip``(``1``)`
`.``take``(``3``)`
`.``map``(``|``x``|` `x``.``len``(``)``)``;`
`test``(`iter`.``par``(``)``)``;`

## Easy to Configure

Complexity of distribution of work to parallel threads is boiled down to two straightforward parameters which are easy to reason about:

represents the degree of parallelization. It can be set to one of the two variants:`NumThreads`

: All threads will be assumed to be available. This is an upper bound; whenever the computation is not sufficiently challenging, this number may not be reached.`Auto`

: The computation can spawn at most n threads. NumThreads::Max(1) is equivalent to sequential execution.`Max``(`n`)`

represents the number of elements a parallel worker will pull and process every time it becomes idle. This parameter aims to balance the overhead of parallelization and cost of heterogeneity of tasks. It can be set to one of the three variants:`ChunkSize`

: The library aims to select the best value in order to minimize computation time.`Auto`

: Chunk sizes will be c. This variant gives the control completely to the caller, and hence, suits best to computations to be tuned.`Exact``(`c`)`

: Chunk sizes will be at least c. However, the execution is allowed to pull more elements depending on characteristics of the inputs and used number of threads in order to reduce the impact of parallelization overhead.`Min``(`c`)`

`use` `orx_parallel``::``prelude``::``*``;`
`use` `std``::``num``::`NonZeroUsize`;`
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``sum``(``)``;` `//` both settings at Auto
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``num_threads``(``4``)``.``sum``(``)``;` `//` at most 4 threads
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``num_threads``(``1``)``.``sum``(``)``;` `//` sequential
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``num_threads``(``NumThreads``::`sequential`(``)``)``.``sum``(``)``;` `//` also sequential
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``num_threads``(``0``)``.``sum``(``)``;` `//` shorthand for NumThreads::Auto
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``chunk_size``(``16``)``.``sum``(``)``;` `//` chunks of exactly 16 elements
`let` c `=` `NonZeroUsize``::`new`(``64``)``.``unwrap``(``)``;`
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``chunk_size``(``ChunkSize``::`Min`(`c`)``)``.``sum``(``)``;` `//` min 64 elements
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``chunk_size``(``0``)``.``sum``(``)``;` `//` shorthand for ChunkSize::Auto
`let` `_` `=` `(``0``..``42``)``.``par``(``)``.``num_threads``(``4``)``.``chunk_size``(``16``)``.``sum``(``)``;` `//` set both params

Having control on these two parameters and being able to configure each computation easily and individually is useful in various ways. See EasyConfiguration section for examples.

## Generalization of Sequential and Parallel Computation

Executing a parallel computation with

is equivalent to a sequential computation, without any parallelization overhead. In this sense, `NumThreads ::`Max

`(`

`1`

`)`

`Par`

is a generalization of sequential and parallel computation.In order to illustrate, consider the following function which accepts the definition of a computation as a

. Note that just as sequential iterators, `Par`

is lazy. In other words, it is just the definition of the computation. Such a `Par`

is passed to the `computation`

method together with its settings that can be accessed by `execute`

.`computation .params()`

However, since the method owns the

, it may decide how to execute it. This implementation will go with the given parallel settings. Unless it is Monday, then it will run sequentially.`computation`

`use` `orx_parallel``::``prelude``::``*``;`
`use` `chrono``::``{`Datelike`,` Local`,` Weekday`}``;`
`type` `Output` `=` `String``;`
`fn` `execute``<`C`:` Par`<`Item = Output`>``>``(``computation``:` C`)`` ``->` `Vec``<`Output`>` `{`
`match` `Local``::`now`(``)``.``weekday``(``)` `{`
`Weekday``::`Mon `=>` computation`.``num_threads``(``1``)``.``collect_vec``(``)``,`
`_` `=>` computation`.``collect_vec``(``)``,`
`}`
`}`

This features saves us from defining the same computation twice. We are often required to write code like below where we need to run sequentially or in parallel depending on an input argument. This is repetitive, error-prone and difficult to maintain.

`use` `orx_parallel``::``prelude``::``*``;`
`struct` `Input``(`String`)``;`
`struct` `Output``(``usize``)``;`
`fn` `compute``(``input``:` Input`)`` ``->` Output `{`
Output`(`input`.``0.``len``(``)``)`
`}`
`fn` `select``(``output``:` `&`Output`)`` ``->` `bool` `{`
output`.``0.``is_power_of_two``(``)`
`}`
`fn` `execute_conditionally``(``inputs``:` impl `Iterator``<`Item = Input`>`, `parallelize``:` `bool``)`` ``->` `usize` `{`
`match` parallelize `{`
`true` `=>` inputs
`.``into_iter``(``)`
`.``par``(``)`
`.``map``(`compute`)`
`.``filter``(`select`)`
`.``map``(``|``x``|` `x``.``0``)`
`.``sum``(``)``,`
`false` `=>` inputs
`.``into_iter``(``)`
`.``map``(`compute`)`
`.``filter``(`select`)`
`.``map``(``|``x``|` `x``.``0``)`
`.``sum``(``)``,`
`}`
`}`

Using

, we can have a single version which will not have any overhead when executed sequentially.`Par`

`fn` `execute_unified``(``inputs``:` impl `Iterator``<`Item = Input`>`, `parallelize``:` `bool``)`` ``->` `usize` `{`
`let` num_threads `=` `match` parallelize `{`
`true` `=>` `NumThreads``::`Auto`,`
`false` `=>` `NumThreads``::`sequential`(``)``,`
`}``;`
inputs
`.``par``(``)`
`.``num_threads``(`num_threads`)`
`.``map``(`compute`)`
`.``filter``(`select`)`
`.``map``(``|``x``|` `x``.``0``)`
`.``sum``(``)`
`}`

## Underlying Approach & Performance

This crate has developed as a natural follow up of the

. You may already find example parallel map, fold and find implementations in the examples. Especially when combined with concurrent collections such as `ConcurrentIter`

and `ConcurrentBag`

, implementation of parallel computation has been very straightforward. You may find some details in this section and this discussion.`ConcurrentOrderedBag`

Benchmarks are tricky, even more in parallel context. Nevertheless, results of benchmarks defined in this repository are very promising for

. Its performance is often on-par with rayon. It can provide significant improvements in scenarios where the results are collected, such as map |> filter |> collect or flat_map |> collect, etc.`Par`

## Relation to rayon

See RelationToRayon section for a discussion on orx-parallel's similarities and differences from rayon.

## Contributing

Contributions are welcome! If you notice an error, have a question or think something could be improved, please open an issue or create a PR.

The goal of v1 is to allow

to cover practical use cases, please open an issue if you have a computation that you cannot express and compute with it.`Par`

The goal of v2 is to provide a more dynamic and smart parallel executor, please see and join the related discussion here.

## License

This library is licensed under MIT license. See LICENSE for details.

#### Dependencies

~1MB

~15K SLoC