#stream-processing #future #parallel-processing #async-stream #async #streaming #concurrency

nightly deluge

A highly concurrent stream library driving the underlying futures either concurrently or in parallel to process streaming operations as quickly as possible

13 releases

0.2.1 Nov 15, 2023
0.2.0 Jan 22, 2023
0.1.10 Nov 1, 2022
0.1.9 Oct 27, 2022
0.1.2 Sep 29, 2022

#731 in Asynchronous

MPL-2.0 license

86KB
2K SLoC

Deluge is (not) a Stream

Crate Info API Docs Rustc Version 1.64.0+

Deluge builds on top of Stream to provide stream operations that are parallel or concurrent by default. It allows it's user to have an ordered stream of futures that are evaluated concurrently, with all the complexity hidden inside Deluge itself. We achieve this by working one level higher than a stream. Instead of returning values that might materialize at some point in the future, we immediately return an iterator of unevaluated futures which can then be evaluated by the collector.

The animation below shows an example of mapping over a highly concurrent six element collection. 📘 indicates the time it takes for an underlying element to become available, while 📗 the time it takes to apply a mapped operation.

Example of processing using Deluge and Streams

This library is still experimental, use at your own risk

Design decisions

This is an opinionated library that puts ease of use and external simplicity at the forefront. Operations that apply to individual elements like maps and filters do not allocate. They simply wrap each element in another future but they do not control the way these processed elements are evaluated. It is the collector that controls the evaluation strategy. At the moment there are two basic collectors supplied: a concurrent and a parallel one. Where there is a decision between performance and ease of use to be made, we are likely to fall on the side of ease of use.

The concurrent collector accepts an optional concurrency limit. If it is specified, at most the number of futures equal to that limit will be evaluated at once.

let result = [1, 2, 3, 4]
    .into_deluge()
    .map(|x| async move { x * 2 })
    .collect::<Vec<usize>>(None)
    .await;

assert_eq!(vec![2, 4, 6, 8], result);

The parallel collector spawns a number of workers. If a number of workers is not specified, it will default to the number of logical cpus, if the concurrency limit is not specified each worker will default to total_futures_to_evaluate / number_of_workers. Note that you need to enable either a tokio or async-std feature to support parallel collectors.

let result = (0..150)
    .into_deluge()
    .map(|idx| async move {
        tokio::time::sleep(Duration::from_millis(50)).await;
        idx
    })
    .collect_par::<Vec<usize>>(10, None)
    .await;

assert_eq!(result.len(), 150);

Please take a look at the tests for more examples of using the library.

Converting to a Stream

Both collect and collect_par implement a Stream. If you want to use any of the existing Stream extensions or functions that are missing from Deluge, collect and then use it as any other stream. Everything that is before collection in Deluge gets the benefits of running the futures in parallel.

Questions

I would want to add another operation to DelugeExt. Should I?

By all means. Please do not allocate on the heap in operations that transform individual elements. Any operation you would find useful is fair game, contributions are welcome.

I found a performance improvement, should I submit a PR?

Absolutely! As long the API exposed to the users does not get more complex, the number of allocations does not go up and intermediate memory usage does not increase. Please open an issue first if you feel that breaking any of the above rules is absolutely neccessary and we will discuss.

Dependencies

~3–13MB
~168K SLoC