4 releases

Version | Released |
---|---|
0.4.3 | May 20, 2021 |
0.4.2 | Aug 18, 2020 |
0.4.1 | Aug 7, 2020 |
0.4.0 | Aug 7, 2020 |
amadeus-streaming
SIMD-accelerated implementations of various streaming algorithms.
This is a subcrate of the amadeus project.
This library is a work in progress. PRs are very welcome! Currently implemented algorithms include:
- Count–min sketch
- Top k (Count–min sketch plus a doubly linked hashmap to track heavy hitters / top k keys when ordered by aggregated value)
- HyperLogLog
- Reservoir sampling
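To make the first item concrete, here is a minimal, standard-library-only sketch of a count–min sketch. It is not this crate's implementation or API: the `CountMin` type, its `width`/`depth` parameters, and the row-seeded hashing scheme are assumptions chosen for illustration, and it uses no SIMD.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative count-min sketch (hypothetical type, not the crate's API):
/// `depth` rows of `width` counters, one hash function per row.
pub struct CountMin {
    width: usize,
    rows: Vec<Vec<u64>>,
}

impl CountMin {
    pub fn new(width: usize, depth: usize) -> Self {
        assert!(width > 0 && depth > 0);
        CountMin { width, rows: vec![vec![0; width]; depth] }
    }

    /// Column of `key` in row `row`. Hashing the row index first gives each
    /// row a different hash function (an assumption made for simplicity).
    fn column<K: Hash + ?Sized>(&self, key: &K, row: usize) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.width as u64) as usize
    }

    /// Adds `count` occurrences of `key` by bumping one counter per row.
    pub fn add<K: Hash + ?Sized>(&mut self, key: &K, count: u64) {
        for row in 0..self.rows.len() {
            let col = self.column(key, row);
            self.rows[row][col] += count;
        }
    }

    /// Estimated count of `key`: the minimum over its counters. Collisions
    /// can only inflate a counter, so this never underestimates.
    pub fn estimate<K: Hash + ?Sized>(&self, key: &K) -> u64 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.column(key, row)])
            .min()
            .unwrap_or(0)
    }
}
```

After `sketch.add("a", 3)` and `sketch.add("b", 1)`, `sketch.estimate("a")` is at least 3 and at most 4, the total of all counts added. A Top k structure as described above would additionally keep the current heavy hitters in a separate map keyed by these estimates.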
A goal of this library is to enable composition of these algorithms; for example, Top k + HyperLogLog to enable an approximate version of something akin to `SELECT key FROM table GROUP BY key ORDER BY COUNT(DISTINCT value) DESC LIMIT k`.
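For reference, the exact computation that such a composition approximates can be sketched with standard-library collections alone. The function name `top_k_by_distinct` and the `(key, value)` row shape are assumptions for illustration, not part of this crate. An approximate version would, roughly, swap the per-key `HashSet` for a HyperLogLog and the full `HashMap` for a Top k structure, trading exactness for bounded memory.

```rust
use std::collections::{HashMap, HashSet};

/// Exact baseline (hypothetical helper): the `k` keys with the most
/// distinct values, highest count first, ties broken by key.
pub fn top_k_by_distinct<'a>(
    rows: &[(&'a str, &'a str)],
    k: usize,
) -> Vec<(&'a str, usize)> {
    // GROUP BY key, collecting the set of distinct values per key.
    let mut distinct: HashMap<&'a str, HashSet<&'a str>> = HashMap::new();
    for &(key, value) in rows {
        distinct.entry(key).or_default().insert(value);
    }

    // COUNT(DISTINCT value) per key.
    let mut counts: Vec<(&'a str, usize)> = distinct
        .into_iter()
        .map(|(key, values)| (key, values.len()))
        .collect();

    // ORDER BY count DESC (then key, for a deterministic order), LIMIT k.
    counts.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    counts.truncate(k);
    counts
}
```

This baseline keeps every distinct value of every key in memory, which is the cost the sketch-based composition is meant to avoid.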
Run your application with `RUSTFLAGS="-C target-cpu=native"` and the `nightly` feature to benefit from the SIMD acceleration, like so:
RUSTFLAGS="-C target-cpu=native" cargo run --features "streaming_algorithms/nightly" --release
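One way to check that the flag took effect is to inspect which target features the compiler enabled, since `-C target-cpu=native` turns on the features the build machine's CPU supports. The helper below is a hypothetical sketch using the standard `cfg!(target_feature = "...")` macro; it is not part of this crate, and the three features listed are just examples, not necessarily the ones the crate relies on.

```rust
/// Hypothetical helper: lists which of a few common SIMD target features
/// were enabled when this code was compiled.
pub fn compiled_simd_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` is evaluated at compile time, so this reflects the build flags,
    // not the CPU the binary later happens to run on.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        enabled.push("neon");
    }
    enabled
}
```

On an AVX2-capable x86-64 machine, the list would typically contain only `sse2` in a default build and gain `avx2` when built with `target-cpu=native`.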
See this gist for a good list of further algorithms to be implemented. Other resources are Probabilistic data structures – Wikipedia, DataSketches – a similar Java library originating at Yahoo, and Algebird – a similar Scala library originating at Twitter.
As these implementations are often in hot code paths, unsafe is used, albeit only when necessary to a) achieve the asymptotically optimal algorithm or b) mitigate an observed bottleneck.