4 releases (2 breaking)
0.3.1 | Dec 3, 2020 |
---|---|
0.3.0 | Dec 3, 2020 |
0.2.0 | Nov 25, 2020 |
0.0.1 | Oct 31, 2020 |
#1243 in Algorithms
25 downloads per month
Used in dsrs
200KB
6K
SLoC
Approximate computing
Approximate computing "is a computation technique which returns a possibly inaccurate result rather than a guaranteed accurate result," which "can provide disproportionate gains in performance and energy, while still achieving acceptable result accuracy."
This library is a collection of several approximate computing algorithms written in Rust. The goals are:
- Ease of use. For maximum adoption, it should be straightforward to make use of AC algorithms with very little effort.
- Parallelization. When possible, algorithms should be capable of being parallelized (ie, map-reduce of the core algorithm). This includes serialization of datasets.
- Common dependencies. Minimize the dependency chain to reduce build times and binary sizes.
NOTE that I did not write any of these algorithms - they are all implemented by other talented Rustaceans, the repositories for which are linked below. I have, however, added some functionality (serialization/deserialization, exposing new initialization functions, etc.), and I did implement the Rust version from C#. All source code is MIT-licensed.
Using
To use probably
in your Cargo.toml:
probably = "0.2.0"
To include serialization of data structures, include the with_serde
feature:
probably = { version = "0.2.0", features = ["with_serde"] }
The algorithms implemented in this library and on the roadmap fall into one of several categories:
Cardinality/frequency
These algorithms approxmate counting the number of items in a set within some approximate acceptable error.
- HyperLogLog (HLL) -- implemented from
rust-hyperloglog
- Count-min sketch -- implemented from
rust-count-min-sketch
to serve as a frequency table of events
Membership
These algorithms determine if an item n exists in the set N, guaranteeing no false negatives at the cost of some false positives.
- Bloom Filter -- implemented from
bloom_filter
- Quotient filter -- like Bloom filter, but can be merged and resized more efficiently. Uses 10-25% more space than BF.
- Paper on MQF, "a compact hashtable, can efficiently store k-mers with a skewed distribution"
- MQF implemented in C++
- Cuckoo filter -- implemented from
rust-cuckoofilter
. like Bloom filter, but can delete items. Can use lower space overhead than BF.
Quantile Estimators
- Piecewise-Parabolic Quantile Estimator -- translated from
perfolizer
C# - Greenwald-Khanna -- implemented from
postmates/quantiles
- Misra-Gries -- also from
postmates/quantiles
- Histogram -- also from
postmates/quantiles
Other
- MinHash for estimating similarity of two sets
Additional reading
Dependencies
~0.5–2MB
~33K SLoC