Lib.rs

›

#membership #quantile #approximate #frequency #algorithm

no-std probably

A library with various approximate computing algorithms

4 releases (2 breaking)

0.3.1	Dec 3, 2020
0.3.0	Dec 3, 2020
0.2.0	Nov 25, 2020
0.0.1	Oct 31, 2020

#1305 in Algorithms

25 downloads per month
Used in dsrs

Custom license

200KB
6K SLoC

Approximate computing

Approximate computing "is a computation technique which returns a possibly inaccurate result rather than a guaranteed accurate result," which "can provide disproportionate gains in performance and energy, while still achieving acceptable result accuracy."

This library is a collection of several approximate computing algorithms written in Rust. The goals are:

Ease of use. For maximum adoption, it should be straightforward to make use of AC algorithms with very little effort.
Parallelization. When possible, algorithms should be capable of being parallelized (ie, map-reduce of the core algorithm). This includes serialization of datasets.
Common dependencies. Minimize the dependency chain to reduce build times and binary sizes.

NOTE that I did not write any of these algorithms - they are all implemented by other talented Rustaceans, the repositories for which are linked below. I have, however, added some functionality (serialization/deserialization, exposing new initialization functions, etc.), and I did implement the Rust version from C#. All source code is MIT-licensed.

Using

To use probably in your Cargo.toml:

probably = "0.2.0"

To include serialization of data structures, include the with_serde feature:

probably = { version = "0.2.0", features = ["with_serde"] }

The algorithms implemented in this library and on the roadmap fall into one of several categories:

Cardinality/frequency

These algorithms approxmate counting the number of items in a set within some approximate acceptable error.

HyperLogLog (HLL) -- implemented from rust-hyperloglog
Count-min sketch -- implemented from rust-count-min-sketch to serve as a frequency table of events

Membership

These algorithms determine if an item n exists in the set N, guaranteeing no false negatives at the cost of some false positives.

Bloom Filter -- implemented from bloom_filter
Quotient filter -- like Bloom filter, but can be merged and resized more efficiently. Uses 10-25% more space than BF.
- Paper on MQF, "a compact hashtable, can efficiently store k-mers with a skewed distribution"
- MQF implemented in C++
Cuckoo filter -- implemented from rust-cuckoofilter. like Bloom filter, but can delete items. Can use lower space overhead than BF.

Quantile Estimators

Piecewise-Parabolic Quantile Estimator -- translated from perfolizer C#
- Other perfolizer estimators
Greenwald-Khanna -- implemented from postmates/quantiles
Misra-Gries -- also from postmates/quantiles
Histogram -- also from postmates/quantiles

Other

MinHash for estimating similarity of two sets

Additional reading

Probabilistic Data Structures

Dependencies

~0.5–2MB
~32K SLoC