#approximate #quantile #frequency #membership

no-std probably

A library with various approximate computing algorithms

4 releases (2 breaking)

0.3.1 Dec 3, 2020
0.3.0 Dec 3, 2020
0.2.0 Nov 25, 2020
0.0.1 Oct 31, 2020

#1060 in Algorithms


Used in dsrs

Custom license

200KB
6K SLoC

Approximate computing

Approximate computing "is a computation technique which returns a possibly inaccurate result rather than a guaranteed accurate result," which "can provide disproportionate gains in performance and energy, while still achieving acceptable result accuracy."

This library is a collection of several approximate computing algorithms written in Rust. The goals are:

  • Ease of use. For maximum adoption, it should be straightforward to make use of AC algorithms with very little effort.
  • Parallelization. When possible, algorithms should be capable of being parallelized (ie, map-reduce of the core algorithm). This includes serialization of datasets.
  • Common dependencies. Minimize the dependency chain to reduce build times and binary sizes.

NOTE that I did not write any of these algorithms - they are all implemented by other talented Rustaceans, the repositories for which are linked below. I have, however, added some functionality (serialization/deserialization, exposing new initialization functions, etc.), and I did implement the Rust version from C#. All source code is MIT-licensed.

Using

To use probably in your Cargo.toml:

probably = "0.2.0"

To include serialization of data structures, include the with_serde feature:

probably = { version = "0.2.0", features = ["with_serde"] }

The algorithms implemented in this library and on the roadmap fall into one of several categories:

Cardinality/frequency

These algorithms approxmate counting the number of items in a set within some approximate acceptable error.

Membership

These algorithms determine if an item n exists in the set N, guaranteeing no false negatives at the cost of some false positives.

  • Bloom Filter -- implemented from bloom_filter
  • Quotient filter -- like Bloom filter, but can be merged and resized more efficiently. Uses 10-25% more space than BF.
  • Cuckoo filter -- implemented from rust-cuckoofilter. like Bloom filter, but can delete items. Can use lower space overhead than BF.

Quantile Estimators

Other

  • MinHash for estimating similarity of two sets

Additional reading

Dependencies

~0.6–2MB
~33K SLoC