#bloom-filter #collection #probabilistic #approximate #approximation #count-min-sketch

probabilistic-collections

Various implementations of collections that use approximations to improve on running time or memory, but introduce a certain amount of error

7 releases (breaking)

0.7.0 May 11, 2020
0.6.0 Apr 24, 2020
0.5.0 Nov 4, 2018
0.4.0 Sep 9, 2018
0.1.0 Sep 6, 2018

#534 in Data structures

Download history 1423/week @ 2023-11-27 1025/week @ 2023-12-04 1638/week @ 2023-12-11 1316/week @ 2023-12-18 935/week @ 2023-12-25 1525/week @ 2024-01-01 2013/week @ 2024-01-08 3002/week @ 2024-01-15 4209/week @ 2024-01-22 2098/week @ 2024-01-29 1912/week @ 2024-02-05 1773/week @ 2024-02-12 1170/week @ 2024-02-19 1960/week @ 2024-02-26 1854/week @ 2024-03-04 880/week @ 2024-03-11

5,896 downloads per month
Used in 9 crates (5 directly)

MIT/Apache

270KB
4.5K SLoC

probabilistic-collections-rs

Documentation License: MIT License: Apache 2.0 Pipeline Status Coverage Report

probabilistic-collections contains various implementations of collections that use approximations to improve on running time or memory, but introduce a certain amount of error. The error can be controlled under a certain threshold which makes these data structures extremely useful for big data and streaming applications.

The following types of collections are implemented:

  • Approximate Membership in Set: BloomFilter, PartitionedBloomFilter, CuckooFilter, QuotientFilter
  • Scalable Approximate Membership in Set: ScalableBloomFilter, ScalableCuckooFilter
  • Approximate Membership in Stream: BSBloomFilter, BSSDBloomFilter, RLBSBloomFilter
  • Approximate Item Count: CountMinSketch
  • Approximate Distinct Item Count: HyperLogLog
  • Set similarity: MinHash, SimHash

Usage

Add this to your Cargo.toml:

[dependencies]
probabilistic-collections = "*"

For serde support, include the serde feature:

[dependencies]
probabilistic-collections = { version = "*", features = ["serde"] }

Add this to your crate root if you are using Rust 2015:

extern crate probabilistic_collections;

Caveats

If you are using this crate to create collections to be used across different platforms, you must be careful not to use keys of type [T]. The Rust standard library implementation of Hash for [T] first hashes the length of the slice, then the contents of the slice. The length is platform specific because it is of type usize. Therefore, a collection with keys of type [T] will have unexpected results when used across platforms. For example, if you were to generate and serilize a BloomFilter on i686 compiled code where the keys are of type [u8], the filter will not have the correct results when deserialized on x86_64 compiled code.

The recommended work-around to this problem is to a define a wrapper struct around a [T] with a Hash implementation that only hashes the contents of the slice.

Changelog

See CHANGELOG for more details.

References

License

probabilistic-collections-rs is dual-licensed under the terms of either the MIT License or the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for more details.

Dependencies

~1–1.3MB
~27K SLoC