13 releases (7 stable)

2.1.1 Oct 6, 2024
2.1.0 Jan 4, 2024
2.0.1 Jun 26, 2021
1.0.3 Apr 3, 2021
0.2.1 Nov 3, 2019

#78 in Data structures

52,492 downloads per month
Used in 22 crates (4 directly)

MIT license

38KB
538 lines

Growable Bloom Filters

Overview

An implementation of Scalable Bloom Filters that also provides serde serialization and deserialization.

A bloom filter lets you insert items and then test membership with contains. It is space and time efficient, at the cost of false positives. In particular, if contains returns true, the item may be in the filter. But if contains returns false, the item is definitely not in the filter.

You can control the false positive rate by setting desired_error_prob and est_insertions appropriately.
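To give a feel for how these two parameters trade off against memory, here is a minimal sketch using the textbook sizing formulas for a single fixed-size bloom filter. The function name classic_sizing is hypothetical, and the crate's internal sizing for its growable filter may well differ; this only illustrates the general relationship.

```rust
/// Textbook sizing for one fixed-size bloom filter.
/// Assumption: this is NOT taken from the crate's source; it is the
/// classic formula, shown only to illustrate the trade-off.
fn classic_sizing(desired_error_prob: f64, est_insertions: usize) -> (usize, u32) {
    assert!(desired_error_prob > 0.0 && desired_error_prob < 1.0);
    assert!(est_insertions > 0);
    let n = est_insertions as f64;
    let ln2 = std::f64::consts::LN_2;
    // Number of bits: m = -n * ln(p) / (ln 2)^2
    let bits = (-n * desired_error_prob.ln() / (ln2 * ln2)).ceil();
    // Number of hash functions: k = (m / n) * ln 2, at least 1
    let hashes = ((bits / n) * ln2).round().max(1.0);
    (bits as usize, hashes as u32)
}

fn main() {
    // A lower error probability costs more bits for the same item count.
    for &p in &[0.05, 0.01, 0.001] {
        let (bits, hashes) = classic_sizing(p, 1000);
        println!("p = {}: {} bits, {} hash functions", p, bits, hashes);
    }
}
```

Under these formulas, tightening the error probability or raising the expected number of insertions both increase the size of the filter.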

use growable_bloom_filter::GrowableBloom;

// Create and insert into the bloom filter
let mut gbloom = GrowableBloom::new(0.05, 1000);
gbloom.insert(&0);
assert!(gbloom.contains(&0));

// Serialize and deserialize the bloom filter (requires serde_json as a dependency)

let s = serde_json::to_string(&gbloom).unwrap();
let des_gbloom: GrowableBloom = serde_json::from_str(&s).unwrap();
assert!(des_gbloom.contains(&0));

// Builder API
use growable_bloom_filter::GrowableBloomBuilder;
let mut gbloom = GrowableBloomBuilder::new()
    .estimated_insertions(100)
    .desired_error_ratio(0.05)
    .build();
gbloom.insert(&0);
assert!(gbloom.contains(&0));

Applications

Bloom filters are typically used as a pre-filter in front of expensive operations. For example, if you need to ask ten thousand servers whether they have data XYZ, you could keep a GrowableBloom for each server and skip every server whose filter reports that XYZ is definitely NOT there.
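The pattern can be sketched roughly as follows. To keep the example free of external crates, it uses a toy fixed-size filter (ToyBloom) as a stand-in for GrowableBloom; ToyBloom, Server and candidates are hypothetical names made up for this illustration, not part of the crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy fixed-size bloom filter, a stand-in for GrowableBloom.
/// Assumption: illustration only; it never grows and is not tuned.
struct ToyBloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl ToyBloom {
    fn new(num_bits: usize, hashes: u64) -> Self {
        ToyBloom { bits: vec![false; num_bits], hashes }
    }

    // Derive the bit position for one of the `hashes` seeded hash functions.
    fn index<T: Hash>(&self, item: &T, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        for seed in 0..self.hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    // true: maybe present. false: definitely absent.
    fn contains<T: Hash>(&self, item: &T) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.index(item, seed)])
    }
}

/// Hypothetical record: a server name plus a filter summarizing its keys.
struct Server {
    name: &'static str,
    summary: ToyBloom,
}

/// Servers still worth querying: those whose filter does not rule the key out.
fn candidates(servers: &[Server], key: &str) -> Vec<&'static str> {
    servers
        .iter()
        .filter(|s| s.summary.contains(&key))
        .map(|s| s.name)
        .collect()
}

fn main() {
    let mut a = ToyBloom::new(1024, 3);
    a.insert(&"XYZ");
    let mut b = ToyBloom::new(1024, 3);
    b.insert(&"ABC");

    let servers = vec![
        Server { name: "server-a", summary: a },
        Server { name: "server-b", summary: b },
    ];

    // server-a is always listed; other servers appear only on a false positive.
    let to_query = candidates(&servers, "XYZ");
    assert!(to_query.contains(&"server-a"));
    println!("servers to query for XYZ: {:?}", to_query);
}
```

A server that actually holds the key is never skipped, because the filter has no false negatives. A false positive only costs one unnecessary query.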

Stability

The (de)serialized bloom filter can be transferred and used across different platforms, independent of endianness, architecture or word size.

Note that stability is only guaranteed within the same major version of the crate.
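As an illustration of what platform independence involves for a binary encoding, the sketch below writes a filter's words with a fixed width and a fixed byte order. This is an assumption-laden example, not the crate's actual serialization format; encode_words and decode_words are hypothetical helpers.

```rust
/// Encode words as little-endian bytes, regardless of host byte order.
/// Assumption: illustrative layout only, not the crate's wire format.
fn encode_words(words: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + words.len() * 8);
    // usize differs between platforms, so the length is stored as a u64.
    out.extend_from_slice(&(words.len() as u64).to_le_bytes());
    for word in words {
        out.extend_from_slice(&word.to_le_bytes());
    }
    out
}

/// Decode the layout written by encode_words; None if the input is malformed.
fn decode_words(bytes: &[u8]) -> Option<Vec<u64>> {
    if bytes.len() < 8 || bytes.len() % 8 != 0 {
        return None;
    }
    let mut chunks = bytes.chunks_exact(8).map(|chunk| {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        u64::from_le_bytes(buf)
    });
    let len = chunks.next()?;
    let words: Vec<u64> = chunks.collect();
    if words.len() as u64 == len {
        Some(words)
    } else {
        None
    }
}

fn main() {
    let words = vec![1u64, 0xDEAD_BEEF, u64::MAX];
    let bytes = encode_words(&words);
    // The same bytes decode to the same words on any architecture.
    assert_eq!(decode_words(&bytes), Some(words));
}
```

Fixing both the integer width and the byte order is what makes the encoded bytes independent of the machine that wrote them.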

Upgrading from 1.x to 2.x

  • Bloom filters serialized with 1.x can no longer be loaded in 2.x.
  • Minor API changes otherwise.

Dependencies

~0.5–1MB
~25K SLoC