#bloom-filter #filter #set #probabilistic #membership

gauze

Probabilistic set membership filters with a simple interface

5 releases

0.1.4 Jan 7, 2024
0.1.3 Jan 4, 2024
0.1.2 Dec 24, 2023
0.1.1 Dec 23, 2023
0.1.0 Dec 23, 2023

#1065 in Data structures

MIT license

21KB
286 lines

Gauze

A collection of probabilistic set membership filters with a simple interface. These filters can claim that a given entry is

  • definitely not represented in a set of entries, or
  • might be represented in the set.

This crate is a work in progress and currently only implements Bloom filters.

If needed, the extension trait DynFilter enables object-safe, dynamically dispatched variants of insert and contains.

Gauze in Action

A simple Bloom filter implementation looks like this:

use gauze::{BloomFilter, Filter};

fn main() {
    // The number of items we want the `BloomFilter` to store
    // while not returning too many false positives
    let capacity = 100_000;
    // The rate of false positives the `BloomFilter` is allowed
    // to return if it stores no more than `capacity` items
    let target_err_rate = 0.001;
    // These parameters allow us to construct a `BloomFilter` with
    // *approximately* optimal properties
    let mut bloom =
        BloomFilter::new(capacity, target_err_rate)
        .expect("couldn't construct Bloom filter.");

    // `BloomFilter`s can add any type that is `impl Hash`
    bloom.insert(1);
    bloom.insert("a");
    bloom.insert(Vec::<bool>::new());
    bloom.insert([0; 2]);

    // Querying whether a `BloomFilter` contains an element
    // never yields a false negative
    let inserts = capacity - 4;
    for i in 0..inserts {
        bloom.insert(i);
    }

    let mut false_negatives = 0;
    for i in 0..inserts{
        if !bloom.contains(i) {
            false_negatives += 1;
        }
    }
    println!("False negatives: {false_negatives}");

    // But it can yield some false positives
    let mut false_positives = 0;
    for i in 0..inserts{
        if bloom.contains(inserts + i) {
            false_positives += 1;
        }
    }
    println!("False positives: {false_positives}");
    
    // It is possible to get an *approximation* of the number of
    // `item`s stored in the `BloomFilter`
    let stored_items_approx = bloom.count_approx();
    println!("Approximately count of items stored: {stored_items_approx}");
    
    // Items can't be removed. But the `BloomFilter` can be reset.
    bloom.reset();

    // We can also get some properties of the `BloomFilter` itself
    println!("Number of bits for the actual filter: {}", bloom.bit_count());
    println!("Number of hash functions used: {}", bloom.hash_fn_count());
    println!("The filter's actual error rate: {}", bloom.error_rate());
}

Licence

This project is licenced under MIT.

Dependencies

~1.6–2.1MB
~48K SLoC