#bloom-filter #split #block #parquet

sbbf-rs-safe

Split block bloom filter implementation

10 unstable releases (3 breaking)

0.3.2 Jul 1, 2023
0.3.1 Jun 30, 2023
0.2.2 Jun 28, 2023
0.1.2 Jun 21, 2023
0.0.4 May 23, 2023

#1490 in Data structures


Used in 3 crates (via hypersync-format)

MIT license

8KB
91 lines



What is this?

This is a split block bloom filter based on sbbf-rs. It is an exact implementation of the Parquet bloom filter spec.

Storing to permanent storage

The Filter::as_bytes and Filter::from_bytes methods can be used to save the filter to permanent storage and restore it later.

Why use this instead of any other bloom filter implementation on crates.io?

Split block bloom filters perform very well because they load only a small amount of data per query or insert, they contain no branching in their code, and they can be accelerated using SIMD instructions.
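To make these properties concrete, here is a minimal scalar sketch of the insert and check steps as described in the Parquet bloom filter spec. It is not this crate's code: SketchFilter and its methods are hypothetical names chosen for illustration, the caller is assumed to supply a 64-bit hash, and no SIMD is used.

```rust
/// Salt constants listed in the Parquet split block bloom filter spec.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// Hypothetical illustrative type; not the crate's `Filter`.
pub struct SketchFilter {
    /// Each block is 8 x 32 bits = 256 bits, so one query touches 32 bytes.
    blocks: Vec<[u32; 8]>,
}

impl SketchFilter {
    pub fn new(num_blocks: usize) -> Self {
        assert!(num_blocks > 0, "at least one block is required");
        SketchFilter { blocks: vec![[0u32; 8]; num_blocks] }
    }

    /// Build a mask with exactly one bit set in each of the 8 words,
    /// derived from the low 32 bits of the hash.
    fn mask(hash: u64) -> [u32; 8] {
        let x = hash as u32;
        let mut m = [0u32; 8];
        for (slot, salt) in m.iter_mut().zip(SALT.iter()) {
            // The top 5 bits of the product select a bit position 0..=31.
            *slot = 1u32 << (x.wrapping_mul(*salt) >> 27);
        }
        m
    }

    /// Select a block from the high 32 bits with multiply-shift (no modulo).
    fn block_index(&self, hash: u64) -> usize {
        (((hash >> 32) * self.blocks.len() as u64) >> 32) as usize
    }

    pub fn insert_hash(&mut self, hash: u64) {
        let idx = self.block_index(hash);
        let m = Self::mask(hash);
        for (word, bit) in self.blocks[idx].iter_mut().zip(m.iter()) {
            *word |= *bit;
        }
    }

    pub fn contains_hash(&self, hash: u64) -> bool {
        let block = &self.blocks[self.block_index(hash)];
        let m = Self::mask(hash);
        // Accumulate missing bits instead of returning early, so the
        // loop body has no data-dependent branch.
        let mut missing = 0u32;
        for (word, bit) in block.iter().zip(m.iter()) {
            missing |= *bit & !*word;
        }
        missing == 0
    }
}
```

Each operation reads or writes a single 256-bit block, and the eight word updates are independent of one another, which is what makes the layout a natural fit for SIMD.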

This particular implementation produces the same byte buffers on any system, so it can be used to implement persistent bloom filters that are stored on disk or transferred over the internet.
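One common way to get identical bytes on every platform is to encode each 32-bit word in a fixed byte order. The sketch below assumes little-endian encoding and 8-word blocks. The helper names words_to_bytes and words_from_bytes are hypothetical, and the source does not state how this crate achieves the property internally.

```rust
/// Hypothetical helper: encode filter words in fixed little-endian order,
/// regardless of the host CPU's native endianness.
pub fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Hypothetical helper: decode a buffer produced by `words_to_bytes`.
/// Returns `None` if the buffer is empty or not a whole number of
/// 32-byte blocks (8 words of 4 bytes each).
pub fn words_from_bytes(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.is_empty() || bytes.len() % 32 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because the byte order is fixed by the encoding rather than by the machine, a buffer written on a big-endian host decodes to the same words on a little-endian one.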

Although this library lacks features such as removal and counting, at the time of writing it seems to be the fastest bloom filter implementation in Rust. Benchmarks can be run with cargo bench on this repo.

Notes on WASM

This library requires a nightly compiler only when the target is wasm and the simd128 CPU feature is enabled. All other builds work on stable.

Dependencies

~55KB