#structure #index #in-memory #hash #filter #bigsi-like

bigsi_rs

A in-memory implementation of a BIGSI-like data structure

2 releases

0.1.1 Mar 30, 2023
0.1.0 Feb 15, 2020

#2006 in Data structures

23 downloads per month

MIT license

12KB
173 lines

bigsi_rs

Rust in-memory implementation of a BIGSI-like data structure

Rust in-memory implementation of a BIGSI-like data structure (see https://www.nature.com/articles/s41587-018-0010-1). Comparable to a bloom filter; where a bloom filter tells if an element belongs to a single previously indexed set, a BIGSI-like data structure efficiently tells if an elements is a member of multiple query sets. Parameters (in particular the index size and the number of hashes) should be chosen to assure a (down-stream) application permissable false positive probabilty. My strategy is to base this on the largest set to be indexed and use the formula of Goel and Gupta 2010 (https://web.stanford.edu/~ashishg/papers/inverted.pdf) to calculate the false postive probability for this set, and hence the maximum false positive probability for the index.

Example

use bigsi_rs;


//create a new index of size 250,000, 10 accessions and 3 hash functions
let mut new_filter = bigsi_rs::Bigsi::new(250000, 10, 3);

//insert words in index for accessions 0, 3 and 7        
new_filter.insert(0, "ATGT");
new_filter.insert(3, "ATGT");
new_filter.insert(7, "ATGT");

//shrink uninformative elements in index
new_filter.slim();

assert_eq!(new_filter.get("ATGT").len(), 3 as usize);
assert_eq!(new_filter.get("ATGC").len(), 0 as usize);

Dependencies

~1.3–2.2MB
~47K SLoC