1 unstable release
0.4.0 | Mar 24, 2021 |
---|
#2295 in Algorithms
37KB
829 lines
Yada: Yet Another Double-Array
Yada is a yet another double-array trie library aiming for fast search and compact data representation.
Features
- Build static double-array tries
- Yada adopts the compact binary representation of double-array nodes like Darts-clone.
- Common prefix search
- The method returns an
Iterator
that is an effective way to find multiple values without heap allocation.
- The method returns an
- Exact match search
- The method finds a value associated with an exact match key as a
Option
.
- The method finds a value associated with an exact match key as a
Requirements
- Rust version >= 1.46.0
Usage
See also example code for more details.
Build a double-array trie
use yada::builder::DoubleArrayBuilder;
// make a keyset which have key-value pairs
let keyset = &[
("a".as_bytes(), 0),
("ab".as_bytes(), 1),
("abc".as_bytes(), 2),
("b".as_bytes(), 3),
("bc".as_bytes(), 4),
("c".as_bytes(), 5),
];
// build a double-array trie binary
let da_bytes: Option<Vec<u8>> = DoubleArrayBuilder::build(keyset);
Search entries by keys
use yada::DoubleArray;
// create a double-array trie instance
let da = DoubleArray::new(da_bytes.unwrap());
// exact match search
for (key, value) in keyset {
assert_eq!(da.exact_match_search(key), Some(*value as u32));
}
assert_eq!(da.exact_match_search("abc".as_bytes()), Some(2));
assert_eq!(da.exact_match_search("abcd".as_bytes()), None);
// common prefix search
assert_eq!(
da.common_prefix_search("abcd".as_bytes())
.collect::<Vec<_>>(),
vec![(0, 1), (1, 2), (2, 3)] // match "a", "ab", "abc", value and key length
);
assert_eq!(
da.common_prefix_search("d".as_bytes()).collect::<Vec<_>>(),
vec![] // don't match
);
Limitations
- The value must be represented as a 31 bit unsigned integer, typed
u32
.- Yada uses the most significant bit (MSB) as a flag to distinguish between a value node and others.
- The offset of an double-array node is 29 bits wide, so it can represent up to
~536M nodes.
- It means this limitation results in the size upper bound ~2GB of double-arrays.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.