#trie #double-array #search #trie-node #binary-representation #multiple-values #heap-allocation

yada

Yada is a yet another double-array trie library aiming for fast search and compact data representation

9 unstable releases

0.5.1 Feb 25, 2024
0.5.0 Nov 1, 2021
0.4.1 Nov 1, 2021
0.4.0 Oct 10, 2020
0.1.0 Sep 20, 2020

#49 in Data structures

Download history 109211/week @ 2024-07-20 127430/week @ 2024-07-27 218180/week @ 2024-08-03 319448/week @ 2024-08-10 275302/week @ 2024-08-17 558352/week @ 2024-08-24 215959/week @ 2024-08-31 404267/week @ 2024-09-07 188692/week @ 2024-09-14 289422/week @ 2024-09-21 409311/week @ 2024-09-28 376096/week @ 2024-10-05 646845/week @ 2024-10-12 91750/week @ 2024-10-19 77611/week @ 2024-10-26 131939/week @ 2024-11-02

1,021,069 downloads per month
Used in 47 crates (10 directly)

MIT/Apache

36KB
768 lines

Yada: Yet Another Double-Array

crate-name at crates.io crate-name at docs.rs

Yada is a yet another double-array trie library aiming for fast search and compact data representation.

Features

  • Build static double-array tries
    • Yada adopts the compact binary representation of double-array nodes like Darts-clone.
  • Common prefix search
    • The method returns an Iterator that is an effective way to find multiple values without heap allocation.
  • Exact match search
    • The method finds a value associated with an exact match key as a Option.

Requirements

  • Rust version >= 1.46.0

Usage

See also example code for more details.

Build a double-array trie

use yada::builder::DoubleArrayBuilder;

// make a keyset which have key-value pairs
let keyset = &[
    ("a".as_bytes(), 0),
    ("ab".as_bytes(), 1),
    ("abc".as_bytes(), 2),
    ("b".as_bytes(), 3),
    ("bc".as_bytes(), 4),
    ("c".as_bytes(), 5),
];

// build a double-array trie binary
let da_bytes: Option<Vec<u8>> = DoubleArrayBuilder::build(keyset);

Search entries by keys

use yada::DoubleArray;

// create a double-array trie instance
let da = DoubleArray::new(da_bytes.unwrap());

// exact match search
for (key, value) in keyset {
    assert_eq!(da.exact_match_search(key), Some(*value as u32));
}
assert_eq!(da.exact_match_search("abc".as_bytes()), Some(2));
assert_eq!(da.exact_match_search("abcd".as_bytes()), None);

// common prefix search
assert_eq!(
    da.common_prefix_search("abcd".as_bytes())
        .collect::<Vec<_>>(),
    vec![(0, 1), (1, 2), (2, 3)] // match "a", "ab", "abc", value and key length
);
assert_eq!(
    da.common_prefix_search("d".as_bytes()).collect::<Vec<_>>(),
    vec![] // don't match
);

Limitations

  • The value must be represented as a 31 bit unsigned integer, typed u32.
    • Yada uses the most significant bit (MSB) as a flag to distinguish between a value node and others.
  • The offset of an double-array node is 29 bits wide, so it can represent up to ~536M nodes.
    • It means this limitation results in the size upper bound ~2GB of double-arrays.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

References

No runtime deps