#trie #prefix #hat #hash

hat_trie

A hat-trie implementation that support prefix match iteration

11 releases

0.2.4 Jun 15, 2020
0.2.3 Jun 14, 2020
0.1.5 Jun 13, 2020

#2053 in Data structures

32 downloads per month

BSD-3-Clause

59KB
950 lines

hat_trie

An implementation of Hat-Trie according to this research by Nikolas Askitis and Ranjan Sinha.

Some part of codes are inspired by hat-trie written in C99. A good introduction to Hat-trie can be found here.

Short description

Hat-trie is a cache conscious kind of trie. This crate implements a "Hybrid" hat-trie. It consumes lesser storage that pure "Hashtable" as many common prefix will be extracted to common node of trie but this happen only if number of entry is large enough thus for small number of entry, it's virtually a "Hashtable" whereas a large number of entry, it's a mixture between "Hashtable" and trie.

How to use

As dictionary:

use htrie::{DenseVecTrieNode, TrieNode};
let mut dvtn = DenseVecTrieNode::new();
assert!(dvtn.get("".as_bytes()).is_none());
assert_eq!(dvtn.put("Yo".as_bytes(), 1), None);
assert_eq!(dvtn.get("Yo".as_bytes()), Some(&1));
assert_eq!(dvtn.put("Yes".as_bytes(), 2), None);
assert_eq!(dvtn.get("Yes".as_bytes()), Some(&2));

As prefix lookup data structure:

use htrie::{DenseVecTrieNode, TrieNode};

let mut dvtn: DenseVecTrieNode<u8, u8> = DenseVecTrieNode::new();
dvtn.put(&[1u8], 1u8);
dvtn.put(&[1u8, 2, 3], 2u8);
dvtn.put(&[1u8, 2, 3, 4], 3u8);

assert_eq!(dvtn.prefix(&[0u8, 1]).map(|(key, _)| {key}).collect::<Vec<&[u8]>>().len(), 0);
let key = &[1u8, 1];
let pf = dvtn.prefix(key).map(|(key, _)| {key}).collect::<Vec<&[u8]>>();
assert_eq!(pf.len(), 1);
assert_eq!(pf[0], &key[..1]);

let key = &[1u8, 2, 3, 5, 6];
let pf = dvtn.prefix(key).map(|(key, _)| {key}).collect::<Vec<&[u8]>>();

assert_eq!(pf.len(), 2);
for k in pf {
    assert_eq!(&key[..k.len()], k);
}

Caveat

Current implementation is closely similar to the one in research paper. In initialization of the DenseVecTrieNode, it pre-allocate memory at least equals to 2^bits of key element times size of usize. So in first example, it is 8 bits thus it pre-allocate memory about 2 ^ 8 * size of usize thus in 64 bits system, it took 1024 bytes + some overhead of ahtable and size of value. However, if your data is u32, the size of Unicode codepoint, you will requires at least 2 ^ 32 * size of usize thus 4,294,967,296 * size of usize. In 64 bits system, it is over 16GB for an empty DenseVecTrieNode.

That is the reasons why in example code above, we encoded data into bytes before putting it in the trie. It can also be possible to encode data into u16 in case your data is encoded in UTF-16. However, for simplicity of example, we use u8. This is because the default encoding that Rust use is UTF-8.

Dependencies

~365KB