16 releases (stable)

2.0.0 Apr 9, 2024
1.2.3 Aug 27, 2023
0.2.1 Jul 28, 2023
0.1.1 Jul 24, 2023
0.0.1 Jul 22, 2023

#274 in Data structures


Used in humus

Custom license

90KB
1.5K SLoC

Basic Trie

Test CI

The trie data structure is used for quick access to words and data that should (could) be associated with them.

Basic Trie is implemented as a tree where each node holds a single character that could point at any other character thus allowing insertion of arbitrary words.

There are two major implementations:
  • Trie where words are inserted with nothing attached to them
  • Data Trie where each word has a corresponding vector of data attached to it

Regular tries are often used for word lookups and prefix matching, and data tries are often used for finding all data that is connected to some prefix.

For example, when inserting a whole book in the trie, you could insert every word with the corresponding page number it's on. Later when searching for the word, you could get all the pages the word is on with no added performance cost.

Global features

  • insertion / removal of words
  • fast contains check
  • finding words based on a prefix
  • longest / shortest words in the trie
  • generic methods: is_empty, len, clear
  • Trie equality with ==
  • Trie merging with + or +=

Data Trie features

  • generic type implementation for associating a word to any type, with zero trait constraints
  • finding data of words based on exact match or prefix

Optional features

  • unicode support via the 'unicode' feature with the unicode-segmentation crate (enabled by default)
  • data trie support via the 'data' feature (enabled by default)
  • serialization and deserialization via the 'serde' feature with the serde crate

Dependencies

  • unicode-segmentation (enabled by default)
  • serde (only with 'serde' feature flag)
  • fxhash
  • thin-vec
  • arrayvec

License

The software is licensed under the MIT license.

Examples

 use basic_trie::Trie;

 let mut trie = Trie::new();
 trie.insert("eat");
 trie.insert("eating");
 trie.insert("wizard");

 let mut found_longest_words = trie.get_longest();
 found_longest_words.sort();

 assert!(trie.contains("wizard"));
 assert_eq!(vec![String::from("eating"), String::from("wizard")], found_longest_words);
 assert_eq!(vec![String::from("eat")], trie.get_shortest());
 assert_eq!(3, trie.len());
use basic_trie::DataTrie;

let mut data_trie = DataTrie::<u32>::new();
data_trie.insert("apple", 1);
data_trie.insert("apple", 2);
data_trie.insert_no_data("banana");
data_trie.insert("avocado", 15);

let mut found_data = data_trie.get_data("apple", false).unwrap();
found_data.sort();
assert_eq!(vec![&1, &2], found_data);

let mut found_data = data_trie.get_data("a", true).unwrap();
found_data.sort();
assert_eq!(vec![&1, &2, &15], found_data);

assert_eq!(vec![15], data_trie.remove("avocado").unwrap());

Changelog

  • 2.0.0 - Major redesign: increased memory efficiency for the regular Trie (used to be Dataless Trie); Changed API names to better match the standard library; splitting the two implementations code-wise thus fixing the documentation not rendering bug.
  • 1.2.3 – Adding dependencies for even more memory layout optimisations.
  • 1.2.2 – More memory optimisations with Box.
  • 1.2.1 – Memory performance upgrade with Box. Mutable data retrieval.
  • 1.2.0 – Equality and addition operators support between same Trie types via ==, + and +=.
  • 1.1.1 – Adding FxHashMap dependency for boosted performance.
  • 1.1.0 – Serialization with the serde crate and the 'serde' feature.
  • 1.0.3 – Optimisation of number_of_words(). Removing lifetime requirements for word insertion for much better flexibility at the same logical memory cost.
  • 1.0.2 – Bug fixes.
  • 1.0.1insert_no_data() for DataTrie. Bugfixes.
  • 1.0.0 – Separation of DataTrie and DatalessTrie. Optimizing performance for DatalessTrie. Incompatible with older versions.
  • <1.0.0 – Simple Trie with data and base features.

Dependencies

~0.8–1MB
~15K SLoC