6 releases (3 breaking)

Version | Release date |
---|---|
0.4.0 | Sep 2, 2022 |
0.3.0 | Aug 1, 2022 |
0.2.1 | Jun 4, 2022 |
0.2.0 | Apr 23, 2022 |
0.1.1 | Apr 13, 2022 |
🦞 Crawdad: ChaRActer-Wise Double-Array Dictionary
Overview
Crawdad is a library of natural language dictionaries using character-wise double-array tries. The implementation is optimized for strings of multibyte characters, so it enables fast text processing on languages such as Japanese or Chinese.
For example, on a large Japanese dictionary of IPADIC+Neologd, Crawdad has a better time-space tradeoff than other Rust libraries.
The detailed experimental settings and other results are available on the Wiki.
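To illustrate why character-wise traversal suits multibyte text, the sketch below compares how many trie transitions a key needs when it is walked byte by byte versus character by character. The helper names `bytewise_steps` and `charwise_steps` are hypothetical and are not part of Crawdad; the code only counts UTF-8 bytes and Unicode scalar values.

```rust
/// Transitions needed when a trie consumes one UTF-8 byte per step.
/// (Hypothetical helper for illustration; not part of Crawdad.)
fn bytewise_steps(key: &str) -> usize {
    key.len() // `str::len` counts bytes, not characters
}

/// Transitions needed when a trie consumes one character per step.
/// (Hypothetical helper for illustration; not part of Crawdad.)
fn charwise_steps(key: &str) -> usize {
    key.chars().count()
}

fn main() {
    for key in ["世界中", "国民", "trie"] {
        println!(
            "{}: {} byte-wise steps, {} character-wise steps",
            key,
            bytewise_steps(key),
            charwise_steps(key)
        );
    }
}
```

Each of the Japanese characters above occupies three bytes in UTF-8, so a character-wise walk takes a third as many transitions for those keys, while ASCII keys see no difference.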
What Crawdad can do
- Key-value mapping: Crawdad stores a set of string keys, each mapped to an arbitrary integer value.
- Exact match: Crawdad supports fast lookup of an input key.
- Common prefix search: Crawdad supports fast common prefix search that can be used to enumerate all keys appearing in a text.
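The three operations listed above can be sketched with a toy character-wise trie. This is a minimal illustration built on `HashMap`, not Crawdad's double-array implementation and not its API: `ToyTrie`, `from_records`, and the method signatures here are assumptions made for the example.

```rust
use std::collections::HashMap;

/// One trie node: outgoing edges labeled by `char`, plus an optional value.
#[derive(Default)]
struct Node {
    children: HashMap<char, usize>,
    value: Option<u32>,
}

/// Toy character-wise trie (hypothetical; NOT Crawdad's data structure).
struct ToyTrie {
    nodes: Vec<Node>,
}

impl ToyTrie {
    /// Key-value mapping: build from (key, value) records.
    fn from_records(records: &[(&str, u32)]) -> Self {
        let mut nodes = vec![Node::default()];
        for &(key, value) in records {
            let mut cur = 0;
            for c in key.chars() {
                let existing = nodes[cur].children.get(&c).copied();
                cur = match existing {
                    Some(next) => next,
                    None => {
                        nodes.push(Node::default());
                        let next = nodes.len() - 1;
                        nodes[cur].children.insert(c, next);
                        next
                    }
                };
            }
            nodes[cur].value = Some(value);
        }
        Self { nodes }
    }

    /// Exact match: return the value stored for `key`, if any.
    fn exact_match(&self, key: &str) -> Option<u32> {
        let mut cur = 0;
        for c in key.chars() {
            cur = *self.nodes[cur].children.get(&c)?;
        }
        self.nodes[cur].value
    }

    /// Common prefix search: every key that is a prefix of `text`,
    /// reported as (value, length of the match in characters).
    fn common_prefix_search(&self, text: &[char]) -> Vec<(u32, usize)> {
        let mut found = Vec::new();
        let mut cur = 0;
        for (i, c) in text.iter().enumerate() {
            match self.nodes[cur].children.get(c) {
                Some(&next) => cur = next,
                None => break,
            }
            if let Some(value) = self.nodes[cur].value {
                found.push((value, i + 1));
            }
        }
        found
    }
}

/// Enumerate all keys appearing in `text` as (value, start, end) in characters.
fn find_all(trie: &ToyTrie, text: &str) -> Vec<(u32, usize, usize)> {
    let chars: Vec<char> = text.chars().collect();
    let mut matches = Vec::new();
    for start in 0..chars.len() {
        for (value, len) in trie.common_prefix_search(&chars[start..]) {
            matches.push((value, start, start + len));
        }
    }
    matches
}

fn main() {
    let trie = ToyTrie::from_records(&[("世界", 10), ("世界中", 20), ("国民", 30)]);
    println!("{:?}", trie.exact_match("世界中"));
    println!("{:?}", find_all(&trie, "国民が世界中にて"));
}
```

Running the common prefix search from every start position, as `find_all` does, is what turns it into a way to enumerate all dictionary keys that occur in a text.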
Data structures
Crawdad contains two trie implementations:
- crawdad::Trie is a standard trie form that often provides the fastest queries.
- crawdad::MpTrie is a minimal-prefix trie form that is memory-efficient for long strings.
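As a rough illustration of the minimal-prefix idea in general, the sketch below splits each key into the shortest prefix that distinguishes it from the other keys and a remaining tail. In a minimal-prefix trie, only the prefixes become trie nodes and the tails are kept as plain strings, which is why long keys cost less memory. The function `split_minimal_prefixes` is hypothetical, and how crawdad::MpTrie actually lays out its data may differ.

```rust
/// Length of the longest common prefix of two character slices.
fn lcp(a: &[char], b: &[char]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// Split each key into (minimal distinguishing prefix, tail).
/// Hypothetical helper; this is a conceptual sketch, not Crawdad's code.
/// Keys are returned in sorted order with duplicates removed.
fn split_minimal_prefixes(keys: &[&str]) -> Vec<(String, String)> {
    let mut sorted: Vec<Vec<char>> = keys.iter().map(|k| k.chars().collect()).collect();
    sorted.sort();
    sorted.dedup();

    let mut result = Vec::new();
    for i in 0..sorted.len() {
        let key = &sorted[i];
        // In sorted order, the longest shared prefix is with a neighbor.
        let left = if i > 0 { lcp(key, &sorted[i - 1]) } else { 0 };
        let right = if i + 1 < sorted.len() { lcp(key, &sorted[i + 1]) } else { 0 };
        // One character past the shared part, capped at the key length
        // (a key that is a prefix of another key keeps all its characters).
        let cut = (left.max(right) + 1).min(key.len());
        let prefix: String = key[..cut].iter().collect();
        let tail: String = key[cut..].iter().collect();
        result.push((prefix, tail));
    }
    result
}

fn main() {
    for (prefix, tail) in split_minimal_prefixes(&["東京都", "東京タワー", "大阪城公園"]) {
        println!("trie part: {:<8} tail: {}", prefix, tail);
    }
}
```

With these three keys, "大阪城公園" needs only its first character in the trie, so four of its five characters move to the tail.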
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
For software under bench/data, follow the license terms of each package.
Acknowledgment
The initial version of this software was developed by LegalForce, Inc., but it is not an officially supported LegalForce product.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.