8 releases (4 breaking)
|0.5.0||Apr 25, 2020|
|0.4.0||Jun 30, 2019|
|0.3.1||May 26, 2019|
|0.1.2||Apr 18, 2019|
#339 in Algorithms
260 downloads per month
Used in 6 crates (3 directly)
Suffix array construction and searching algorithms for in-memory binary data.
To index plain texts, burntsushi's suffix featuring utf-8 support is a better choice.
This crate uses the Amos Wenger's C bindings to Yuta Mori's dissufsort to construct suffix array, which is the fastest known suffix array construction algorithm (SACA) running in single thread that uses merely O(1) additional workspace.
Benchmark using Pizza&Chili Corpus.
Rewrite the benchmarks.
Speed up searching by LCP array, a.k.a. construct enhanced suffix array (ESA).There are two major classes of LCP array construction algorithms (LACA): The first class produces LCP array from suffix array. AFAIK, these LACAs have to allocate additional auxiliary arrays (such as ISA, PLCP, and BWT) to construct the LCP array. The second class constructs the suffix array together with its LCP array. They are fast and space-efficient but require sophisticated modifications on kind of SACAs based on induce copying (such as sais-lite, saca-k, and divsufsort). This crate would not provide ESA support until I figure out a proper way to implement it.
Speed up searching by bucket pointers, inspired by this paper (which uses a trie as index, interleaved arrays and text fingerprints to improve locality for ESA searching on disk). See
Add compressed suffix array (CSA) support.CSA safcrificed the speed quite a lot to gain some space efficiency, which is not that necessary.
Serialization/deserialization. Enable the optional
packfeature to use those APIs. This feature is based on Paul Masurel's bitpacking.
Rewrite suffix array construction algorithm. Currently, this crate uses dissufsort to construct all the suffix arrays.