2 unstable releases
0.2.0 | Jul 8, 2023 |
---|---|
0.1.0 | Jul 7, 2023 |
#1212 in Text processing
2.5MB
432 lines
BWT construction in small space
This is a Rust implementation of the BWT construction algorithm in small space, described in Algorithm 11.8 of the book: Compact Data Structures - A Practical Approach, Gonzalo Navarro, 2016.
Given a typical text, it runs in $O(n \log n \log \log n)$ time and $O(n)$ additional bits of space, where $n$ is the length of the input string and the alphabet size is much smaller than $n$. See the book for more details.
Documentation
Command line tool
tools
provides a command line tool to construct the BWT of a file.
$ cargo run --release -p tools -- -i input.txt -o output.bwt -t
With a desktop PC (Intel i7, 16 GB), the DNA text of size 385 MiB from Pizza&Chili Corpus was transformed in 6.8 minutes using maximum resident set size of 727 MiB.
Benchmarks
benches
provides benchmarks on the time performance for English texts
extracted from Pizza&Chili Corpus.
$ cargo bench
Licensing
This library is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
benches/english.10MB.zst
is extracted from Pizza&Chili Corpus and follows LGPL License.