2 unstable releases

0.2.0 Jul 8, 2023
0.1.0 Jul 7, 2023

#987 in Text processing

35 downloads per month

MIT/Apache

2.5MB
432 lines

BWT construction in small space

Documentation Crates.io

This is a Rust implementation of the BWT construction algorithm in small space, described in Algorithm 11.8 of the book: Compact Data Structures - A Practical Approach, Gonzalo Navarro, 2016.

Given a typical text, it runs in $O(n \log n \log \log n)$ time and $O(n)$ additional bits of space, where $n$ is the length of the input string and the alphabet size is much smaller than $n$. See the book for more details.

Documentation

https://docs.rs/small-bwt/

Command line tool

tools provides a command line tool to construct the BWT of a file.

$ cargo run --release -p tools -- -i input.txt -o output.bwt -t

With a desktop PC (Intel i7, 16 GB), the DNA text of size 385 MiB from Pizza&Chili Corpus was transformed in 6.8 minutes using maximum resident set size of 727 MiB.

Benchmarks

benches provides benchmarks on the time performance for English texts extracted from Pizza&Chili Corpus.

$ cargo bench

Licensing

This library is licensed under either of

at your option.

benches/english.10MB.zst is extracted from Pizza&Chili Corpus and follows LGPL License.

Dependencies