5 stable releases
Version | Date
---|---
1.2.1 | Jun 26, 2024
1.1.0 | Jun 17, 2024
1.0.1 | Jun 11, 2024
Used in isonclust3
63KB, 1.5K SLoC
minimizer-iter
Iterate over minimizers of a DNA sequence.
Features
- iterates over minimizers in a single pass
- yields bitpacked minimizers with their position
- supports mod-minimizers, introduced by Groot Koerkamp & Pibiri
- supports canonical minimizers
- supports a custom bit encoding of the nucleotides
- supports a custom hasher (wyhash by default)
- can be seeded to produce a different ordering of k-mers
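To make the bitpacking and seeded-ordering features above concrete, here is a minimal, self-contained sketch in plain Rust that does not use the crate. It assumes a 2-bit encoding A=0, C=1, G=2, T=3 and uses a splitmix64-style mixer as a hypothetical stand-in for wyhash; encode, packed_kmers and seeded_hash are illustrative names, not part of the crate's API.

```rust
/// Hypothetical 2-bit encoding (A=0, C=1, G=2, T=3). The crate accepts a
/// custom encoding, so its default may differ.
fn encode(base: u8) -> u64 {
    match base {
        b'A' | b'a' => 0,
        b'C' | b'c' => 1,
        b'G' | b'g' => 2,
        b'T' | b't' => 3,
        _ => panic!("unsupported base {}", base as char),
    }
}

/// Packs every k-mer of `seq` into a u64 (2 bits per base, so k <= 32).
/// Each k-mer is derived from the previous one with a shift and a mask
/// instead of re-reading k bases.
fn packed_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    assert!(k >= 1 && k <= 32, "a u64 holds at most 32 bases");
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut kmers = Vec::new();
    let mut kmer = 0u64;
    for (i, &b) in seq.iter().enumerate() {
        // Append the new base in the low bits and drop the oldest one.
        kmer = ((kmer << 2) | encode(b)) & mask;
        if i + 1 >= k {
            kmers.push(kmer);
        }
    }
    kmers
}

/// Hypothetical seeded 64-bit mixer (splitmix64 finalizer) standing in for
/// wyhash: a different `seed` gives different hash values, hence a
/// different ordering of the k-mers.
fn seeded_hash(kmer: u64, seed: u64) -> u64 {
    let mut z = kmer ^ seed.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

A minimizer is then the k-mer with the smallest seeded_hash in its window, so changing the seed changes which k-mer wins each window.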
If you'd like to use the underlying data structure manually, have a look at the minimizer-queue crate.
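A minimizer iterator needs the minimum of a sliding window in amortized constant time per step. One classic way to get this is a monotone deque; the sketch below shows that technique on plain values. It is not necessarily how minimizer-queue is implemented, and sliding_min_positions is a hypothetical name.

```rust
use std::collections::VecDeque;

/// For each window of `w` consecutive values, returns the index of the
/// smallest one (leftmost on ties), in a single left-to-right pass.
fn sliding_min_positions<T: Ord>(values: &[T], w: usize) -> Vec<usize> {
    assert!(w >= 1);
    // Candidate indices, with non-decreasing values from front to back.
    let mut deque: VecDeque<usize> = VecDeque::new();
    let mut result = Vec::new();
    for i in 0..values.len() {
        // A strictly larger older value can never be a minimum again.
        while let Some(&back) = deque.back() {
            if values[back] > values[i] {
                deque.pop_back();
            } else {
                break;
            }
        }
        deque.push_back(i);
        // Evict the front once it leaves the window [i + 1 - w, i].
        if let Some(&front) = deque.front() {
            if front + w <= i {
                deque.pop_front();
            }
        }
        // Report the minimum once the first full window is reached.
        if i + 1 >= w {
            result.push(*deque.front().unwrap());
        }
    }
    result
}
```

Each index is pushed and popped at most once, so the whole pass runs in O(n) time regardless of the window width.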
Example usage
use minimizer_iter::MinimizerBuilder;

fn main() {
    // Build an iterator over minimizers
    // of size 21 with a window of size 11
    // for the sequence "TGATTGCACAATC".
    // Note: this 13-base sequence is too short to contain a single 21-mer,
    // so use a longer sequence in practice.
    let min_iter = MinimizerBuilder::<u64>::new()
        .minimizer_size(21)
        .width(11)
        .iter(b"TGATTGCACAATC");
    for (minimizer, position) in min_iter {
        // ...
    }
}
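To see what the builder parameters mean, here is a naive reference sketch, not the crate's implementation. It assumes the common convention that a window of width w holds w consecutive k-mers, i.e. spans w + k - 1 bases, and it orders k-mers lexicographically instead of by hash; naive_minimizers is a hypothetical name. It emits one (k-mer, position) pair per window, which may differ from how the crate's iterator reports a minimizer shared by several windows.

```rust
/// Naive reference (not the crate's algorithm): for each window of `w`
/// consecutive k-mers, i.e. w + k - 1 bases, pick the lexicographically
/// smallest k-mer (leftmost on ties) and report it with its start position.
fn naive_minimizers(seq: &[u8], k: usize, w: usize) -> Vec<(&[u8], usize)> {
    assert!(k >= 1 && w >= 1);
    let span = w + k - 1; // number of bases covered by one window
    if seq.len() < span {
        return Vec::new(); // the sequence does not contain a full window
    }
    (0..=seq.len() - span)
        .map(|start| {
            (start..start + w)
                .map(|p| (&seq[p..p + k], p))
                // min_by_key returns the first of several equal minima.
                .min_by_key(|&(kmer, _)| kmer)
                .unwrap() // each window holds w >= 1 k-mers
        })
        .collect()
}
```

With minimizer size 21 and width 11 as in the example above, a window spans 31 bases, so a sequence of length n yields n - 30 windows, and none when n < 31.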
If you'd like to use mod-minimizers instead, just change new() to new_mod():
use minimizer_iter::MinimizerBuilder;

fn main() {
    // Build an iterator over mod-minimizers
    // of size 21 with a window of size 11
    // for the sequence "TGATTGCACAATC"
    // (a placeholder: use a sequence of at least 21 bases in practice).
    let min_iter = MinimizerBuilder::<u64, _>::new_mod()
        .minimizer_size(21)
        .width(11)
        .iter(b"TGATTGCACAATC");
    for (minimizer, position) in min_iter {
        // ...
    }
}
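Mod-minimizers (Groot Koerkamp & Pibiri) select k-mers through smaller t-mers. The naive sketch below follows the mod-sampling rule as we understand it: with t = r + ((k - r) mod w), find the offset x of the smallest t-mer in the window and select the k-mer at offset x mod w. Lexicographic order stands in for a random hash, and both the small parameter r and the function name are assumptions, not the crate's internals.

```rust
/// Naive sketch of mod-sampling: in each window of `w` k-mers (w + k - 1
/// bases), find the offset x of the smallest t-mer, t = r + (k - r) % w,
/// then select the k-mer at offset x % w. Returns the selected positions.
/// Assumptions: lexicographic order replaces a random hash; `r` is a
/// hypothetical small parameter.
fn naive_mod_minimizer_positions(seq: &[u8], k: usize, w: usize, r: usize) -> Vec<usize> {
    assert!(w >= 1 && r >= 1 && r <= k);
    let t = r + (k - r) % w; // r <= t <= k
    let span = w + k - 1; // bases covered by one window
    if seq.len() < span {
        return Vec::new();
    }
    (0..=seq.len() - span)
        .map(|start| {
            let window = &seq[start..start + span];
            // Scan the w + k - t t-mers; strict `<` keeps the leftmost minimum.
            let mut x = 0;
            for i in 1..=span - t {
                if window[i..i + t] < window[x..x + t] {
                    x = i;
                }
            }
            // x % w < w, so the selected k-mer lies inside the window.
            start + x % w
        })
        .collect()
}
```

When k - r < w, t equals k, x ranges over the w k-mer offsets, and the rule reduces to an ordinary minimizer.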
Additionally, the iterator can produce canonical minimizers so that a sequence and its reverse complement will select the same minimizers.
To do so, just add .canonical() to the builder:
MinimizerBuilder::<u64>::new()
    .canonical()
    .minimizer_size(...)
    .width(...)
    .iter(...)
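Canonical minimizers rest on canonical k-mers. The sketch below shows one common convention, assuming the 2-bit encoding A=0, C=1, G=2, T=3 (so a base's complement is its code XOR 3): a k-mer and its reverse complement map to the same value, the smaller of the two. revcomp and canonical are hypothetical names, and the crate may choose between strands differently.

```rust
/// Reverse complement of a 2-bit packed k-mer, assuming A=0, C=1, G=2, T=3
/// (complementing a base is `code ^ 0b11`). The first base occupies the
/// most significant 2 bits.
fn revcomp(kmer: u64, k: usize) -> u64 {
    assert!(k >= 1 && k <= 32);
    let mut x = kmer;
    let mut rc = 0u64;
    for _ in 0..k {
        // Take the last base of `x`, complement it, append it to `rc`.
        rc = (rc << 2) | ((x & 0b11) ^ 0b11);
        x >>= 2;
    }
    rc
}

/// Strand-independent representative: the smaller of the k-mer and its
/// reverse complement.
fn canonical(kmer: u64, k: usize) -> u64 {
    kmer.min(revcomp(kmer, k))
}
```

Making a whole scheme strand-independent also requires handling windows and ties consistently on both strands; this sketch only covers the k-mer level.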
At 2 bits per base, a u64 holds at most 32 bases. If you need longer minimizers (> 32 bases), you can specify a wider integer type such as u128:
MinimizerBuilder::<u128>::new()
    .minimizer_size(...)
    .width(...)
    .iter(...)
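The 32-base limit follows from the 2-bit packing: an unsigned integer of B bits holds B / 2 bases. The sketch below, again assuming the A=0, C=1, G=2, T=3 encoding and using the hypothetical name pack_u128, packs up to 64 bases into a u128.

```rust
/// With 2 bits per base, a B-bit integer holds B / 2 bases.
const MAX_BASES_U64: u32 = u64::BITS / 2; // 32
const MAX_BASES_U128: u32 = u128::BITS / 2; // 64

/// Packs `kmer` into a u128 (assumed encoding A=0, C=1, G=2, T=3), returning
/// None if it is longer than 64 bases or contains another symbol.
fn pack_u128(kmer: &[u8]) -> Option<u128> {
    if kmer.len() > MAX_BASES_U128 as usize {
        return None;
    }
    let mut packed = 0u128;
    for &b in kmer {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        packed = (packed << 2) | code;
    }
    Some(packed)
}
```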
See the documentation for more details.
Benchmarks
To run benchmarks against other implementations of minimizers, clone this repository and run:
cargo bench
Contributors
- Igor Martayan (main developer)
Dependencies: ~250KB