2 releases
Uses new Rust 2024
Version | Released |
---|---|
0.1.1 (new) | Apr 21, 2025 |
0.1.0 | Apr 21, 2025 |
seqpls
This is a paired FASTQ grep tool for sequence analysis.
It's built using the same matching algorithm as bqtools grep, and was developed to demonstrate the performance difference between the BINSEQ and FASTQ formats.
It accepts FASTQ files (compressed or uncompressed) and lets you match regular expressions or fixed strings against R1, R2, or both.
Under the hood it uses the paraseq crate for efficient parallel processing of FASTQ records.
It's fast, but don't use it: running bqtools on BINSEQ files is significantly faster.
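To make the idea of a paired grep concrete, here is a minimal sketch in plain Rust. It is not the seqpls implementation: `ReadPair`, `Patterns`, `pair_matches`, and `filter_parallel` are hypothetical names, plain substring search stands in for the regex engine, and scoped `std` threads stand in for paraseq. It also assumes that every supplied pattern must match for a pair to be kept, which may differ from how the real tool combines patterns.

```rust
use std::thread;

/// One paired record, reduced to its two sequence lines (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct ReadPair {
    r1: String,
    r2: String,
}

/// Fixed-string patterns per mate; `None` means no constraint on that mate.
struct Patterns<'a> {
    r1: Option<&'a str>,
    r2: Option<&'a str>,
}

/// Assumption: a pair matches when every supplied pattern is found.
/// `invert` flips the decision, similar in spirit to grep's `-v`.
fn pair_matches(pair: &ReadPair, pats: &Patterns<'_>, invert: bool) -> bool {
    let r1_ok = pats.r1.map_or(true, |p| pair.r1.contains(p));
    let r2_ok = pats.r2.map_or(true, |p| pair.r2.contains(p));
    (r1_ok && r2_ok) != invert
}

/// Split the records into one chunk per thread, filter each chunk on its
/// own scoped thread, then concatenate the results in input order.
fn filter_parallel(
    pairs: &[ReadPair],
    pats: &Patterns<'_>,
    invert: bool,
    threads: usize,
) -> Vec<ReadPair> {
    let threads = threads.max(1);
    // `chunks` panics on a length of zero, so clamp to at least one.
    let chunk_len = pairs.len().div_ceil(threads).max(1);
    thread::scope(|s| {
        // Spawn every worker before joining any of them.
        let handles: Vec<_> = pairs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter(|p| pair_matches(p, pats, invert))
                        .cloned()
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `filter_parallel(&pairs, &pats, false, 3)` keeps the matching pairs, and passing `true` for `invert` keeps the rest. A real tool would stream records from disk instead of holding them all in memory.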
Installation
cargo install seqpls
Usage
# See full help menu
seqpls --help
# Search for a fixed string in an unpaired FASTQ file
seqpls -e "ACGT" <some_fastq>
# Search for a string in R1 and a regex in R2, using 3 threads
seqpls -T3 -e "ACGT" -R "[AC][TG][AC][TG]" <some_r1> <some_r2>
# Filter for sequences without the string in either read, using 3 threads
seqpls -T3 -v -F "ACGT" <some_r1> <some_r2>
Contributing
Don't contribute to this project; it's doomed.