
app seqpls

My sequences please - a paired fastq grepper with regex support

2 releases

Uses new Rust 2024

new 0.1.1 Apr 21, 2025
0.1.0 Apr 21, 2025

MIT license

seqpls

This is a paired FASTQ grep tool for sequence analysis.

It's built using the same matching algorithm as bqtools grep, and was developed to demonstrate the performance difference between BINSEQ and FASTQ formats.

It accepts FASTQ files (compressed or uncompressed) and lets you match regular expressions or fixed strings against R1, R2, or both.
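To make the R1/R2/both targeting concrete, here is a minimal std-only sketch of the per-pair decision for fixed strings. It is not seqpls's actual code: `ReadPair`, `Target`, `Pattern`, and `keep_pair` are hypothetical names, regex support is left out, and requiring every pattern to match (AND semantics) is an assumption.

```rust
/// One paired-end record; only the sequence lines are kept for matching.
/// Hypothetical type for this sketch, not seqpls's internal representation.
pub struct ReadPair<'a> {
    pub r1: &'a str,
    pub r2: &'a str,
}

/// Which mate(s) a pattern is checked against.
pub enum Target {
    R1,
    R2,
    Either,
}

/// A fixed-string pattern bound to a target.
pub struct Pattern<'a> {
    pub needle: &'a str,
    pub target: Target,
}

/// Returns true when the pair should be kept.
/// Assumption: a pair matches only if every pattern matches its target.
/// `invert` flips the decision, in the spirit of `grep -v`.
pub fn keep_pair(pair: &ReadPair, patterns: &[Pattern], invert: bool) -> bool {
    let all_match = patterns.iter().all(|p| match p.target {
        Target::R1 => pair.r1.contains(p.needle),
        Target::R2 => pair.r2.contains(p.needle),
        Target::Either => pair.r1.contains(p.needle) || pair.r2.contains(p.needle),
    });
    all_match != invert
}
```

Calling `keep_pair` with a single `Target::R1` pattern ignores R2 entirely, while `Target::Either` accepts a hit in whichever mate contains the string.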

Under the hood it uses the paraseq crate for efficient parallel processing of FASTQ records.
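The general idea of processing records in parallel can be sketched with the standard library alone. This does not use paraseq and makes no claim about its API: `count_matches` is a hypothetical function that takes pairs already held in memory, whereas the real tool reads them from FASTQ files.

```rust
use std::thread;

/// Counts (r1, r2) pairs in which either mate contains `needle`,
/// splitting the slice into roughly equal chunks, one per worker thread.
/// Hypothetical helper for illustration only.
pub fn count_matches(pairs: &[(&str, &str)], needle: &str, threads: usize) -> usize {
    let threads = threads.max(1);
    // Ceiling division so every pair lands in exactly one chunk;
    // clamp to 1 because `chunks` rejects a chunk size of zero.
    let chunk_len = pairs.len().div_ceil(threads).max(1);

    thread::scope(|scope| {
        // Spawn all workers first, then join them, so they run concurrently.
        let handles: Vec<_> = pairs
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter(|(r1, r2)| r1.contains(needle) || r2.contains(needle))
                        .count()
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Scoped threads let each worker borrow its chunk directly, so no record is copied; each worker returns a local count and the totals are summed once at the end.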

It's fast, but don't use it: running bqtools on BINSEQ files is significantly faster.

Installation

cargo install seqpls

Usage

# See full help menu
seqpls --help

# Search for a fixed string in an unpaired FASTQ file
seqpls -e "ACGT" <some_fastq>

# Search for a fixed string in R1 and a regex in R2, using 3 threads
seqpls -T3 -e "ACGT" -R "[AC][TG][AC][TG]" <some_r1> <some_r2>

# Keep only pairs where neither read contains the string, using 3 threads
seqpls -T3 -v -F "ACGT" <some_r1> <some_r2>

Contributing

Don't contribute to this project; it's doomed.
