9 unstable releases (4 breaking)
Uses old Rust 2015
Version | Released |
---|---|
0.8.3 | Feb 22, 2024 |
0.8.1 | Feb 12, 2024 |
0.7.2 | Oct 28, 2023 |
0.4.4 | Mar 17, 2022 |
0.1.12 | Apr 30, 2018 |
Fasten
A powerful manipulation suite for interleaved fastq files.
Executables read from stdin and write to stdout, and they are compatible with the interleaved fastq format.
This makes it much easier to perform streaming operations using unix pipes.
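To illustrate what "interleaved" means here, the following is a minimal sketch that uses only awk, not Fasten itself. In Fasten, fasten_shuffle does this job. The helper name interleave_fastq and the sample reads are made up for illustration, and the sketch assumes both inputs use exactly four lines per record with mates in the same order.

```shell
# Hypothetical helper (not part of Fasten): interleave two four-line-per-record
# fastq files so that each R1 record is immediately followed by its R2 mate.
interleave_fastq() {
    awk -v r2="$2" '
        { print }                      # emit the current R1 line
        NR % 4 == 0 {                  # an R1 record just ended
            for (i = 0; i < 4; i++)    # emit the matching R2 record
                if ((getline line < r2) > 0) print line
        }
    ' "$1"
}

# Tiny made-up reads, for illustration only.
tmpdir=$(mktemp -d)
printf '@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n' > "$tmpdir/R1.fastq"
printf '@read1/2\nGGCC\n+\n5555\n@read2/2\nCATG\n+\n5555\n' > "$tmpdir/R2.fastq"

# Records come out in the order read1/1, read1/2, read2/1, read2/2.
interleave_fastq "$tmpdir/R1.fastq" "$tmpdir/R2.fastq"
rm -rf "$tmpdir"
```

Because mates sit next to each other in a single stream, a downstream tool can process a pair without opening a second file, which is what makes unix pipes practical for paired-end data.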
Synopsis
read metrics
$ cat testdata/R1.fastq testdata/R2.fastq | \
fasten_shuffle | fasten_metrics | column -t
totalLength numReads avgReadLength avgQual
800 8 100 19.53875
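As a rough sketch of how the four columns above could be derived, the awk function below computes the same kind of summary from a fastq stream. It is not Fasten's implementation: the name fastq_metrics is hypothetical, qualities are assumed to be Phred+33, and avgQual is assumed to be the mean per-base score, which may differ from how fasten_metrics defines it.

```shell
# Hypothetical helper (not part of Fasten): summarize four-line-per-record
# fastq read from stdin. Assumes Phred+33 qualities; avgQual is assumed to be
# the mean per-base score.
fastq_metrics() {
    awk '
        BEGIN {
            # Map each printable ASCII character to its Phred+33 score.
            for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
        }
        NR % 4 == 2 { reads++; bases += length($0) }    # sequence line
        NR % 4 == 0 {                                   # quality line
            for (i = 1; i <= length($0); i++) qual += phred[substr($0, i, 1)]
        }
        END {
            print "totalLength", "numReads", "avgReadLength", "avgQual"
            if (reads > 0 && bases > 0)
                print bases, reads, bases / reads, qual / bases
        }
    '
}

# Two made-up reads of four bases each.
printf '@read1/1\nACGT\n+\nIIII\n@read1/2\nGGCC\n+\n5555\n' | fastq_metrics
```

The output can be piped through column -t in the same way as in the example above.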
read cleaning
$ cat testdata/R1.fastq testdata/R2.fastq | \
fasten_shuffle | \
fasten_clean --paired-end --min-length 2 | \
gzip -c > cleaned.shuffled.fastq.gz
$ zcat cleaned.shuffled.fastq.gz | fasten_metrics | column -t
totalLength numReads avgReadLength avgQual
800 8 100 19.53875
# No reads were actually filtered with cleaning, with --min-length=2
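The idea of a paired-end minimum-length filter can be sketched with awk as follows. This is only an illustration, not fasten_clean itself: the function name filter_pairs_min_length is hypothetical, and dropping the whole pair when either mate is too short is an assumption made here so that the output stays interleaved.

```shell
# Hypothetical helper (not part of Fasten): keep an interleaved pair only when
# both mates are at least min_len bases long. Dropping the whole pair is an
# assumption made in this sketch.
filter_pairs_min_length() {
    awk -v min_len="$1" '
        { rec[(NR - 1) % 8] = $0 }      # buffer the 8 lines of one pair
        NR % 8 == 0 {
            # rec[1] and rec[5] hold the two sequence lines of the pair.
            if (length(rec[1]) >= min_len && length(rec[5]) >= min_len)
                for (i = 0; i < 8; i++) print rec[i]
        }
    '
}

# Two made-up pairs; the second mate of pair "b" has a single base.
printf '@a/1\nACGT\n+\nIIII\n@a/2\nGGCC\n+\nIIII\n@b/1\nACGT\n+\nIIII\n@b/2\nG\n+\nI\n' |
    filter_pairs_min_length 2
```

With a minimum length of 2, only pair "a" is printed; with a minimum length of 1, both pairs pass through unchanged.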
Installation
Installation from source
Fasten is written in the Rust programming language. More information about Rust, including installation and the executable cargo, can be found at rust-lang.org.
After downloading the source, build it with cargo like so:
cd fasten
cargo build --release
export PATH=$PATH:$(pwd)/target/release
All executables will be in the directory fasten/target/release.
Note: the Makefile provides some helper targets, including:
- make all: make the following
- make release: install fast executables
- make debug: install executables quickly (although the executables will not be optimized)
- make fasten/doc: compile the latest documents
- make clean: uninstall local binaries
Installation without git
You can also install Fasten straight from https://crates.io using the following command:
cargo install fasten
Detailed information on how this works can be found in the cargo handbook at https://doc.rust-lang.org/cargo/commands/cargo-install.html.
General usage
All scripts accept the following parameters, read uncompressed fastq format from stdin, and print uncompressed fastq format to stdout. All paired-end fastq files must be in interleaved format, and they are written in interleaved format, except when deshuffling with fasten_shuffle.
- --help
- --numcpus: Not all scripts will take advantage of numcpus. (not currently implemented)
- --paired-end: Input reads are interleaved paired end
- --verbose: Print more status messages
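Deshuffling is the one case where output is not interleaved. As a minimal sketch of what that means, the awk function below splits an interleaved stream back into two files. It is not how fasten_shuffle is implemented: the name deshuffle_fastq and the sample reads are hypothetical, and the input is assumed to use exactly four lines per record.

```shell
# Hypothetical helper (not part of Fasten): split interleaved fastq from stdin
# into separate R1 and R2 files, assuming four lines per record.
deshuffle_fastq() {
    awk -v r1="$1" -v r2="$2" '
        (NR - 1) % 8 < 4 { print > r1; next }   # first record of each pair
        { print > r2 }                          # second record of each pair
    '
}

# Two made-up interleaved pairs.
tmpdir=$(mktemp -d)
printf '@a/1\nACGT\n+\nIIII\n@a/2\nGGCC\n+\n5555\n@b/1\nTTGA\n+\nIIII\n@b/2\nCATG\n+\n5555\n' |
    deshuffle_fastq "$tmpdir/R1.fastq" "$tmpdir/R2.fastq"

# Show the read IDs that ended up in each file.
grep '^@' "$tmpdir/R1.fastq"
grep '^@' "$tmpdir/R2.fastq"
rm -rf "$tmpdir"
```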
Documentation
Please see the inline documentation at https://lskatz.github.io/fasten/fasten
This documentation was built with cargo doc --no-deps
Other documentation
- Some wrapper scripts are noted in the scripts page.
Contributing
Instructions for how to contribute can be found in CONTRIBUTING.md.
Fasten script descriptions
All executables read and write in the fastq format except fasten_convert.
Executable | Description |
---|---|
fasten_clean | Trims and cleans a fastq file. |
fasten_convert | Converts between different sequence formats like fastq, sam, fasta. |
fasten_straighten | Converts any fastq file to a standard four-line-per-entry format. |
fasten_metrics | Prints basic read metrics. |
fasten_pe | Determines paired-endedness based on read IDs. |
fasten_randomize | Randomizes reads from input. |
fasten_combine | Combines identical reads and updates quality scores. |
fasten_kmer | Kmer counting. |
fasten_normalize | Normalizes read depth by using kmer counting. |
fasten_sample | Downsamples reads. |
fasten_shuffle | Shuffles or deshuffles paired end reads. |
fasten_validate | Validates your reads (deprecated in favor of fasten_inspect and fasten_repair). |
fasten_inspect | Adds information to read IDs such as seqlength. |
fasten_repair | Repairs corrupted reads. |
fasten_quality_filter | Transforms nucleotides to "N" if the quality is low. |
fasten_trim | Blunt-end trims reads. |
fasten_replace | Finds and replaces using regex. |
fasten_mutate | Introduces random mutations. |
fasten_regex | Filters for reads using regex. |
fasten_progress | Adds progress to any place in the pipeline. |
fasten_sort | Sorts fastq entries. |
Etymology
Many of these scripts were inspired by the fastx toolkit, and I wanted to make a fasty, but that was already the name of a bioinformatics program.
Therefore I cycled through other letters of the alphabet and came across "N." So it is possible to pronounce this project like "Fast-N" or in a way that indicates that you are securing your analysis by "fasten"ing it (with a silent T).
Citation
To cite, please refer to Katz et al., (2024). Fasten: a toolkit for streaming operations on fastq files. Journal of Open Source Software, 9(94), 6030, https://doi.org/10.21105/joss.06030