2 unstable releases
0.2.0 | Jun 24, 2024 |
---|---|
0.1.0 | Nov 10, 2022 |
seqdupes
Removes duplicates from FASTA files. Supports filtering based on sequence content or header information.
Installation
Source
Download the source code and, from the project root, run:
cargo install --path .
Usage
Run seqdupes to process FASTA files. You can specify whether to filter by sequence or by header.
Filtering by Sequence (default)
seqdupes -f path/to/sequence.fasta -j path/to/output.json > no_dupes.fas
Filtering by Header
To filter duplicates based on headers rather than sequences, use the --by-header flag.
seqdupes -f path/to/sequence.fasta -j path/to/output.json --by-header > no_dupes.fas
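The two filtering modes differ only in which field is treated as the identity of a record. The following is a minimal sketch of that idea, not seqdupes' actual source: the Record type and the parse_fasta and dedupe functions are hypothetical names, and it assumes that the first occurrence of each key is the one kept and that comparison is exact (case-sensitive).

```rust
use std::collections::HashSet;

/// One FASTA record: the header line (without '>') and its sequence.
/// Hypothetical type for illustration; not taken from seqdupes.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    header: String,
    sequence: String,
}

/// Parse FASTA text into records, joining multi-line sequences.
/// Lines that appear before the first header are ignored.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            records.push(Record {
                header: header.to_string(),
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            current.sequence.push_str(line);
        }
    }
    records
}

/// Keep the first record seen for each key. The key is the header when
/// `by_header` is true, otherwise the sequence (assumed behavior).
fn dedupe(records: Vec<Record>, by_header: bool) -> Vec<Record> {
    let mut seen: HashSet<String> = HashSet::new();
    records
        .into_iter()
        .filter(|r| {
            let key = if by_header { &r.header } else { &r.sequence };
            // `insert` returns false when the key was already present.
            seen.insert(key.clone())
        })
        .collect()
}
```

Calling dedupe(parse_fasta(text), false) corresponds to the default mode, and passing true corresponds to --by-header. Two records with different headers but identical sequences collapse in the first mode only.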
Arguments
Parameter | Default | Description |
---|---|---|
-f, --fasta | - | The path to the FASTA file to use. |
-j, --json | - | The output path for listing duplicates. |
-b, --by-header | - | Enables filtering based on headers (optional). |
The tool writes a FASTA file with duplicates removed to stdout and a JSON file containing details of the duplicates to the path given with -j.
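To illustrate how one pass can produce both outputs, here is a rough sketch that splits records into the FASTA text to print and a report of what was dropped. The JSON layout shown (each kept header mapped to the headers of its removed duplicates) is an assumption made for this example; the README does not document the real format, and split_duplicates and to_json are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Split (header, sequence) pairs into the FASTA text to print and a map
/// from each kept header to the headers of later records with the same
/// sequence. Assumes the first occurrence is the one kept.
fn split_duplicates(records: &[(&str, &str)]) -> (String, BTreeMap<String, Vec<String>>) {
    // sequence -> header of the first record that carried it
    let mut first_seen: BTreeMap<&str, &str> = BTreeMap::new();
    let mut fasta = String::new();
    let mut dupes: BTreeMap<String, Vec<String>> = BTreeMap::new();

    for &(header, sequence) in records {
        let kept: Option<&str> = first_seen.get(sequence).copied();
        match kept {
            Some(kept_header) => dupes
                .entry(kept_header.to_string())
                .or_default()
                .push(header.to_string()),
            None => {
                first_seen.insert(sequence, header);
                fasta.push_str(&format!(">{}\n{}\n", header, sequence));
            }
        }
    }
    (fasta, dupes)
}

/// Render the report as a JSON object (hypothetical layout). Assumes
/// headers contain no quotes, backslashes, or control characters, so no
/// escaping is attempted.
fn to_json(dupes: &BTreeMap<String, Vec<String>>) -> String {
    let entries: Vec<String> = dupes
        .iter()
        .map(|(kept, removed)| {
            let list: Vec<String> = removed.iter().map(|h| format!("\"{}\"", h)).collect();
            format!("\"{}\":[{}]", kept, list.join(","))
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}
```

In a command-line tool the first value would be printed to stdout and the second written to the path given with -j. A real implementation would more likely use a JSON library than hand-built strings.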