2 unstable releases

0.2.0 Jun 24, 2024
0.1.0 Nov 10, 2022

MIT license

20KB
93 lines

seqdupes

Removes duplicates from FASTA files. Supports filtering based on sequence content or header information.

Installation

Source

Download the source code and, from the root of the source directory, run:

cargo install --path .

Usage

Run seqdupes to process FASTA files. You can specify whether to filter by sequence or by header.

Filtering by Sequence (default)

seqdupes -f path/to/sequence.fasta -j path/to/output.json > no_dupes.fas

Filtering by Header

If you prefer to filter duplicates based on headers rather than sequences, use the --by-header flag.

seqdupes -f path/to/sequence.fasta -j path/to/output.json --by-header > no_dupes.fas

Arguments

Parameter        Default  Description
-f, --fasta      -        The path to the input FASTA file.
-j, --json       -        The output path for the JSON file listing duplicates.
-b, --by-header  -        Filter by header instead of by sequence (optional).

The tool writes the deduplicated FASTA to stdout and writes a JSON file with details of the duplicates to the path given with -j.
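
To make the two filtering modes concrete, here is a minimal sketch of how deduplication by sequence or by header could work. It is not the seqdupes source code. The names Record, Key, parse_fasta and dedupe are hypothetical, and the sketch assumes that the first occurrence of each key is kept and that comparison is exact (case-sensitive). The layout of the JSON report is not covered here.

```rust
use std::collections::HashSet;

/// One FASTA record: the header line (without '>') and the joined sequence.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    header: String,
    sequence: String,
}

/// Which field identifies a duplicate (sequence by default, or header).
#[derive(Clone, Copy)]
enum Key {
    Sequence,
    Header,
}

/// Parse FASTA text into records; wrapped sequence lines are concatenated.
/// Lines that appear before the first header are ignored.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            records.push(Record {
                header: header.to_string(),
                sequence: String::new(),
            });
        } else if let Some(last) = records.last_mut() {
            last.sequence.push_str(line);
        }
    }
    records
}

/// Split records into (unique, duplicates).
/// Assumption: the first record seen for each key is the one that is kept.
fn dedupe(records: Vec<Record>, key: Key) -> (Vec<Record>, Vec<Record>) {
    let mut seen: HashSet<String> = HashSet::new();
    let mut unique = Vec::new();
    let mut dupes = Vec::new();
    for rec in records {
        let k = match key {
            Key::Sequence => rec.sequence.clone(),
            Key::Header => rec.header.clone(),
        };
        // HashSet::insert returns false when the key was already present.
        if seen.insert(k) {
            unique.push(rec);
        } else {
            dupes.push(rec);
        }
    }
    (unique, dupes)
}
```

In this sketch, calling dedupe(parse_fasta(text), Key::Sequence) corresponds to the default mode, and Key::Header corresponds to --by-header. The first returned list would be printed as FASTA and the second would feed the duplicate report.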
