2 releases
| Version | Released |
|---|---|
| 0.1.3 | Sep 9, 2024 |
| 0.1.0 | Sep 2, 2024 |
Clade
A tool for phylogenetic tree construction and pruning based on NCBI taxonomy data and GTDB (Genome Taxonomy Database) data.
Features
- Fetch and process NCBI taxonomy data
- Fetch and process GTDB data
- Parse taxonomy data into efficient vector structures
- Prune phylogenetic trees based on user input
- Generate Newick format output from pruned trees
- Support for both NCBI and GTDB data sources
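To make the "efficient vector structures" feature concrete, here is a minimal Python sketch, not Clade's actual implementation. It assumes the NCBI taxdump `nodes.dmp` layout (fields separated by `\t|\t`, each line ending in `\t|`, with taxid, parent taxid, and rank as the first three fields). The function name `parse_nodes` and the sample lines are hypothetical and only illustrative.

```python
from typing import List, Tuple

def parse_nodes(lines: List[str]) -> Tuple[List[int], List[int], List[str]]:
    """Parse nodes.dmp-style lines into three parallel lists.

    Returns (taxids, parents, ranks), where parents[i] is the index
    (not the taxid) of the parent of the node stored at position i.
    """
    taxids: List[int] = []
    parent_ids: List[int] = []
    ranks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Drop the trailing "\t|" terminator, then split on the field separator.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        taxids.append(int(fields[0]))
        parent_ids.append(int(fields[1]))
        ranks.append(fields[2])
    # Store parents as positions in the vectors so lookups are plain indexing.
    index_of = {taxid: i for i, taxid in enumerate(taxids)}
    parents = [index_of[p] for p in parent_ids]
    return taxids, parents, ranks

# Toy input in the nodes.dmp field layout (illustrative values only).
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|",
    "131567\t|\t1\t|\tno rank\t|",
    "2759\t|\t131567\t|\tsuperkingdom\t|",
]

taxids, parents, ranks = parse_nodes(SAMPLE)
print(taxids, parents, ranks)
```

Storing parents as indices rather than taxids keeps the tree in flat, contiguous lists, which is one plausible reading of "vector structures"; the real tool may lay out its data differently.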
Installation
Homebrew
brew install eric9n/tap/clade
Usage
The Clade tool supports the following commands:
- update: Update NCBI taxdump files
- gtdb: GTDB related operations
  - list: List all GTDB release versions
  - sync: Download GTDB data files and parse metadata
  - download: Download GTDB data files
  - parse: Parse GTDB metadata and create database
  - newick: Generate Newick format from GTDB database
- generate: Generate and print taxonomy summary from taxdump files
- prune: Prune the taxonomy tree and generate Newick format
Examples
- Update NCBI taxdump files:
  clade -t /path/to/taxo update
- List GTDB release versions:
  clade -t /path/to/taxo gtdb list
- Download and parse GTDB data:
  clade -t /path/to/taxo gtdb sync --version 220.0
- Generate Newick format from GTDB database:
  clade -t /path/to/taxo gtdb newick --version 220.0 --domain bacteria --input input.txt --output output.newick
- Prune taxonomy tree:
  clade -t /path/to/taxo prune --taxids 9606,9605 --output pruned.newick
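When the taxid list comes from another program, the prune command can be assembled in a script instead of typed by hand. The sketch below is a hypothetical helper (`prune_command` is not part of Clade) that uses only the flags shown in the examples above. It assumes `clade` is on the `PATH` and only builds and prints the command; it does not run it.

```python
import shlex
from typing import Iterable, List

def prune_command(taxo_dir: str, taxids: Iterable[int], output: str) -> List[str]:
    """Build the argument list for `clade ... prune` from a list of taxids."""
    ids = ",".join(str(t) for t in taxids)  # --taxids takes a comma-separated list
    return ["clade", "-t", taxo_dir, "prune", "--taxids", ids, "--output", output]

cmd = prune_command("/path/to/taxo", [9606, 9605], "pruned.newick")
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```

Passing the command as a list rather than a single string avoids shell quoting problems if the taxonomy directory contains spaces.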
Workflow
- Data Retrieval:
  - Fetch the latest taxonomy data from NCBI
  - Fetch the latest tree and taxonomy data from GTDB
- Data Processing:
  - Decompress the downloaded data
  - Parse the taxonomy information into efficient vector structures
- Tree Pruning and Newick Generation:
  - Accept user input in the form of taxids or taxonomic names
  - Prune the phylogenetic tree to include only the branches related to the input
  - Generate a Newick format file representing the pruned phylogenetic tree
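The pruning and Newick steps of this workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clade's algorithm: the tree is given as a child-to-parent map whose root points to itself, every ancestor of a requested node is kept (single-child nodes are not collapsed), and internal nodes are labelled with their ids. The names `prune` and `to_newick` and the toy tree are hypothetical.

```python
from typing import Dict, List, Set

def prune(parent: Dict[int, int], keep: List[int]) -> Dict[int, List[int]]:
    """Keep only the requested nodes and their ancestors; return child lists."""
    kept: Set[int] = set()
    for taxid in keep:
        node = taxid
        # Walk towards the root, stopping once we reach an already-kept node.
        while node not in kept:
            kept.add(node)
            if parent[node] == node:  # the root is its own parent
                break
            node = parent[node]
    children: Dict[int, List[int]] = {n: [] for n in kept}
    for n in sorted(kept):  # sorted for a deterministic child order
        p = parent[n]
        if p != n:
            children[p].append(n)
    return children

def to_newick(children: Dict[int, List[int]], node: int) -> str:
    """Serialize the subtree under `node` in Newick notation (no trailing ';')."""
    kids = children[node]
    if not kids:
        return str(node)
    return "(" + ",".join(to_newick(children, k) for k in kids) + ")" + str(node)

# Toy tree with made-up ids: 1 is the root, 2 and 3 are its children.
PARENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

pruned = prune(PARENT, [4, 6])
print(to_newick(pruned, 1) + ";")  # ((4)2,(6)3)1;
```

Node 5 is dropped because neither it nor any of its descendants was requested. The recursive serializer is fine for small trees; a taxonomy with very deep lineages would need an iterative traversal.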