26 stable releases (5 major)
| Version | Release date |
|---|---|
| 8.2.0 | Mar 21, 2024 |
| 7.1.3 | Mar 3, 2024 |
| 6.3.3 | Feb 2, 2024 |
| 6.2.0 | Sep 12, 2023 |
| 3.0.1 | Jun 25, 2023 |
BLUTILS
The Blutils tool allows users to run Blast and generate consensus identities from the results. Currently, only BlastN is available.
Installation
The Blutils package can be installed directly from crates.io using cargo:
cargo install blutils-cli
Once installed, Blutils can be invoked with the blu command:
blu --help
The output should be close to:
A utility to make it easier to run and analyze Blast results
Usage: blu <COMMAND>
Commands:
  build-db  Build the blast database as a pre-requisite for the blastn command
  blastn    Execute the parallel blast and run consensus algorithm
  check     Check `Blutils` dependencies
  help      Print this message or the help of the given subcommand(s)
Options:
  -h, --help     Print help information
  -V, --version  Print version information
Check dependencies
Optionally, you can check OS dependencies before running Blutils. Blutils depends on the NCBI BLAST+ tools being installed on the host system to perform parallel Blast searches. To check whether the host OS has these packages installed, run the Blutils checker for Linux systems:
blu check linux
Note: currently, the system check is available only for Linux systems and assumes that the dependencies can be invoked directly from the terminal.
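How the checker works internally is not described here. As a rough illustration only, a check of this kind can be sketched in Rust by searching the directories in PATH for the BLAST+ executables. The tool list (blastn, makeblastdb) and the function names find_in_path and missing_tools are hypothetical; this is not the Blutils implementation.

```rust
use std::env;
use std::path::PathBuf;

// Hypothetical tool list: the exact executables that `blu check linux`
// verifies are an assumption in this sketch.
const REQUIRED_TOOLS: [&str; 2] = ["blastn", "makeblastdb"];

/// Returns the first existing file named `tool` inside the directories of a
/// PATH-style string such as "/usr/local/bin:/usr/bin".
fn find_in_path(tool: &str, path_var: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(tool))
        .find(|candidate| candidate.is_file())
}

/// Returns the entries of `tools` that were not found in `path_var`.
fn missing_tools<'a>(tools: &[&'a str], path_var: &str) -> Vec<&'a str> {
    tools
        .iter()
        .copied()
        .filter(|tool| find_in_path(tool, path_var).is_none())
        .collect()
}

fn main() {
    // An unset PATH is treated as an empty search list.
    let path_var = env::var("PATH").unwrap_or_default();
    let missing = missing_tools(&REQUIRED_TOOLS, &path_var);
    if missing.is_empty() {
        println!("All required tools were found on PATH");
    } else {
        println!("Missing from PATH: {}", missing.join(", "));
    }
}
```

This only checks that a file with the expected name exists on PATH; it does not verify the executable bit or the installed BLAST+ version.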
Run Blast with Blutils
Running Blutils is simple. To list all available options, invoke the help of the blastn run-with-consensus subcommand:
blu blastn run-with-consensus --help
After inspecting the available options, try running Blutils with the test data. First, download the test data from the project's GitHub repository:
export INPUT_DIR=https://raw.githubusercontent.com/sgelias/blutils/main/test/mock/input
curl ${INPUT_DIR}/query/query.fna > query.fna
curl ${INPUT_DIR}/query/ref_databases/mock-16S.fna > mock-16S.fna
curl ${INPUT_DIR}/query/ref_databases/mock-16S_taxonomies.tsv > mock-16S_taxonomies.tsv
Then run Blutils:
blu blastn run-with-consensus \
query.fna \
mock-16S.fna \
mock-16S_taxonomies.tsv \
output \
-t 6 \
--taxon bacteria \
--strategy relaxed \
-f
After running the above command, the output files can be found in the output directory, which will contain two additional files named blast.out and blutils.consensus.json. The first contains the default Blast tabular output and the latter contains the Blutils consensus response, which should be close to:
[
{
"query": "NR025123.135626.Bac",
"taxon": {
"rank": "species",
"identifier": "shewanella-olleyana",
"percIdentity": 100.0,
"bitScore": 2695.0,
"alignLength": 1459,
"mismatches": 0,
"gapOpenings": 0,
"qStart": 1,
"qEnd": 1459,
"sStart": 1,
"sEnd": 1459,
"eValue": 0.0,
"taxonomy": "d__bacteria;p__pseudomonadota;c__gammaproteobacteria;o__alteromonadales;f__shewanellaceae;g__shewanella;s__shewanella-olleyana",
"mutated": true,
"consensusBeans": null
}
},
{
"query": "draft-5123",
"taxon": {
"rank": "species",
"identifier": "bacillus-mojavensis-subgroup",
"percIdentity": 100.0,
"bitScore": 815.0,
"alignLength": 441,
"mismatches": 0,
"gapOpenings": 0,
"qStart": 1,
"qEnd": 441,
"sStart": 217,
"sEnd": 657,
"eValue": 0.0,
"taxonomy": "d__bacteria;clade__terrabacteria-group;p__bacillota;c__bacilli;o__bacillales;f__taxid-186817;g__bacillus;species-group__bacillus-subtilis-group;species-subgroup__bacillus-mojavensis-subgroup",
"mutated": true,
"consensusBeans": [
{
"rank": "species",
"identifier": "bacillus-halotolerans",
"occurrences": 5
},
{
"rank": "species",
"identifier": "bacillus-mojavensis",
"occurrences": 4
}
]
}
},
{
"query": "INVALID_SEQUENCE",
"taxon": null
}
]
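Assuming blast.out uses Blast's standard 12-column tabular layout (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which the JSON fields above (percIdentity, alignLength, mismatches, gapOpenings, qStart, qEnd, sStart, sEnd, eValue, bitScore) appear to mirror, a minimal Rust parser for one of its lines could look like this. BlastHit, column and parse_hit are hypothetical names, not part of the Blutils API.

```rust
use std::str::FromStr;

/// One hit in Blast's standard 12-column tabular layout (-outfmt 6):
/// qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
#[derive(Debug)]
struct BlastHit {
    query: String,
    subject: String,
    perc_identity: f64,
    align_length: u64,
    mismatches: u64,
    gap_openings: u64,
    q_start: u64,
    q_end: u64,
    s_start: u64,
    s_end: u64,
    e_value: f64,
    bit_score: f64,
}

/// Parses column `i` of `cols` into any type implementing `FromStr`.
fn column<T: FromStr>(cols: &[&str], i: usize) -> Result<T, String> {
    cols[i]
        .parse::<T>()
        .map_err(|_| format!("invalid value in column {}: {:?}", i + 1, cols[i]))
}

/// Parses one tab-separated line of tabular output into a `BlastHit`.
fn parse_hit(line: &str) -> Result<BlastHit, String> {
    let cols: Vec<&str> = line.trim_end().split('\t').collect();
    if cols.len() != 12 {
        return Err(format!("expected 12 columns, found {}", cols.len()));
    }
    Ok(BlastHit {
        query: cols[0].to_string(),
        subject: cols[1].to_string(),
        perc_identity: column(&cols, 2)?,
        align_length: column(&cols, 3)?,
        mismatches: column(&cols, 4)?,
        gap_openings: column(&cols, 5)?,
        q_start: column(&cols, 6)?,
        q_end: column(&cols, 7)?,
        s_start: column(&cols, 8)?,
        s_end: column(&cols, 9)?,
        e_value: column(&cols, 10)?,
        bit_score: column(&cols, 11)?,
    })
}

fn main() {
    // Illustrative line built from the first JSON record above;
    // "subject-1" is a placeholder subject ID.
    let line = "NR025123.135626.Bac\tsubject-1\t100.0\t1459\t0\t0\t1\t1459\t1\t1459\t0.0\t2695";
    match parse_hit(line) {
        Ok(hit) => println!("{hit:?}"),
        Err(err) => eprintln!("{err}"),
    }
}
```

Parsing e-values and bit scores as f64 accepts both plain ("0.0") and scientific ("2e-120") notation, which Blast may emit.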
Blast execution
Blutils tries to reach full CPU saturation during Blast execution. In Blast's default multithreaded execution mode, full saturation is not reached; running Blast through Blutils makes it possible. All the steps taken during this process can be seen in the image below.
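The exact Blutils pipeline is only shown in the image. One common way to saturate all cores, presented here as a hedged sketch and not necessarily what Blutils does, is to split the query FASTA into one chunk per available core and run a separate single-threaded blastn process on each chunk. The hypothetical split_fasta function below covers only the splitting step; the chunk count comes from std::thread::available_parallelism, and launching blastn is left out.

```rust
use std::thread;

/// Distributes FASTA records round-robin into `n` chunks. A record starts at a
/// header line ('>') and includes the sequence lines that follow it; any text
/// before the first header is skipped.
fn split_fasta(fasta: &str, n: usize) -> Vec<String> {
    assert!(n > 0, "the number of chunks must be positive");
    let mut chunks = vec![String::new(); n];
    let mut current: Option<usize> = None; // index of the record being copied
    for line in fasta.lines() {
        if line.starts_with('>') {
            current = Some(current.map_or(0, |i| i + 1));
        }
        if let Some(i) = current {
            let chunk = &mut chunks[i % n];
            chunk.push_str(line);
            chunk.push('\n');
        }
    }
    chunks
}

fn main() {
    // One chunk per available core, falling back to 1 if it cannot be queried.
    let cores = thread::available_parallelism().map(|c| c.get()).unwrap_or(1);
    let fasta = ">q1\nACGT\n>q2\nGGCC\nTTAA\n>q3\nAAAA\n";
    for (i, chunk) in split_fasta(fasta, cores).iter().enumerate() {
        // Each non-empty chunk would be written to its own file and passed to
        // a separate single-threaded blastn process (not shown here).
        if !chunk.is_empty() {
            print!("chunk {i}:\n{chunk}");
        }
    }
}
```

Round-robin assignment balances the number of records per chunk, not their total sequence length, so chunks with long sequences may still finish later than others.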
Consensus generation
Unlike the consensus generation in QIIME 2, the Blutils consensus algorithm pre-filters the Blast results by bit score and percent identity, as described in the image below.
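The exact filtering rules live in the image, so the following Rust sketch only illustrates the general idea under stated assumptions: for one query, keep the hits that tie for the best bit score and meet a minimum percent identity (the 99.0 threshold is hypothetical), then count how often each identifier appears, similar in shape to the consensusBeans entries (identifier, occurrences) in the JSON output above. Hit and prefilter_and_count are hypothetical names, and the sketch does not model the --strategy options or how Blutils walks up taxonomic ranks.

```rust
use std::collections::HashMap;

/// A simplified Blast hit carrying only the fields this sketch needs.
struct Hit<'a> {
    identifier: &'a str,
    bit_score: f64,
    perc_identity: f64,
}

/// Keeps hits that reach the query's best bit score and at least
/// `min_identity` percent identity, then counts how often each identifier
/// occurs among the survivors (most frequent first, ties alphabetical).
fn prefilter_and_count<'a>(hits: &[Hit<'a>], min_identity: f64) -> Vec<(&'a str, usize)> {
    // Best bit score among all hits for this query (NEG_INFINITY if none).
    let best = hits.iter().map(|h| h.bit_score).fold(f64::NEG_INFINITY, f64::max);
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for h in hits
        .iter()
        .filter(|h| h.bit_score >= best && h.perc_identity >= min_identity)
    {
        *counts.entry(h.identifier).or_insert(0) += 1;
    }
    let mut ranked: Vec<(&'a str, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    ranked
}

fn main() {
    // Hypothetical hits for a single query.
    let hits = [
        Hit { identifier: "taxon-a", bit_score: 815.0, perc_identity: 100.0 },
        Hit { identifier: "taxon-b", bit_score: 815.0, perc_identity: 100.0 },
        Hit { identifier: "taxon-a", bit_score: 815.0, perc_identity: 100.0 },
        Hit { identifier: "taxon-c", bit_score: 790.0, perc_identity: 98.9 },
    ];
    for (identifier, occurrences) in prefilter_and_count(&hits, 99.0) {
        println!("{identifier}: {occurrences}");
    }
}
```

With these inputs, taxon-c is dropped by both filters and the remaining counts are taxon-a: 2 and taxon-b: 1.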
Next steps
This project currently provides only the basic features needed to run BlastN and generate consensus identities. Many features remain to be built, such as a database extractor that retrieves data from the official NCBI taxdump files and builds the FASTA database at the same time. New feature suggestions are welcome!