5 unstable releases
Version | Release date |
---|---|
0.3.0 | Oct 1, 2024 |
0.2.1 | Sep 23, 2024 |
0.2.0 | Sep 23, 2024 |
0.1.1 | Jul 22, 2024 |
0.1.0 | Dec 14, 2023 |
77KB, 1.5K SLoC
NoHuman
👤🧬🚫 Remove human reads from a sequencing run 👤🧬️🚫
nohuman removes human reads from sequencing data by classifying them with kraken2 against a custom database built from all of the genomes in the Human Pangenome Reference Consortium's (HPRC) first draft human pangenome reference. It works with reads from any sequencing technology. Read more about the development of this method here.
Install
Conda (recommended)
$ conda install -c bioconda nohuman
Precompiled binary
[!IMPORTANT] You will need to install kraken2 yourself using this install method.
curl -sSL nohuman.mbh.sh | sh
# or with wget
wget -nv -O - nohuman.mbh.sh | sh
You can also pass options to the script like so
$ curl -sSL nohuman.mbh.sh | sh -s -- --help
install.sh [option]
Fetch and install the latest version of nohuman, if nohuman is already
installed it will be updated to the latest version.
Options
-V, --verbose
Enable verbose output for the installer
-f, -y, --force, --yes
Skip the confirmation prompt during installation
-p, --platform
Override the platform identified by the installer [default: apple-darwin]
-b, --bin-dir
Override the bin installation directory [default: /usr/local/bin]
-a, --arch
Override the architecture identified by the installer [default: x86_64]
-B, --base-url
Override the base URL used for downloading releases [default: https://github.com/mbhall88/nohuman/releases]
-h, --help
Display this help message
Cargo
[!IMPORTANT] You will need to install kraken2 yourself using this install method.
$ cargo install nohuman
Container
Docker images are hosted on the GitHub Container registry.
apptainer
Prerequisite: apptainer (previously singularity)
$ URI="docker://ghcr.io/mbhall88/nohuman:latest"
$ apptainer exec "$URI" nohuman --help
The above will use the latest version. If you want to specify a version then use a tag like so.
$ VERSION="0.2.1"
$ URI="docker://ghcr.io/mbhall88/nohuman:${VERSION}"
$ apptainer exec "$URI" nohuman --help
docker
Prerequisite: docker
$ docker pull ghcr.io/mbhall88/nohuman:latest
$ docker run ghcr.io/mbhall88/nohuman:latest nohuman --help
You can find all the available tags here.
Build from source
[!IMPORTANT] You will need to install kraken2 yourself using this install method.
$ git clone https://github.com/mbhall88/nohuman.git
$ cd nohuman
$ cargo build --release
$ target/release/nohuman -h
Usage
Download the database
$ nohuman -d
By default, this will place the database in $HOME/.nohuman/db. If you want to download it somewhere else, use the --db option.
Check dependencies are available
$ nohuman -c
[2023-12-14T04:10:46Z INFO ] All dependencies are available
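Because the precompiled binary, Cargo and source installs leave kraken2 for you to install, you may want a similar check inside your own pipeline scripts before calling nohuman. The Python sketch below only mimics the idea of `nohuman -c`; it assumes kraken2 is the only external dependency and that it has to be on `PATH`, and `missing_dependencies` is a hypothetical helper, not part of nohuman.

```python
import shutil
from typing import Iterable, List


def missing_dependencies(tools: Iterable[str] = ("kraken2",)) -> List[str]:
    """Return the tools that cannot be found on PATH.

    The default dependency list ("kraken2") is an assumption for this sketch.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are available")
```

`shutil.which` performs the same `PATH` lookup a shell would, so a non-empty result means nohuman would not be able to launch that tool either.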
Remove human reads
$ nohuman -t 4 in.fq
This will pass 4 threads to kraken2 and write the clean reads to in.nohuman.fq.
You can specify where to write the output file with -o
$ nohuman -t 4 -o clean.fq in.fq
If you have paired-end Illumina reads
$ nohuman -t 4 in_1.fq in_2.fq
or to specify a different path for the output
$ nohuman -t 4 --out1 clean_1.fq --out2 clean_2.fq in_1.fq in_2.fq
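The full help text describes the default output names: the input file name with a `nohuman` suffix (e.g. `input_1.fastq` -> `input_1.nohuman.fq`), keeping the input's compression. A minimal sketch of that rule, assuming inputs end in `.fq` or `.fastq`, optionally followed by `.gz`, `.bz2`, `.xz` or `.zst`; it derives only the file name, not the output directory, and `default_output_name` is a hypothetical helper rather than nohuman's own code.

```python
import os

COMPRESSION_EXTS = (".gz", ".bz2", ".xz", ".zst")
FASTQ_EXTS = (".fastq", ".fq")


def default_output_name(input_path: str) -> str:
    """Hypothetical helper: 'in_1.fq.gz' -> 'in_1.nohuman.fq.gz'."""
    name = os.path.basename(input_path)
    compression = ""
    # Remember and strip a trailing compression extension, if present.
    for ext in COMPRESSION_EXTS:
        if name.endswith(ext):
            compression = ext
            name = name[: -len(ext)]
            break
    # Strip the FASTQ extension; the '.fq' output extension is assumed
    # from the help text example.
    for ext in FASTQ_EXTS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return f"{name}.nohuman.fq{compression}"


if __name__ == "__main__":
    for path in ("in.fq", "in_1.fq", "in_2.fq", "reads/input_1.fastq.gz"):
        print(path, "->", default_output_name(path))
```

For the single-end example this gives `in.nohuman.fq`, matching the name stated earlier; for the paired-end example it gives `in_1.nohuman.fq` and `in_2.nohuman.fq`.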
Set a minimum confidence score for kraken2 classifications
$ nohuman --conf 0.5 in.fq
or write the kraken2 read classification output to a file
$ nohuman -k kraken.out in.fq
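The `--conf` value above is kraken2's minimum confidence score. The Kraken 2 manual scores a read's label as C/Q, where C is the number of k-mers mapped to taxa in the clade rooted at that label and Q is the number of k-mers in the read without ambiguous nucleotides; if the score is below the threshold, the label moves to its parent and is rescored, and a read that fails even at the root is unclassified. The sketch replays that rule on a hypothetical toy taxonomy. It is a simplification: real kraken2 works on minimizers and chooses the initial label itself, whereas here the starting label and the per-k-mer hits are made-up inputs.

```python
from typing import Dict, List, Optional

# Toy taxonomy with hypothetical IDs: child -> parent. The root (1) has no
# entry, so its parent is treated as 0, meaning "no taxon".
TOY_PARENT: Dict[int, int] = {100: 10, 10: 1}


def in_clade(taxon: int, label: int, parent: Dict[int, int]) -> bool:
    """True if `taxon` is `label` or one of its descendants."""
    while taxon != 0:
        if taxon == label:
            return True
        taxon = parent.get(taxon, 0)
    return False


def clade_score(label: int, kmer_taxa: List[int], parent: Dict[int, int]) -> float:
    """C / Q for `label`.

    `kmer_taxa` holds one taxon per unambiguous k-mer (so len() is Q);
    0 marks a k-mer that was not found in the database.
    """
    q = len(kmer_taxa)
    c = sum(1 for taxon in kmer_taxa if in_clade(taxon, label, parent))
    return c / q if q else 0.0


def apply_confidence(label: int, kmer_taxa: List[int],
                     parent: Dict[int, int], threshold: float) -> Optional[int]:
    """Walk `label` up the tree until its score meets the threshold.

    Returns None when no ancestor qualifies (the read is unclassified).
    """
    while label != 0:
        if clade_score(label, kmer_taxa, parent) >= threshold:
            return label
        label = parent.get(label, 0)
    return None


if __name__ == "__main__":
    hits = [100, 100, 10, 0, 0]  # made-up k-mer hits for one read
    for conf in (0.0, 0.5, 0.9):
        print(conf, apply_confidence(100, hits, TOY_PARENT, conf))
```

With these toy hits the script prints labels 100, 10 and None for thresholds 0.0, 0.5 and 0.9: in this model a higher threshold can only move a label towards the root or drop the read to unclassified.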
[!TIP] Compressed output will be inferred from the specified output path(s). If no output path is provided, the same compression as the input will be used. To override the output compression format, use the --output-type option. Supported compression formats are gzip (.gz), zstandard (.zst), bzip2 (.bz2), and xz (.xz). If multiple threads are provided, these will be used for compression of the output (where possible).
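The precedence in the tip (an explicit `--output-type` first, then the output path, then the input) can be sketched as follows. The single-letter codes are the ones listed for `-F/--output-type`; detecting compression purely from file extensions is an assumption of this sketch, and both functions are hypothetical helpers, not nohuman's implementation.

```python
from typing import Optional

# Extension -> format code, using the codes listed for -F/--output-type.
EXT_TO_FORMAT = {".gz": "g", ".bz2": "b", ".xz": "x", ".zst": "z"}


def format_from_path(path: str) -> str:
    """Guess a compression code from a file extension; 'u' means uncompressed."""
    lowered = path.lower()
    for ext, code in EXT_TO_FORMAT.items():
        if lowered.endswith(ext):
            return code
    return "u"


def choose_output_format(input_path: str,
                         output_path: Optional[str] = None,
                         output_type: Optional[str] = None) -> str:
    """Explicit --output-type wins, then the output path, then the input path."""
    if output_type is not None:
        return output_type
    if output_path is not None:
        return format_from_path(output_path)
    return format_from_path(input_path)
```

For example, `choose_output_format("in.fq.gz")` keeps gzip (`"g"`), while giving an output path of `clean.fq` switches to uncompressed output (`"u"`).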
Keep human reads
You can invert the functionality of nohuman to keep only the human reads by using the --human/-H flag.
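Conceptually, the default mode and `--human` differ only in which side of the kraken2 classification is written out. The sketch reads kraken2's per-read output (tab-separated: `C`/`U`, read ID, taxon ID, length, k-mer LCA mappings), such as the file written with `-k`, and filters a FASTQ accordingly. It assumes that, because the database is built only from human genomes, any classified read counts as human, and that FASTQ records are unwrapped 4-line records; `classified_ids` and `filter_fastq` are hypothetical helpers, not nohuman's actual code.

```python
import io
from typing import Iterable, Iterator, Set, TextIO


def classified_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """IDs of reads kraken2 marked 'C' (classified) in its per-read output."""
    ids: Set[str] = set()
    for line in kraken_lines:
        # Columns: C/U, read ID, taxon ID, length, k-mer LCA mappings.
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "C":
            ids.add(fields[1])
    return ids


def filter_fastq(handle: TextIO, human_ids: Set[str],
                 keep_human: bool = False) -> Iterator[str]:
    """Yield 4-line FASTQ records, dropping human reads (or keeping only them)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            break  # end of input
        # Header is '@<id> [description]'; the ID is assumed to match
        # kraken2's read ID column.
        read_id = record[0][1:].split()[0]
        if (read_id in human_ids) == keep_human:
            yield "".join(record)


if __name__ == "__main__":
    # Toy data; the taxon ID and mappings are illustrative only.
    kraken = ["C\tr1\t9606\t8\t9606:1\n", "U\tr2\t0\t8\t0:1\n"]
    fastq = "@r1 toy\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTAAAA\n+\nIIIIIIII\n"
    human = classified_ids(kraken)
    print(list(filter_fastq(io.StringIO(fastq), human)))                   # default mode
    print(list(filter_fastq(io.StringIO(fastq), human, keep_human=True)))  # like --human
```

The single `(read_id in human_ids) == keep_human` comparison is what makes the two modes exact complements: every read lands in exactly one of the two outputs.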
$ nohuman -h
Remove human reads from a sequencing run
Usage: nohuman [OPTIONS] [INPUT]...
Arguments:
[INPUT]... Input file(s) to remove human reads from
Options:
-o, --out1 <OUTPUT_1> First output file.
-O, --out2 <OUTPUT_2> Second output file.
-c, --check Check that all required dependencies are available and exit
-d, --download Download the database
-D, --db <PATH> Path to the database [default: ~/.nohuman/db]
-F, --output-type <FORMAT> Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd
-t, --threads <INT> Number of threads to use in kraken2 and optional output compression. Cannot be 0 [default: 1]
-H, --human Output human reads instead of removing them
-C, --conf <[0, 1]> Kraken2 minimum confidence score [default: 0.0]
-k, --kraken-output <FILE> Write the Kraken2 read classification output to a file
-v, --verbose Set the logging level to verbose
-h, --help Print help (see more with '--help')
-V, --version Print version
Full usage
$ nohuman --help
Remove human reads from a sequencing run
Usage: nohuman [OPTIONS] [INPUT]...
Arguments:
[INPUT]...
Input file(s) to remove human reads from
Options:
-o, --out1 <OUTPUT_1>
First output file.
Defaults to the name of the first input file with the suffix "nohuman" appended.
e.g. "input_1.fastq" -> "input_1.nohuman.fq".
Compression of the output file is determined by the file extension of the output file name.
Or by using the `--output-type` option. If no output path is given, the same compression
as the input file will be used.
-O, --out2 <OUTPUT_2>
Second output file.
Defaults to the name of the first input file with the suffix "nohuman" appended.
e.g. "input_2.fastq" -> "input_2.nohuman.fq".
Compression of the output file is determined by the file extension of the output file name.
Or by using the `--output-type` option. If no output path is given, the same compression
as the input file will be used.
-c, --check
Check that all required dependencies are available and exit
-d, --download
Download the database
-D, --db <PATH>
Path to the database
[default: ~/.nohuman/db]
-F, --output-type <FORMAT>
Output compression format. u: uncompressed; b: Bzip2; g: Gzip; x: Xz (Lzma); z: Zstd
If not provided, the format will be inferred from the given output file name(s), or the
format of the input file(s) if no output file name(s) are given.
-t, --threads <INT>
Number of threads to use in kraken2 and optional output compression. Cannot be 0
[default: 1]
-H, --human
Output human reads instead of removing them
-C, --conf <[0, 1]>
Kraken2 minimum confidence score
[default: 0.0]
-k, --kraken-output <FILE>
Write the Kraken2 read classification output to a file
-v, --verbose
Set the logging level to verbose
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
Alternatives
Hostile is an alignment-based approach that performs well. It takes longer and uses more memory than the nohuman kraken2 approach, but has slightly better accuracy for Illumina data. See the paper for more details and for other alternative approaches.
Cite
Hall, Michael B., and Lachlan J. M. Coin. “Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data.” GigaScience, April 4, 2024. https://doi.org/10.1093/gigascience/giae010
@article{hall_pangenome_2024,
title = {Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data},
volume = {13},
issn = {2047-217X},
url = {https://doi.org/10.1093/gigascience/giae010},
doi = {10.1093/gigascience/giae010},
urldate = {2024-04-07},
journal = {GigaScience},
author = {Hall, Michael B and Coin, Lachlan J M},
month = jan,
year = {2024},
pages = {giae010},
}
Dependencies: ~25–38MB, ~557K SLoC