Remove human reads from a sequencing run

1 unstable release

0.1.0 Dec 14, 2023

Custom license

NoHuman

License: MIT | DOI: 10.1101/2023.09.18.558339

👤➡️🚫 Remove human reads from a sequencing run 👤➡️🚫

nohuman removes human reads from sequencing data by classifying them with kraken2 against a custom database built from all of the genomes in the Human Pangenome Reference Consortium's (HPRC) first draft human pangenome reference. It accepts reads from any sequencing technology. The development of this method is described in the paper listed under Cite.
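
As a rough picture of the idea, keeping the reads that kraken2 leaves unclassified against a human-only database might look like the sketch below. This is not necessarily the command nohuman runs internally; the helper name kraken2_filter_cmd and all paths are hypothetical. The flags used (--db, --threads, --output, --unclassified-out) are standard kraken2 options. The function only prints the command, so it can be inspected without kraken2 installed.

```shell
# Illustrative sketch only: build (but do not run) a kraken2 command that
# writes reads NOT classified against the given database to a file.
# kraken2_filter_cmd is a hypothetical helper, not part of nohuman.
# Arguments: <database dir> <threads> <output file> <input file>
kraken2_filter_cmd() {
    # "--output -" suppresses kraken2's per-read classification table;
    # "--unclassified-out" receives the reads with no hit in the database.
    printf 'kraken2 --db %s --threads %s --output - --unclassified-out %s %s\n' \
        "$1" "$2" "$3" "$4"
}

# Placeholder paths, shown only to illustrate the shape of the command.
kraken2_filter_cmd /path/to/human_db 4 in.nohuman.fq in.fq
```

In practice nohuman handles this step for you; the sketch is only meant to show why reads that do not match the human database are the ones that end up in the output.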

Install

Conda

$ conda install -c bioconda nohuman

Precompiled binary

Note: you will need to install kraken2 yourself using this install method.

curl -sSL nohuman.mbh.sh | sh
# or with wget
wget -nv -O - nohuman.mbh.sh | sh

You can also pass options to the script like so:

$ curl -sSL nohuman.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of nohuman, if nohuman is already
installed it will be updated to the latest version.

Options
        -V, --verbose
                Enable verbose output for the installer

        -f, -y, --force, --yes
                Skip the confirmation prompt during installation

        -p, --platform
                Override the platform identified by the installer [default: apple-darwin]

        -b, --bin-dir
                Override the bin installation directory [default: /usr/local/bin]

        -a, --arch
                Override the architecture identified by the installer [default: x86_64]

        -B, --base-url
                Override the base URL used for downloading releases [default: https://github.com/mbhall88/nohuman/releases]

        -h, --help
                Display this help message

Cargo

Note: you will need to install kraken2 yourself using this install method.

$ cargo install nohuman

Container

Docker images are hosted at quay.io.

singularity

Prerequisite: singularity

$ URI="docker://quay.io/mbhall88/nohuman"
$ singularity exec "$URI" nohuman --help

The above uses the latest version. To pin a specific version, use a tag (or commit) like so:

$ VERSION="0.1.0"
$ URI="docker://quay.io/mbhall88/nohuman:${VERSION}"
$ singularity exec "$URI" nohuman --help

docker

Prerequisite: docker

$ docker pull quay.io/mbhall88/nohuman
$ docker run quay.io/mbhall88/nohuman nohuman --help

You can find all the available tags on the quay.io repository.

Build from source

Note: you will need to install kraken2 yourself using this install method.

$ git clone https://github.com/mbhall88/nohuman.git
$ cd nohuman
$ cargo build --release
$ target/release/nohuman -h

Usage

Download the database

$ nohuman -d

By default, this places the database in $HOME/.nohuman/db. To download it somewhere else, use the --db option.
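
For example, a script that keeps the database on a shared disk could combine -d with --db, then pass the same --db when cleaning reads. This is a minimal sketch: download_and_clean is a hypothetical helper, and the directory you pass is up to you. Only the -d, --db and -t options come from nohuman itself.

```shell
# Download the database into a custom directory, then use that same
# directory for a run. download_and_clean is a hypothetical helper.
# Arguments: <database dir> <input reads>
download_and_clean() {
    db_dir="$1"
    reads="$2"
    # Stop if the download fails so we never run against a partial database.
    nohuman -d --db "$db_dir" || return 1
    nohuman --db "$db_dir" -t 4 "$reads"
}
```

It could be called as download_and_clean /data/nohuman_db in.fq, where /data/nohuman_db is a placeholder path. Note that this sketch downloads on every call; in a real pipeline you would probably run the download step once.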

Check dependencies are available

$ nohuman -c
[2023-12-14T04:10:46Z INFO ] All dependencies are available
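
This README does not say whether nohuman -c reports a missing dependency through its exit status, so a pipeline script may want its own pre-flight check. The sketch below simply looks for executables on PATH; check_tools is a hypothetical helper and is not part of nohuman.

```shell
# Report any of the named executables that cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise. check_tools is a
# hypothetical helper for use in your own scripts.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing dependency: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script could start with check_tools nohuman kraken2 || exit 1 to fail early when either program is not installed.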

Remove human reads

$ nohuman -t 4 in.fq

This passes 4 threads to kraken2 and writes the clean reads to in.nohuman.fq.
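
If a downstream script needs to predict the default output name, it can be approximated as below. The rule is inferred only from the two examples in this README (in.fq -> in.nohuman.fq, and input_1.fastq.gz -> input_1.nohuman.fq in the --help text). How nohuman treats other extensions or directory components is an assumption, and default_output_name is a hypothetical helper, not part of nohuman.

```shell
# Approximate nohuman's default output name for a given input file name.
# Assumption: a trailing .gz and a .fq/.fastq extension are dropped, then
# ".nohuman.fq" is appended. Other extensions are left in place.
default_output_name() {
    name="${1%.gz}"                       # drop a trailing .gz, if any
    case "$name" in
        *.fastq) name="${name%.fastq}" ;; # drop .fastq
        *.fq)    name="${name%.fq}" ;;    # drop .fq
    esac
    printf '%s.nohuman.fq\n' "$name"
}
```

When the exact name matters, passing -o explicitly avoids relying on this guess.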

You can specify where to write the output file with -o:

$ nohuman -t 4 -o clean.fq in.fq

If you have paired-end Illumina reads, pass both files:

$ nohuman -t 4 in_1.fq in_2.fq

Or, to specify different paths for the outputs:

$ nohuman -t 4 --out1 clean_1.fq --out2 clean_2.fq in_1.fq in_2.fq
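
To process many paired-end samples, the same command can be wrapped in a loop. The sketch below assumes files are named <sample>_1.fq and <sample>_2.fq in one directory; that naming scheme, the .clean.fq output names and the clean_pairs helper are all hypothetical choices for illustration.

```shell
# Run nohuman on every <sample>_1.fq / <sample>_2.fq pair in a directory.
# clean_pairs is a hypothetical helper; the file naming is an assumption.
# Argument: <directory containing the reads>
clean_pairs() {
    dir="$1"
    for r1 in "$dir"/*_1.fq; do
        [ -e "$r1" ] || continue          # glob matched nothing
        sample="${r1%_1.fq}"
        r2="${sample}_2.fq"
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no mate file $r2" >&2
            continue
        fi
        # Output names end in .clean.fq so a rerun does not pick them up
        # as inputs.
        nohuman -t 4 --out1 "${sample}_1.clean.fq" --out2 "${sample}_2.clean.fq" "$r1" "$r2"
    done
}
```

Calling clean_pairs reads/ would clean each complete pair under reads/ and print a message for any first-mate file without a partner.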

Note: output will always be uncompressed, even if compressed input is provided.
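
If you need compressed output, one option is to compress the file yourself once nohuman has finished. This is a minimal sketch assuming gzip is installed; clean_and_compress is a hypothetical helper, not part of nohuman.

```shell
# Clean a read file, then gzip the uncompressed output in place.
# clean_and_compress is a hypothetical helper.
# Arguments: <input reads> <uncompressed output path>
clean_and_compress() {
    reads="$1"
    out="$2"
    # Only compress if nohuman succeeded; gzip replaces $out with $out.gz.
    nohuman -t 4 -o "$out" "$reads" && gzip -f "$out"
}
```

For example, clean_and_compress in.fq.gz clean.fq would leave clean.fq.gz behind.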

$ nohuman -h
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...  Input file(s) to remove human reads from

Options:
  -o, --out1 <OUTPUT_1>  First output file
  -O, --out2 <OUTPUT_2>  Second output file - if two input files given
  -c, --check            Check that all required dependencies are available
  -d, --download         Download the database
  -D, --db <PATH>        Path to the database [default: /home/mihall/.nohuman/db]
  -t, --threads <INT>    Number of threads to use in kraken2 [default: 1]
  -v, --verbose          Set the logging level to verbose
  -h, --help             Print help (see more with '--help')
  -V, --version          Print version

Full usage

$ nohuman --help
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...
          Input file(s) to remove human reads from

Options:
  -o, --out1 <OUTPUT_1>
          First output file.

          Defaults to the name of the first input file with the suffix "nohuman" appended. e.g. "input_1.fastq.gz" -> "input_1.nohuman.fq". NOTE: kraken2 output cannot be compressed, so the output will always be uncompressed.

  -O, --out2 <OUTPUT_2>
          Second output file - if two input files given.

          Defaults to the name of the second input file with the suffix "nohuman" appended. e.g. "input_2.fastq.gz" -> "input_2.nohuman.fq". NOTE: kraken2 output cannot be compressed, so the output will always be uncompressed.

  -c, --check
          Check that all required dependencies are available

  -d, --download
          Download the database

  -D, --db <PATH>
          Path to the database

          [default: /home/mihall/.nohuman/db]

  -t, --threads <INT>
          Number of threads to use in kraken2

          [default: 1]

  -v, --verbose
          Set the logging level to verbose

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Alternatives

Hostile is an alignment-based approach that performs well. It takes longer and uses more memory than the nohuman kraken2 approach, but has slightly better accuracy for Illumina data. See the paper for more details and for other alternative approaches.

Cite

DOI:10.1101/2023.09.18.558339

Hall, Michael B., and Lachlan J. M. Coin. “Pangenome Databases Provide Superior Host Removal and Mycobacteria Classification from Clinical Metagenomic Data.” bioRxiv, September 19, 2023. https://doi.org/10.1101/2023.09.18.558339.

@misc{hall_pangenome_2023,
	title = {Pangenome databases provide superior host removal and mycobacteria classification from clinical metagenomic data},
	url = {https://www.biorxiv.org/content/10.1101/2023.09.18.558339v3},
	doi = {10.1101/2023.09.18.558339},
	language = {en},
	urldate = {2023-09-20},
	publisher = {bioRxiv},
	author = {Hall, Michael B. and Coin, Lachlan J. M.},
	month = sep,
	year = {2023},
}
