#bioinformatics #metagenomics #variant-calling #strain-anlaysis

bin+lib lorikeet-genome

Strain resolver and variant caller via local reassembly for metagenomics

1 unstable release

0.8.2 Dec 6, 2023

#131 in Biology

GPL-3.0 license

2MB
37K SLoC

Lorikeet

Lorikeet is a within-species variant analysis pipeline for metagenomic communities that utilizes both long and short reads. Lorikeet utilizes a re-implementaion of the GATK HaplotypeCaller algorithm, performing local re-assembly of potentially active regions within candidate genomes. Called variants can be clustered into likely strains using a combination of UMAP and HDBSCAN.

Documentation

For detailed documentation of Lorikeet and the various algorithms and concepts it touches on please visit the Lorikeet Docs

Quick Start

Installation

Option 1: Build manually

You may need to manually set the paths for C_INCLUDE_PATH, LIBRARY_PATH, LIBCLANG_PATH, and OPENSSL_DIR to their corresponding paths in the your conda environment if they can't properly be found on your system. This method also assumes you have previously installed rust via rustup on your system.

GIT_LFS_SKIP_SMUDGE=1 git clone --recursive https://github.com/rhysnewell/Lorikeet.git;
cd Lorikeet;
conda env create -n lorikeet -f lorikeet.yml; 
conda activate lorikeet;
pip install --upgrade cmake;
bash install.sh # or run without installing e.g. `cargo run --release -- call -h`;
lorikeet call -h

Depending on your local network configuration, you may have problems obtaining Lorikeet via git. If you see something like this you may be behind a proxy that blocks access to standard git:// port (9418).

$ git clone --recursive git://github.com/rhysnewell/Lorikeet.git
Cloning into 'Lorikeet'...
fatal: Unable to look up github.com (port 9418) (Name or service not known)

Luckily, thanks to this handy tip from the developer of Freebayes we can work around it. If you have access to https:// on port 443, then you can use this 'magic' command as a workaround to enable download of the submodules:

git config --global url.https://github.com/.insteadOf git://github.com/

Option 2: Conda

Install into current conda environment:

conda install -c conda-forge bioconda lorikeet-genome

Create fresh conda environment and install lorikeet there:

conda create -n lorikeet -c conda-forge bioconda lorikeet-genome && \
conda activate lorikeet

The static binary is the easiest to use, however it is compiled with musl which for some reason makes Lorikeet (and other rust binaries) perform much slower than they usually should. As such, we do not recommend using this static binary until this problem is sorted out.

You can make use of the precompiled static binaries that come with this repository. You will have to install the lorikeet conda environment using the lorikeet.yml.

GIT_LFS_SKIP_SMUDGE=1 git clone --recursive https://github.com/rhysnewell/Lorikeet.git;
cd Lorikeet;
conda env create -n lorikeet -f lorikeet.yml;
conda activate lorikeet

Once you have created the conda environment download and install the latest release file from github

wget https://github.com/rhysnewell/Lorikeet/releases/download/latest/lorikeet-x86_64-unknown-linux-musl-v0.6.1.tar.gz;
tar -xvzf lorikeet-x86_64-unknown-linux-musl-v*.tar.gz;
cp release/lorikeet $CONDA_PREFIX/bin;

Usage

Input can either be reads and reference genome, or MAG. Or a BAM file and associated genome.

Strain genotyping analysis for metagenomics

Usage: lorikeet <subcommand> ...

Main subcommands:
    genotype    *Experimental* Resolve strain-level genotypes of MAGs from microbial communities
    consensus   Creates consensus genomes for each input reference and for each sample
    call        Performs variant calling with no downstream analysis
    evolve      Calculate dN/dS values for genes from read mappings

Other options:
    -V, --version   Print version information

Rhys J. P. Newell <r.newell near uq.edu.au>

Call variants from bam:

lorikeet call --bam-files my.bam --longread-bam-files my-longread.bam --genome-fasta-directory genomes/ -x fna --bam-file-cache-directory saved_bam_files --output-directory lorikeet_out/ --threads 10 --plot

Call variants from short reads and longread bam files:

lorikeet call -r input_genome.fna -1 forward_reads.fastq -2 reverse_reads.fastq -l longread.bam

Shell completion

Completion scripts for various shells e.g. BASH can be generated. For example, to install the bash completion script system-wide (this requires root privileges):

lorikeet shell-completion --output-file lorikeet --shell bash
mv lorikeet /etc/bash_completion.d/

It can also be installed into a user's home directory (root privileges not required):

lorikeet shell-completion --shell bash --output-file /dev/stdout >>~/.bash_completion

In both cases, to take effect, the terminal will likely need to be restarted. To test, type lorikeet ca and it should complete after pressing the TAB key.

License

Code is GPL-3.0

Dependencies

~36–49MB
~832K SLoC