4 releases (2 stable)

Version | Release date |
---|---|
1.1.0 | Feb 28, 2024 |
1.0.0 | Feb 17, 2022 |
0.3.7 | Aug 25, 2021 |
0.3.3 | Aug 7, 2019 |
UMGAP - Unipept Metagenomics Analysis Pipeline
The Unipept Metagenomics Analysis Pipeline can analyse metagenomic samples and return a frequency table of the taxa it detected for each read. It is based on the Unipept Metaproteomics Analysis Pipeline. Both tools were developed at the Department of Applied Mathematics, Computer Science and Statistics at Ghent University.
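As a rough illustration of the final aggregation step, here is a minimal Rust sketch (Rust being the language UMGAP is written in) that turns FASTA-like pipeline output into a frequency table. It assumes that each record is a `>` header line followed by one line holding the taxon ID assigned to that read; the function name `taxon_frequencies` and the `count,taxon` output layout are hypothetical. For real analyses, check `umgap help` for the tools the crate itself provides.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Counts how often each taxon ID occurs in FASTA-like output.
/// Assumption: each record is a '>' header line followed by one line
/// holding the taxon ID assigned to that read.
fn taxon_frequencies<R: BufRead>(reader: R) -> io::Result<Vec<(u32, usize)>> {
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim();
        // Headers and blank lines carry no taxon ID, so skip them.
        if line.is_empty() || line.starts_with('>') {
            continue;
        }
        // Lines that do not parse as a numeric ID are ignored.
        if let Ok(taxon) = line.parse::<u32>() {
            *counts.entry(taxon).or_insert(0) += 1;
        }
    }
    // Most frequent taxa first; ties broken by taxon ID for a stable order.
    let mut table: Vec<(u32, usize)> = counts.into_iter().collect();
    table.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    Ok(table)
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pipe output.fa into this program on stdin.
    let stdin = io::stdin();
    for (taxon, count) in taxon_frequencies(stdin.lock())? {
        println!("{},{}", count, taxon);
    }
    Ok(())
}
```

Sorting by descending count puts the dominant taxa at the top, which is usually what one inspects first in a metagenomic sample.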
Installation & Setup
- Install Rust according to its installation instructions, or use your favourite package manager (e.g. apt install rustc cargo). The pipeline is developed for the latest stable release, but should work on Rust 1.35 and higher.
- Clone this repository and go to the repository root:
  git clone https://github.com/unipept/umgap.git
  cd umgap
- Compile and install the UMGAP:
  cargo build --release
  cargo install --path .
  For a multiuser installation, use install instead of cargo install to place the umgap program and the wrapper script where all users can reach them:
  sudo install target/release/umgap scripts/umgap-analyse.sh /usr/bin
  By default, cargo install installs the umgap command to ~/.cargo/bin. Please ensure this directory is in your $PATH. You can check whether the installation was successful by asking for the version:
  umgap -V
- (optional) Install FragGeneScanPlusPlus to use as the gene predictor in the pipeline.
- Run scripts/umgap-setup.sh to interactively configure the UMGAP and download the data files required for some steps of the pipeline. Depending on which type of analysis you are planning, you will need the tryptic index file (less powerful, but runs on any decent laptop) or the 9-mer index file (uses about 100 GB of disk space for storage and as much RAM during operation; the exact size depends on the version).
  To share the data files between users, run instead:
  sudo scripts/umgap-setup.sh -c /etc/umgap -d <datamap>
  Make sure the <datamap> directory is accessible to the end users.
- (optional) Analyze some test data! Running
  ./scripts/umgap-analyse.sh -1 testdata/A1.fq -2 testdata/A2.fq -t tryptic-sensitivity -o - | tee output.fa
  should show you a FASTA-like file with a taxon ID per header. If you downloaded the 9-mer index file instead of the tryptic index file, use:
  ./scripts/umgap-analyse.sh -1 testdata/A1.fq -2 testdata/A2.fq -o - | tee output.fa
- (optional, NOT YET INTEGRATED) Visualize some test data! Running
  ./scripts/umgap-visualize.sh output.fa output.html
  will give you an HTML file that shows a visualization of the test data in your favorite browser.
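The size difference between the two index files comes down to how proteins are cut into searchable fragments. The sketch below contrasts the two approaches, assuming the textbook trypsin rule (cleave after K or R unless the next residue is P) and plain ASCII amino-acid strings. A protein of length n yields n - 8 overlapping 9-mers but only a handful of non-overlapping tryptic peptides, which helps explain why the tryptic index is so much smaller. The function names are hypothetical and this is not UMGAP's own implementation.

```rust
/// Cuts a protein into tryptic peptides with the textbook trypsin rule:
/// cleave after K or R, unless the next residue is P.
/// Assumes an ASCII amino-acid string (one byte per residue).
fn tryptic_peptides(protein: &str) -> Vec<&str> {
    let bytes = protein.as_bytes();
    let mut peptides = Vec::new();
    let mut start = 0;
    for i in 0..bytes.len() {
        let is_cleavage_site = (bytes[i] == b'K' || bytes[i] == b'R')
            && bytes.get(i + 1).map_or(true, |&next| next != b'P');
        if is_cleavage_site {
            peptides.push(&protein[start..=i]);
            start = i + 1;
        }
    }
    // Whatever follows the last cleavage site is the final peptide.
    if start < bytes.len() {
        peptides.push(&protein[start..]);
    }
    peptides
}

/// Returns every overlapping k-mer of the protein (ASCII assumed).
fn kmers(protein: &str, k: usize) -> Vec<&str> {
    if k == 0 || protein.len() < k {
        return Vec::new();
    }
    (0..=protein.len() - k).map(|i| &protein[i..i + k]).collect()
}

fn main() {
    let protein = "AKPLRGGKR";
    // The K at position 1 is followed by P, so no cut happens there.
    println!("tryptic peptides: {:?}", tryptic_peptides(protein));
    println!("9-mers: {:?}", kmers(protein, 9));
}
```

For AKPLRGGKR this prints the tryptic peptides ["AKPLR", "GGK", "R"] and a single 9-mer, the whole sequence; on realistic protein lengths the number of 9-mers grows with every residue, while the number of tryptic peptides grows only with the number of cleavage sites.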
Updating
A source install can be updated by pulling the latest changes into your clone of the repository and then running the following in the repository root:
cargo install --force --path .
Usage
The UMGAP offers individual tools which integrate into a pipeline. Running umgap help will get you to the documentation of each tool, and the short metagenomics case study on the Unipept website demonstrates their usage.
This repository also offers 6 preconfigured pipelines (scripts/umgap-analyse.sh), which should cover most use cases. Running the script without any arguments should get you started. The preconfigured pipelines are:
- high-precision: the default; focuses on high precision with very decent sensitivity.
- max-precision: focuses on very high precision, at the cost of sensitivity.
- high-sensitivity: focuses on high sensitivity, with decent precision.
- max-sensitivity: focuses on very high sensitivity, at the cost of precision.
- tryptic-precision: focuses on high precision, using a much smaller index file, which makes it usable on a laptop.
- tryptic-sensitivity: focuses on high sensitivity, using a much smaller index file, which makes it usable on a laptop.
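To make the choice between pipelines concrete, the hypothetical Rust sketch below encodes the six pipeline names and maps each to the index file it needs. Reading this list together with the installation notes, it assumes that only the tryptic-* pipelines run on the tryptic index file and that all others require the 9-mer index file. The Pipeline and IndexFile types are illustrative only; scripts/umgap-analyse.sh itself is the authority on which names it accepts.

```rust
use std::str::FromStr;

/// Hypothetical model of the six preconfigured pipelines of
/// scripts/umgap-analyse.sh; this type is not part of the umgap crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pipeline {
    HighPrecision,
    MaxPrecision,
    HighSensitivity,
    MaxSensitivity,
    TrypticPrecision,
    TrypticSensitivity,
}

/// The two index files that scripts/umgap-setup.sh can download.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexFile {
    Tryptic,
    NineMer,
}

impl Pipeline {
    /// Assumption: only the tryptic-* pipelines use the smaller tryptic
    /// index; all other pipelines need the 9-mer index.
    fn index_file(self) -> IndexFile {
        match self {
            Pipeline::TrypticPrecision | Pipeline::TrypticSensitivity => IndexFile::Tryptic,
            _ => IndexFile::NineMer,
        }
    }
}

impl FromStr for Pipeline {
    type Err = String;

    /// Parses the pipeline names presumably accepted by the script's -t option.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "high-precision" => Ok(Pipeline::HighPrecision),
            "max-precision" => Ok(Pipeline::MaxPrecision),
            "high-sensitivity" => Ok(Pipeline::HighSensitivity),
            "max-sensitivity" => Ok(Pipeline::MaxSensitivity),
            "tryptic-precision" => Ok(Pipeline::TrypticPrecision),
            "tryptic-sensitivity" => Ok(Pipeline::TrypticSensitivity),
            other => Err(format!("unknown pipeline: {}", other)),
        }
    }
}

fn main() {
    // Without an argument, fall back to the documented default pipeline.
    let name = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "high-precision".to_string());
    match name.parse::<Pipeline>() {
        Ok(pipeline) => println!("{} needs the {:?} index file", name, pipeline.index_file()),
        Err(err) => eprintln!("{}", err),
    }
}
```

Modelling the names as an enum means a typo fails loudly at parse time instead of silently selecting the wrong pipeline.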
Another script, scripts/umgap-visualize.sh, will help you to visualize the output of the pipeline. Again, running the script without any arguments prints the usage instructions.
Contributing
Please adhere to the EditorConfig and rustfmt styles specified in the repository.
License
The UMGAP is released under the terms of the MIT License. See the LICENSE file for more info.