9 releases
new 0.2.8 | Jan 7, 2025 |
---|---|
0.2.6 | Dec 16, 2024 |
0.1.1 | Dec 4, 2024 |
[!CAUTION] This tool is still experimental!
This means it may have bugs, and features are subject to change. Use it cautiously, and share feedback to help us improve. 🧪
Predictosaurus is a command-line tool for uncertainty-aware, haplotype-based prediction of genomic variant effects. It builds variant graphs from VCF files, processes genomic features, extracts peptide sequences, and visualizes the results.
Installation
Predictosaurus can be installed via Bioconda:
conda install -c bioconda predictosaurus
Alternatively, you can use cargo, the Rust package manager:
cargo install predictosaurus
Usage
Run the tool from the command line using the following syntax:
predictosaurus <command> [options]
Use predictosaurus --help to view general help information, or predictosaurus <command> --help for specific command details.
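Before calling any subcommand, it can help to confirm that the binary is actually reachable. The following is a minimal sketch, not part of the tool itself; the helper name have_tool is hypothetical, and only the --help flag mentioned above is used.

```shell
# Hypothetical helper: succeeds if the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool predictosaurus; then
    # Show the general help text described above.
    predictosaurus --help
else
    echo "predictosaurus not found on PATH; check the installation or the active conda environment" >&2
fi
```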
Commands
Build
Builds a full variant graph from VCF files and stores it.
Options:
--calls <path>: Path to the VCF calls file.
--observations <sample=observations.vcf>: One or more observation files; ensure sample names match those in the calls file.
--min-prob-present <float>: Minimum probability for a variant to be considered for the graph generation. Defaults to 0.8.
--output <path>: Path to store the generated variant graphs.
Example:
predictosaurus build --calls path/to/calls.vcf --observations sample1=path/to/observations1.vcf sample2=path/to/observations2.vcf --min-prob-present 0.65 --output path/to/output/graphs.duckdb
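With many samples, typing each sample=path pair by hand becomes error-prone. The sketch below assembles the --observations arguments from a list of sample names. It assumes a hypothetical layout with one observations file per sample, named <sample>.vcf, in a single directory; the directory and sample names are placeholders. It prints the resulting command for inspection rather than running it.

```shell
# Hypothetical layout: one observations file per sample, named <sample>.vcf.
obs_dir="path/to/observations"
samples="sample1 sample2"

# Collect one sample=path pair per sample name.
obs_args=""
for sample in $samples; do
    obs_args="$obs_args $sample=$obs_dir/$sample.vcf"
done
obs_args="${obs_args# }"  # drop the leading space

# Assemble the build command and print it instead of executing it.
build_cmd="predictosaurus build --calls path/to/calls.vcf --observations $obs_args --output path/to/output/graphs.duckdb"
echo "$build_cmd"
```

The sample names in the list must match those in the calls file, as noted for the --observations option.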
Process
Retrieves subgraphs for individual features from the provided GFF file.
Options:
--features <path>: Path to the GFF file containing the features of interest.
--reference <path>: Path to the reference genome FASTA file.
--graph <path>: Path to the graph file generated by the build command.
--output <path>: Path to the output file storing the generated paths.
Example:
predictosaurus process --features path/to/features.gff --reference path/to/reference.fasta --graph path/to/graph.duckdb --output path/to/output/paths.duckdb
Plot
Creates visualizations and outputs them in HTML, TSV, or Vega format.
Options:
--input <path>: Path to the input data file generated with the process command.
--format <html|tsv|vega>: Desired output format.
--output <path>: Path to the output files.
Example:
predictosaurus plot --input path/to/paths.duckdb --format html --output /out_dir/
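To produce all three output formats from the same input, the plot command can be repeated once per format. The sketch below only prints the commands; the helper name plot_cmd and the per-format output directories are assumptions for illustration.

```shell
# Hypothetical helper: prints a plot command for the given input, format, and output path.
plot_cmd() {
    echo "predictosaurus plot --input $1 --format $2 --output $3"
}

# One command per supported format, each writing to its own (assumed) directory.
for format in html tsv vega; do
    plot_cmd path/to/paths.duckdb "$format" "out_dir/$format/"
done
```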
Example
This is an example of using Predictosaurus to build a graph, process it, and visualize the results:
# Step 1: Build the variant graph
predictosaurus build --calls calls.vcf --observations sample1=observations1.vcf sample2=observations2.vcf --output graphs.duckdb
# Step 2: Process the graph with a GFF file
predictosaurus process --features features.gff --reference reference.fasta --graph graphs.duckdb --output paths.duckdb
# Step 3: Plot the visualizations
predictosaurus plot --input paths.duckdb --format html --output out_dir/
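Because each step consumes the output of the previous one, a wrapper script should stop at the first failure. The following is a minimal sketch of such a wrapper using the same three commands; the run helper and the DRY_RUN variable are hypothetical conveniences, not features of Predictosaurus. By default the sketch only prints the commands.

```shell
set -eu  # stop at the first failing step or unset variable

# DRY_RUN=1 (the default in this sketch) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command, then run it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: Build the variant graph
run predictosaurus build --calls calls.vcf --observations sample1=observations1.vcf sample2=observations2.vcf --output graphs.duckdb
# Step 2: Process the graph with a GFF file
run predictosaurus process --features features.gff --reference reference.fasta --graph graphs.duckdb --output paths.duckdb
# Step 3: Plot the visualizations
run predictosaurus plot --input paths.duckdb --format html --output out_dir/
```

Running the script with DRY_RUN=0 executes the steps in order and aborts before processing or plotting if an earlier step fails.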
License
Predictosaurus is licensed under the MIT License.