1 unstable release: 0.1.0 (Apr 27, 2024)
Ambigviz
ambigviz is a tool for rapidly scanning and visualising ambiguous/mixed bases at given positions in a BAM file. It was initially written to examine intrahost diversity and co-infection in SARS-CoV-2 samples, but it can be used for any BAM file (and scales well thanks to Rust). It uses strict filtering options by default to avoid sequencing artifacts and contamination/sequencing errors. The idea is to rapidly produce plots that are "presentation-ready" for further communication and discussion.
It provides a simple command line interface and requires only a BAM file.
Installation
Cargo:
Requires cargo
cargo install ambigviz
Build from source:
Install the Rust toolchain:
To install Rust, please refer to the Rust documentation: docs
Clone the repository:
git clone https://github.com/Sam-Sims/ambigviz
Build and add to path:
cd ambigviz
cargo build --release
export PATH=$PATH:$(pwd)/target/release
All executables will be in the directory ambigviz/target/release.
Usage
Basic usage:
ambigviz ambig <path_to_bam> <region> [options]
At minimum, all you need is a BAM file. Ambigviz will look for a .bai file in the same directory with the pattern [input].bai. If one can't be found, ambigviz will attempt to index the BAM file for you. If this fails, you can index the BAM file yourself using samtools.
By default, if no region is provided, the entire BAM file will be scanned for ambiguous bases. Combined with the default threshold of 20% ambiguity, this provides an easy way to quickly scan a BAM file for sequencing errors, contamination or co-infection and flag regions of interest for further investigation.
Regions follow the samtools format chr:start-end, and all positions are 1-based.
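To illustrate the region format, here is a minimal Python sketch that parses a chr:start-end string and converts the 1-based, inclusive coordinates into a 0-based, half-open interval. The function name `parse_region` is hypothetical and is not part of ambigviz; the sketch only handles the full chr:start-end form.

```python
def parse_region(region):
    """Parse 'chr:start-end' (1-based, inclusive) into (chrom, start0, end0).

    The returned interval is 0-based and half-open: [start0, end0).
    """
    # Split on the last colon so contig names that contain ':' still work.
    chrom, _, span = region.rpartition(":")
    if not chrom or "-" not in span:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start_text, end_text = span.split("-", 1)
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # Shift only the start: the 1-based inclusive end equals the 0-based exclusive end.
    return chrom, start - 1, end
```

For example, `parse_region("MN908947.3:100-200")` covers the 101 positions from 100 to 200 inclusive.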
The --bed option allows you to output identified ambiguous positions to a bed file.
You can also plot the depth of a BAM file using the depth command.
ambigviz depth <path_to_bam> <region> [options]
Options:
Output
-o, --output <output>
This option will set the output file name. Default is to use the chromosome name.
Threshold
-t, --threshold <threshold>
This option will set the threshold for the proportion of ambiguous bases at a given position. Any position exceeding this threshold will be plotted. The maximum value is 0.5 (50%). Default is 0.2 (20%).
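As a rough illustration of how such a threshold could be applied, the sketch below computes a minor allele fraction from per-base counts and compares it with the threshold. It assumes that the "proportion of ambiguous bases" means the fraction of reads supporting the second most common base, which would be consistent with the 0.5 maximum but is not confirmed here. The names `minor_fraction` and `exceeds_threshold` are hypothetical.

```python
def minor_fraction(counts):
    """Fraction of reads supporting the second most common base.

    This definition is an assumption, not ambigviz's documented behaviour.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    return minor / total


def exceeds_threshold(counts, threshold=0.2):
    """True if the minor fraction is strictly greater than the threshold."""
    if not 0.0 <= threshold <= 0.5:
        raise ValueError("threshold must be between 0 and 0.5")
    # Strict comparison, following the wording "exceeding this threshold".
    return minor_fraction(counts) > threshold
```

Under these assumptions, a position with 70 reads of A and 30 reads of C would be plotted at the default threshold, while an 80/20 split would sit exactly on the threshold and would not.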
Indels
--no-indel
By default, indels are included in the ambiguous base count. This option will exclude them. This is useful for noisy data like ONT.
Depth
-D, --depth <depth>
This option will set the minimum total depth for a position to be included in the plot. Default is 100.
Minor depth
-d, --minor-depth <minor-depth>
This option will set the minimum depth of the minor allele for a position to be included in the plot. Default is 20.
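The two depth filters can be pictured as a pair of simple checks on the per-base counts at a position. The sketch below uses the defaults stated above (100 and 20) and assumes the minor allele is the second most common base; `passes_depth_filters` is a hypothetical name, not an ambigviz function.

```python
def passes_depth_filters(counts, min_depth=100, min_minor_depth=20):
    """Check total depth and minor allele depth against their minimums.

    Assumes the minor allele is the second most common base at the position.
    """
    total = sum(counts.values())
    ordered = sorted(counts.values(), reverse=True)
    minor = ordered[1] if len(ordered) > 1 else 0
    return total >= min_depth and minor >= min_minor_depth
```

With the defaults, 80 reads of A and 20 of C would pass both checks, whereas 190 reads of A and 10 of C would fail on minor depth despite the high total depth.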
Base quality
-q, --base-quality <base-quality>
This option will set the minimum base quality for a base to be included in the plot. Default is 20.
Mapping quality
-Q, --mapping-quality <mapping-quality>
This option will set the minimum mapping quality for a read to be included in the plot. Default is 60.
Strand bias
-s, --strand-bias <strand-bias>
This option will set the strand bias threshold for a position to be included in the plot. Default is 0.1 (10%). This means that at least 10% of the depth must come from each strand. For example, if a position had 80 reads of A and 20 reads of C, at least 2 of the C reads must come from each strand for the position to be flagged as ambiguous and included in the plot. The maximum value is 0.5, which requires an equal amount of depth from each strand. Setting it to 0 will disable this filter.
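The strand bias rule can be worked through in a few lines of Python. This is a sketch of the check as described above, applied to the forward and reverse read counts supporting one allele; `passes_strand_bias` is a hypothetical name, and the way ambigviz actually computes the check internally is not shown here.

```python
def passes_strand_bias(forward, reverse, threshold=0.1):
    """True if each strand contributes at least `threshold` of the reads."""
    if not 0.0 <= threshold <= 0.5:
        raise ValueError("threshold must be between 0 and 0.5")
    if threshold == 0:
        # A threshold of 0 disables the filter.
        return True
    total = forward + reverse
    if total == 0:
        return False
    # The less represented strand must reach the required share of the depth.
    return min(forward, reverse) / total >= threshold
```

For the example above, 20 reads of C split as 2 forward and 18 reverse would pass at the default threshold, while a 1/19 split would not. At the maximum of 0.5, only an even split such as 10/10 passes.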
Bed
--bed
This option will output the identified ambiguous positions to a bed file. The default is to not output.
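Since positions in ambigviz are 1-based and the BED format uses 0-based, half-open intervals, the conversion for single positions can be sketched as below. This only illustrates the standard three-column BED layout; the exact columns ambigviz writes are not described here, and `positions_to_bed` is a hypothetical helper.

```python
def positions_to_bed(chrom, positions):
    """Convert 1-based positions into tab-separated, three-column BED lines."""
    lines = []
    for pos in sorted(set(positions)):
        if pos < 1:
            raise ValueError("positions must be 1-based")
        # BED start is 0-based and inclusive; BED end is exclusive.
        lines.append(f"{chrom}\t{pos - 1}\t{pos}")
    return lines
```

A 1-based position 241 therefore becomes the interval 240 to 241 in BED coordinates.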
Labels
--no-labels
By default, the proportions of each base at each position are included as text annotations. This option will exclude them.