app ambigviz

Identify and plot ambiguous nucleotide bases at given positions from a BAM file

0.1.0 Apr 27, 2024

MIT license

Ambigviz

ambigviz is a tool for rapidly scanning and visualising ambiguous/mixed bases at given positions in a BAM file. It was initially written to examine intrahost diversity and co-infection in SARS-CoV-2 samples, but it can be used with any BAM file (and scales well thanks to Rust). By default it applies strict filters to avoid flagging sequencing artifacts, contamination and sequencing errors. The idea is to rapidly produce plots that are "presentation-ready" for further communication and discussion.

It provides a simple command line interface and requires only a BAM file.

Installation

Cargo:

Requires cargo

cargo install ambigviz

Build from source:

Install the Rust toolchain:

To install it, please refer to the Rust documentation.

Clone the repository:

git clone https://github.com/Sam-Sims/ambigviz

Build and add to path:

cd ambigviz
cargo build --release
export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory ambigviz/target/release.

Usage

Basic usage:

ambigviz ambig <path_to_bam> <region> [options]

At minimum, all you need is a BAM file. Ambigviz will look for a .bai file in the same directory with the pattern [input].bai. If one can't be found, ambigviz will attempt to index the BAM file for you. If this fails, you can index the BAM file yourself using samtools.

By default, if no region is provided, the entire BAM file will be scanned for ambiguous bases. Combined with the default threshold of 20% ambiguity, this provides a quick way to scan a BAM file for sequencing errors, contamination or co-infection and to flag regions of interest for further investigation.

Regions follow the samtools format: chr:start-end and all positions are 1-based.
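
As an illustration of the region format, here is a minimal sketch of parsing a chr:start-end string into a reference name and 1-based, inclusive coordinates. The `Region` struct and `parse_region` function are hypothetical names used for this example only and are not ambigviz's own parser; the region string in `main` is just sample input.

```rust
/// A parsed samtools-style region with 1-based, inclusive coordinates.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Region {
    name: String,
    start: u64,
    end: u64,
}

/// Parse a "chr:start-end" string. Hypothetical helper, not ambigviz's API.
fn parse_region(s: &str) -> Result<Region, String> {
    // Split on the last ':' so reference names containing ':' still work.
    let (name, range) = s
        .rsplit_once(':')
        .ok_or_else(|| format!("missing ':' in region '{s}'"))?;
    let (start, end) = range
        .split_once('-')
        .ok_or_else(|| format!("missing '-' in range '{range}'"))?;
    let start: u64 = start.parse().map_err(|e| format!("bad start: {e}"))?;
    let end: u64 = end.parse().map_err(|e| format!("bad end: {e}"))?;
    if name.is_empty() {
        return Err("empty reference name".to_string());
    }
    // Positions are 1-based, so 0 is never a valid start.
    if start == 0 || start > end {
        return Err(format!("invalid 1-based range {start}-{end}"));
    }
    Ok(Region { name: name.to_string(), start, end })
}

fn main() {
    let region = parse_region("MN908947.3:21563-25384").unwrap();
    println!("{region:?}");
}
```

The sketch rejects a start of 0 because positions are 1-based, and it does not handle thousands separators or open-ended ranges.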

The --bed option allows you to output identified ambiguous positions to a bed file.

You can also plot the depth of a BAM file using the depth command.

ambigviz depth <path_to_bam> <region> [options]

Options:

Output

-o, --output <output>

This option will set the output file name. By default, the output file is named after the chromosome.

Threshold

-t, --threshold <threshold>

This option will set the threshold for the proportion of ambiguous bases at a given position. Any position exceeding this threshold will be plotted. The maximum value is 0.5 (50%). Default is 0.2 (20%).
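
To make the threshold concrete, the following sketch shows one way such a check could work. It assumes that the "proportion of ambiguous bases" is the fraction of reads carrying the second most common base, which can never exceed 0.5 and so is consistent with the stated maximum. That interpretation, and the function names `minor_fraction` and `exceeds_threshold`, are assumptions made for illustration, not ambigviz's confirmed implementation.

```rust
/// Fraction of reads supporting the second most common base.
/// `counts` holds per-base read counts, e.g. [A, C, G, T].
/// (Assumed definition of "ambiguity" for this sketch.)
fn minor_fraction(counts: &[u32]) -> f64 {
    let total: u32 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let mut sorted = counts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending order
    let minor = sorted.get(1).copied().unwrap_or(0);
    f64::from(minor) / f64::from(total)
}

/// A position is flagged only when the minor fraction strictly
/// exceeds the threshold.
fn exceeds_threshold(counts: &[u32], threshold: f64) -> bool {
    minor_fraction(counts) > threshold
}

fn main() {
    let counts: [u32; 4] = [70, 30, 0, 0]; // A, C, G, T
    println!(
        "minor fraction = {:.2}, flagged = {}",
        minor_fraction(&counts),
        exceeds_threshold(&counts, 0.2)
    );
}
```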

Indels

--no-indel

By default, indels are included in the ambiguous base count. This option will exclude them. This is useful for noisy data like ONT.

Depth

-D, --depth <depth>

This option will set the minimum total depth for a position to be included in the plot. Default is 100.

Minor depth

-d, --minor-depth <minor-depth>

This option will set the minimum depth of the minor allele for a position to be included in the plot. Default is 20.
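
The two depth filters can be pictured as a pair of checks on the per-base counts at a position. The sketch below is an assumption-laden illustration: `DepthFilter` and `passes_depth` are hypothetical names, the minor allele is taken to be the second most common base, and both minimums are treated as inclusive.

```rust
/// Hypothetical filter settings mirroring the documented defaults.
struct DepthFilter {
    min_depth: u32,       // corresponds to -D / --depth
    min_minor_depth: u32, // corresponds to -d / --minor-depth
}

impl Default for DepthFilter {
    fn default() -> Self {
        Self { min_depth: 100, min_minor_depth: 20 }
    }
}

/// Check total depth and minor-allele depth for one position.
/// `counts` holds per-base read counts, e.g. [A, C, G, T].
fn passes_depth(counts: &[u32], filter: &DepthFilter) -> bool {
    let total: u32 = counts.iter().sum();
    let mut sorted = counts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending order
    // Assumption: the minor allele is the second most common base.
    let minor = sorted.get(1).copied().unwrap_or(0);
    total >= filter.min_depth && minor >= filter.min_minor_depth
}

fn main() {
    let filter = DepthFilter::default();
    for counts in [[80u32, 20, 0, 0], [60, 30, 0, 0], [185, 15, 0, 0]] {
        println!("{counts:?} -> {}", passes_depth(&counts, &filter));
    }
}
```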

Base quality

-q, --base-quality <base-quality>

This option will set the minimum base quality for a base to be included in the plot. Default is 20.

Mapping quality

-Q, --mapping-quality <mapping-quality>

This option will set the minimum mapping quality for a read to be included in the plot. Default is 60.

Strand bias

-s, --strand-bias <strand-bias>

This option will set the strand bias threshold for a position to be included in the plot. Default is 0.1 (10%). This means that at least 10% of the depth supporting a base must come from each strand. For example, if a position had 80 reads of A and 20 reads of C, at least 2 of the C reads must come from each strand for the position to be flagged as ambiguous and included in the plot. The maximum value is 0.5, which requires an equal amount of depth from each strand. Setting it to 0 will disable this filter.
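
The arithmetic in that example can be sketched as a small check on the forward and reverse read counts of a single base. The function name `passes_strand_bias` is hypothetical, and treating the threshold as inclusive is an assumption based on the "at least 2" wording above.

```rust
/// Return true when the weaker strand contributes at least `threshold`
/// of the reads supporting a base. A threshold of 0 disables the check.
/// (Hypothetical helper; inclusive comparison is an assumption.)
fn passes_strand_bias(forward: u32, reverse: u32, threshold: f64) -> bool {
    if threshold <= 0.0 {
        return true; // filter disabled
    }
    let total = forward + reverse;
    if total == 0 {
        return false; // no reads support this base
    }
    let weaker = f64::from(forward.min(reverse));
    weaker / f64::from(total) >= threshold
}

fn main() {
    // 20 reads of C: 18 forward and 2 reverse gives 2 / 20 = 10%.
    println!("{}", passes_strand_bias(18, 2, 0.1));
    // 19 forward and 1 reverse gives 1 / 20 = 5%.
    println!("{}", passes_strand_bias(19, 1, 0.1));
}
```

With a threshold of 0.5, only a base with exactly the same number of forward and reverse reads passes under this sketch.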

Bed

--bed

This option will output the identified ambiguous positions to a bed file. The default is to not output.
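
Because BED coordinates are 0-based and half-open while the positions used here are 1-based, a single flagged position has to be shifted when written as a BED interval. The sketch below shows that conversion with a hypothetical `bed_line` helper; the exact columns ambigviz writes are not described here, so the three-column layout is an assumption.

```rust
/// Convert a 1-based position into a three-column BED line.
/// BED intervals are 0-based and half-open, so position p becomes [p-1, p).
/// (Hypothetical helper; the column layout is an assumption.)
fn bed_line(chrom: &str, pos: u64) -> Option<String> {
    // Position 0 is not valid in a 1-based coordinate system.
    let start = pos.checked_sub(1)?;
    Some(format!("{chrom}\t{start}\t{pos}"))
}

fn main() {
    if let Some(line) = bed_line("chr1", 100) {
        println!("{line}");
    }
}
```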

Labels

--no-labels

By default, the proportion of each base at each position is included as a text annotation. This option will exclude these annotations.
