app bamsalvage

Rust version of bamsalvage, retrieving sequences from a corrupted BAM file as much as possible

1 unstable release

0.1.3 Apr 18, 2023

MIT license

33KB
485 lines

bamsalvage, Rust version

Rust version of bamsalvage.

INTRODUCTION

bamsalvage is a tool to recover as many sequence reads as possible from possibly corrupted BAM files. This software shares a common purpose with bamrescue by Jérémie Roquet (https://bamrescue.arkanosis.net/). bamrescue detects corrupted BGZF blocks using CRC32 checksums and skips them. That method works well if every block begins with a new read.
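To make the CRC32 check concrete, here is a minimal sketch of reading the size and stored checksum of one BGZF block, following the block layout in the SAM/BAM specification. The names `crc32`, `parse_bgzf_block`, and `BlockInfo` are hypothetical and are not taken from bamrescue or bamsalvage. Inflating the DEFLATE payload is left out; a real check would decompress the payload with a DEFLATE library and compare its CRC32 with the stored value.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant used by gzip and BGZF.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Fields read from one BGZF block without decompressing it (hypothetical type).
#[derive(Debug, PartialEq)]
struct BlockInfo {
    block_size: usize,       // total block length in bytes (BSIZE + 1)
    stored_crc32: u32,       // CRC32 of the uncompressed data, from the trailer
    uncompressed_len: u32,   // ISIZE, from the trailer
}

/// Parse the header and trailer of the BGZF block at the start of `buf`.
/// Returns None if the header is malformed or the block is truncated.
fn parse_bgzf_block(buf: &[u8]) -> Option<BlockInfo> {
    // gzip header with CM = 8 (deflate) and FLG = 4 (extra field present)
    if buf.len() < 18 || !buf.starts_with(&[0x1f, 0x8b, 0x08, 0x04]) {
        return None;
    }
    let xlen = u16::from_le_bytes([buf[10], buf[11]]) as usize;
    let extra = buf.get(12..12 + xlen)?;

    // Walk the extra subfields looking for "BC" with a 2-byte payload (BSIZE).
    let mut pos = 0;
    let mut bsize = None;
    while pos + 4 <= extra.len() {
        let slen = u16::from_le_bytes([extra[pos + 2], extra[pos + 3]]) as usize;
        if extra[pos] == b'B' && extra[pos + 1] == b'C' && slen == 2 {
            let v = extra.get(pos + 4..pos + 6)?;
            bsize = Some(u16::from_le_bytes([v[0], v[1]]) as usize);
            break;
        }
        pos += 4 + slen;
    }

    let block_size = bsize? + 1;
    if block_size < 12 + xlen + 8 {
        return None; // too small to hold the header and the 8-byte trailer
    }
    let t = buf.get(block_size - 8..block_size)?;
    Some(BlockInfo {
        block_size,
        stored_crc32: u32::from_le_bytes([t[0], t[1], t[2], t[3]]),
        uncompressed_len: u32::from_le_bytes([t[4], t[5], t[6], t[7]]),
    })
}
```

A block would be treated as corrupted when `parse_bgzf_block` returns `None`, when inflating the payload fails, or when `crc32` of the inflated data differs from `stored_crc32`.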

When recovering long-read sequences, a single read can span more than one BGZF block, since the maximum block size (64 KiB) is smaller than the reads that long-read sequencers can produce.

Skipping corrupted blocks does not solve this problem and often results in termination of Samtools and failure of sequence recovery.
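A rough back-of-the-envelope calculation shows why long reads cross block boundaries. The sketch below assumes the BAM encoding of two bases per byte plus one quality byte per base, and a maximum uncompressed payload of 65,536 bytes per BGZF block. It ignores the other record fields, so the result is a lower bound. The function names are hypothetical.

```rust
/// Largest uncompressed payload a single BGZF block can hold (64 KiB).
const MAX_BGZF_PAYLOAD: usize = 65_536;

/// Bytes needed for the sequence (4 bits per base) and quality (1 byte per base)
/// fields of one BAM record. Name, CIGAR, and tags are ignored here.
fn seq_and_qual_bytes(read_len: usize) -> usize {
    (read_len + 1) / 2 + read_len
}

/// Lower bound on the number of BGZF blocks a read of `read_len` bases occupies.
fn min_blocks_spanned(read_len: usize) -> usize {
    let bytes = seq_and_qual_bytes(read_len);
    (bytes + MAX_BGZF_PAYLOAD - 1) / MAX_BGZF_PAYLOAD
}
```

Under these assumptions a 150-base short read fits in one block, while a 100,000-base long read needs 150,000 bytes and therefore at least three blocks. Losing any one of those blocks breaks the record.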

bamsalvage scans for the next available start position whenever a corrupted block is detected. Since the goal of the software is rescuing sequences, bamsalvage does not recover all the information in a BAM file; it retrieves only the read and quality sequences.
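As an illustration of the scanning idea, the following sketch searches a byte buffer for the next position that looks like the start of a BGZF block. This is an assumption about one plausible approach, not a description of bamsalvage's actual implementation. It also assumes the `BC` subfield comes first in the gzip extra field, which is the layout shown in the SAM/BAM specification but is not strictly required. The function names are hypothetical.

```rust
/// gzip magic bytes followed by CM = 8 (deflate) and FLG = 4 (extra field present).
const BGZF_MAGIC: [u8; 4] = [0x1f, 0x8b, 0x08, 0x04];

/// True if `buf` begins with what looks like a BGZF block header:
/// the gzip magic, then a "BC" extra subfield of length 2 at offset 12.
fn looks_like_block_start(buf: &[u8]) -> bool {
    buf.len() >= 16
        && buf.starts_with(&BGZF_MAGIC)
        && buf[12..].starts_with(&[b'B', b'C', 2, 0])
}

/// Offset of the first candidate block start at or after `from`, if any.
fn next_block_start(data: &[u8], from: usize) -> Option<usize> {
    (from..data.len()).find(|&i| looks_like_block_start(&data[i..]))
}
```

The byte pattern can also occur by chance inside compressed data, so a candidate offset should be confirmed, for example by inflating the block and checking its CRC32, before reads are extracted from it.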

Install

The program requires rustc and cargo (version >= 1.6). All resources are downloaded and built using the following commands.

    git clone https://github.com/takaho/bamsalvage-rust/
    cd bamsalvage-rust
    cargo build

Usage

    cargo run --release -- -i [BAM file] -o [output file] [--noqual] [--verbose]

or, using the binary inside the target directory:

    bamsalvage -i [BAM file] -o [output file] [--noqual] [--verbose]

Commands

Options:
  -i, --input <FILE>     Input BAM file
  -o, --output <FILE>    Output filename
  -l, --limit <integer>  Limiting counts [default: 0]
  -n, --noqual           Skip qual field
  -v, --verbose          verbosity
  -h, --help             Print help
  -V, --version          Print version

Dependencies

~8MB
~115K SLoC