app bam2seq

Extract reads and reconstructed references from a .bam file using CIGAR and MD tags

3 releases

0.1.2 Sep 10, 2022
0.1.1 Sep 10, 2022
0.1.0 Sep 10, 2022

MPL-2.0 license

#+title: Bam2Seq

[[https://crates.io/crates/bam2seq]]

This tool takes a BAM file containing CIGAR strings, reads, and MD tags, and outputs a .seq file containing pairs of reads and reconstructed references.
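To illustrate how a reference can be reconstructed from a read, its CIGAR string, and its MD tag, here is a minimal standalone sketch. It is not the tool's actual code (which relies on the BAM crate); the function names =parse_cigar= and =reconstruct_reference= are hypothetical. It assumes well-formed, mutually consistent input and ignores reference skips (CIGAR op =N=), which the MD tag does not describe.

```rust
/// Parse a CIGAR string such as "2S4M2I2M1D1M" into (length, op) pairs.
fn parse_cigar(cigar: &str) -> Vec<(usize, char)> {
    let mut ops = Vec::new();
    let mut len = 0usize;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = len * 10 + d as usize;
        } else {
            ops.push((len, c));
            len = 0;
        }
    }
    ops
}

/// Reconstruct the reference segment covered by one alignment.
/// Panics if the read, CIGAR string, and MD tag are inconsistent.
fn reconstruct_reference(read: &str, cigar: &str, md: &str) -> String {
    let read = read.as_bytes();

    // Step 1: collect the read bases that are aligned to the reference
    // (ops M, =, X). Insertions and soft clips consume read bases but
    // have no counterpart in the reference, so they are skipped.
    let mut aligned = Vec::new();
    let mut pos = 0;
    for (len, op) in parse_cigar(cigar) {
        match op {
            'M' | '=' | 'X' => {
                aligned.extend_from_slice(&read[pos..pos + len]);
                pos += len;
            }
            'I' | 'S' => pos += len,
            _ => {} // D, N, H, P consume no read bases
        }
    }

    // Step 2: walk the MD tag. A number copies that many matching bases
    // from the read, a letter is a mismatched reference base, and
    // '^' followed by letters is a stretch of deleted reference bases.
    let mut reference = Vec::new();
    let mut i = 0; // index into `aligned`
    let mut matches = 0usize;
    let mut in_deletion = false;
    for c in md.bytes() {
        if c.is_ascii_digit() {
            matches = matches * 10 + (c - b'0') as usize;
            in_deletion = false;
        } else {
            // Flush the pending run of matching bases.
            reference.extend_from_slice(&aligned[i..i + matches]);
            i += matches;
            matches = 0;
            if c == b'^' {
                in_deletion = true;
            } else {
                reference.push(c);
                if !in_deletion {
                    i += 1; // a mismatch also consumes one read base
                }
            }
        }
    }
    reference.extend_from_slice(&aligned[i..i + matches]);
    String::from_utf8_lossy(&reference).into_owned()
}
```

For example, the read =GGACGCAAACT= with CIGAR =2S4M2I2M1D1M= and MD tag =3T2^G1= yields the reference =ACGTACGT=.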

** Installation

Install directly with cargo from [[https://crates.io/crates/bam2seq]]:
#+begin_src
cargo install bam2seq
#+end_src

Alternatively, clone the repository and, optionally, install the binary:
#+begin_src
git clone https://github.com/ragnargrootkoerkamp/bam2seq.git
cd bam2seq
cargo install --path .
#+end_src

** Usage
#+begin_src
cargo run --release -- <input.bam> <output.seq> [--no-clip] [--min-len ]
#+end_src

  • input.bam :: The input BAM file.
  • output.seq :: The output .seq file. Defaults to input.seq.
  • --no-clip :: Disable trimming of soft clipped regions from the read.
  • --min-len :: Only output (clipped) reads of at least this length.
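The interaction between clipping and the length filter can be sketched as follows. This is a hypothetical illustration, not the tool's code: the names =soft_clip_lengths= and =select_read= are made up, and it assumes that with =--no-clip= the length filter applies to the full, unclipped read.

```rust
/// Parse a CIGAR string such as "2S4M2S" into (length, op) pairs.
fn parse_cigar(cigar: &str) -> Vec<(usize, char)> {
    let mut ops = Vec::new();
    let mut len = 0usize;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = len * 10 + d as usize;
        } else {
            ops.push((len, c));
            len = 0;
        }
    }
    ops
}

/// Number of soft-clipped bases at the start and end of the read.
/// Hard clips (H) may sit outside the soft clips; they consume no read bases.
fn soft_clip_lengths(cigar: &str) -> (usize, usize) {
    let ops = parse_cigar(cigar);
    let mut leading = 0;
    for &(len, op) in ops.iter() {
        match op {
            'H' => continue,
            'S' => leading += len,
            _ => break,
        }
    }
    let mut trailing = 0;
    for &(len, op) in ops.iter().rev() {
        match op {
            'H' => continue,
            'S' => trailing += len,
            _ => break,
        }
    }
    (leading, trailing)
}

/// Return the (optionally clipped) read if it is at least `min_len` long.
fn select_read<'a>(read: &'a str, cigar: &str, no_clip: bool, min_len: usize) -> Option<&'a str> {
    let kept = if no_clip {
        read
    } else {
        let (leading, trailing) = soft_clip_lengths(cigar);
        let end = read.len().saturating_sub(trailing);
        if leading >= end { "" } else { &read[leading..end] }
    };
    if kept.len() >= min_len { Some(kept) } else { None }
}
```

With the read =GGACGTTT= and CIGAR =2S4M2S=, clipping keeps =ACGT=, so a minimum length of 5 drops the read unless clipping is disabled.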

This outputs a .seq file, which looks like this:
#+begin_src
>ACTGATGA
<ACAGATG
>read 2
<reference 2
...
#+end_src
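A small sketch of writing and reading such pairs is given below. It assumes each read is on a line prefixed with =>= and each reference on the following line prefixed with =<=. The helper names =write_pair= and =read_pairs= are hypothetical and not part of bam2seq.

```rust
use std::io::{self, BufRead, Write};

/// Write one read/reference pair as two lines: ">read" then "<reference".
fn write_pair<W: Write>(out: &mut W, read: &str, reference: &str) -> io::Result<()> {
    writeln!(out, ">{}", read)?;
    writeln!(out, "<{}", reference)
}

/// Read all (read, reference) pairs from .seq-formatted input.
/// Lines without a recognised prefix, and references without a
/// preceding read line, are ignored.
fn read_pairs<R: BufRead>(input: R) -> io::Result<Vec<(String, String)>> {
    let mut pairs = Vec::new();
    let mut pending: Option<String> = None;
    for line in input.lines() {
        let line = line?;
        if let Some(read) = line.strip_prefix('>') {
            pending = Some(read.to_string());
        } else if let Some(reference) = line.strip_prefix('<') {
            if let Some(read) = pending.take() {
                pairs.push((read, reference.to_string()));
            }
        }
    }
    Ok(pairs)
}
```

Any =Write= or =BufRead= implementor works here, for example a =Vec<u8>= buffer or a =BufReader= wrapped around a file.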

** Links

  • This is quite similar to [[https://github.com/mlafave/sam2pairwise][sam2pairwise]] but writes a simpler output format.
  • All the work in the implementation is done by the [[https://docs.rs/bam/latest/bam/][BAM]] crate.
