2 releases

0.1.1 Sep 12, 2024
0.1.0 Sep 11, 2024

#256 in Biology


Used in abpoa-rs

MIT license

655KB
6K SLoC

C 5.5K SLoC // 0.1% comments
Rust 114 SLoC // 0.1% comments

abPOA Rust Bindings

Adaptive-band partial order alignment in Rust

 

Installation

Cargo package

TODO

Building from source

Rust compiler

The minimum supported Rust version is 1.80.

Building abPOA

  1. Clone the repository.

    git clone https://github.com/broadinstitute/abpoa-rs
    
  2. Move into the directory.

    cd abpoa-rs
    
  3. Build using cargo. The RUSTFLAGS setting below tells the compiler to use every instruction-set feature of your machine's CPU, so the resulting binary may not run on other CPUs. To maximize portability of the binary, omit the RUSTFLAGS="..." part.

    RUSTFLAGS="-C target-cpu=native" cargo build --release
    

Supported features

  • Global, local, and semi-global alignment
  • Configuring linear, gap-affine, and convex alignment penalties
  • Computing one or more consensus sequences
  • Generating row-column MSA FASTA output
  • Importing a POA graph from a FASTA file
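
The three gap penalty models in the list above differ only in how the cost of a gap grows with its length. The following is a minimal standalone sketch, not part of the abpoa-rs API: the GapModel type, the gap_cost function, and the example penalty values are hypothetical names and numbers chosen for illustration. It assumes the usual formulation, in which a linear gap of length g costs g*E, an affine gap costs O + g*E, and a convex (two-piece affine) gap costs min(O1 + g*E1, O2 + g*E2).

```rust
/// Gap penalty models (illustrative names, not the abpoa-rs API).
#[derive(Clone, Copy)]
enum GapModel {
    /// Every gap position costs `ext`.
    Linear { ext: u32 },
    /// A one-time `open` cost plus `ext` per gap position.
    Affine { open: u32, ext: u32 },
    /// Two affine pieces; the cheaper of the two applies.
    Convex { open1: u32, ext1: u32, open2: u32, ext2: u32 },
}

/// Cost of a gap of `len` positions (len >= 1) under the given model.
fn gap_cost(model: GapModel, len: u32) -> u32 {
    match model {
        GapModel::Linear { ext } => len * ext,
        GapModel::Affine { open, ext } => open + len * ext,
        GapModel::Convex { open1, ext1, open2, ext2 } => {
            (open1 + len * ext1).min(open2 + len * ext2)
        }
    }
}

fn main() {
    // Example penalties, chosen only to show where the models diverge.
    let linear = GapModel::Linear { ext: 2 };
    let affine = GapModel::Affine { open: 6, ext: 2 };
    let convex = GapModel::Convex { open1: 6, ext1: 2, open2: 24, ext2: 1 };

    for len in [1, 5, 20, 30] {
        println!(
            "gap length {:>2}: linear = {:>2}, affine = {:>2}, convex = {:>2}",
            len,
            gap_cost(linear, len),
            gap_cost(affine, len),
            gap_cost(convex, len)
        );
    }
}
```

With these example values the convex cost equals the affine cost for short gaps and becomes cheaper once the second piece takes over (beyond 18 positions here), which is why convex penalties are often preferred when long insertions or deletions are expected.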

Features not yet supported:

  • "Strand ambiguous" alignment (on the roadmap)
  • Guide-tree supported alignment
  • Minimizer-based seeding and alignment

Usage

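// Imports: the crate-root paths below are an assumption; adjust them to the crate's actual module layout.
use abpoa_rs::{AlignmentMode, AlignmentParametersBuilder, ConsensusAlgorithm, Graph, Verbosity};

// Configure the alignment parameters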
let aln_params = AlignmentParametersBuilder::new()
    .alignment_mode(AlignmentMode::Global)
    .gap_affine_penalties(0, 4, 6, 2)
    .verbosity(Verbosity::None)
    .build();

// Create a new empty POA graph
let mut graph = Graph::new(&aln_params);

let test_seq: Vec<&[u8]> = vec![
    b"ACGTGTACAGTTGAC",
    b"AGGTACACGTTAC",
    b"AGTGTCACGTTGAC",
    b"ACGTGTACATTGAC",
];

// Align and add each sequence to the graph
for (i, seq) in test_seq.iter().enumerate() {
    let weights = vec![1; seq.len()];
    let result = graph
        .align_and_add_sequence(&aln_params, seq, &weights, format!("seq{}", i + 1).as_bytes())
        .unwrap();
    
    eprintln!("Sequence {}: score = {}", i + 1, result.get_best_score());
}

// Compute the row-column MSA output
graph.generate_rc_msa();
let msa = graph.get_msa();
assert_eq!(msa.len(), 4);

let truth = [
    b"ACGTGTACAGTTGAC",
    b"A--GGTACACGTTAC",
    b"A-GTGTCACGTTGAC",
    b"ACGTGTACA-TTGAC",
];

for (i, seq) in msa.sequences().iter().enumerate() {
    // The sequence in the MSA object is coded, so we use `reverse_seq` to convert it to ASCII.
    let ascii = aln_params.reverse_seq(seq);
    assert_eq!(&ascii, truth[i]);
}

// Generate consensus
graph.generate_consensus(ConsensusAlgorithm::HeaviestBundle);

let consensus = graph.get_consensus().unwrap();
let ascii = aln_params.reverse_seq(consensus.sequences().iter().next().unwrap());

eprintln!("Consensus: {}", std::str::from_utf8(&ascii).unwrap());
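
A row-column MSA such as the one in the example has two properties that are easy to check independently of the library: all rows have the same width, and removing the gap characters from a row gives back the corresponding input sequence. The sketch below is standalone and does not call abpoa-rs. The ungap and msa_is_consistent helpers are hypothetical names introduced here, and the code assumes the rows have already been converted to ASCII with '-' as the gap character.

```rust
/// Remove gap characters ('-') from one MSA row to recover the ungapped sequence.
fn ungap(row: &[u8]) -> Vec<u8> {
    row.iter().copied().filter(|&b| b != b'-').collect()
}

/// Return true if there is one row per input, all rows share the same width,
/// and each row with its gaps removed equals the matching input sequence.
fn msa_is_consistent(inputs: &[&[u8]], rows: &[&[u8]]) -> bool {
    if inputs.len() != rows.len() {
        return false;
    }
    let width = rows.first().map_or(0, |r| r.len());
    rows.iter().all(|r| r.len() == width)
        && inputs.iter().zip(rows).all(|(seq, row)| ungap(row) == *seq)
}

fn main() {
    // Input sequences and expected MSA rows from the usage example.
    let inputs: [&[u8]; 4] = [
        b"ACGTGTACAGTTGAC",
        b"AGGTACACGTTAC",
        b"AGTGTCACGTTGAC",
        b"ACGTGTACATTGAC",
    ];
    let rows: [&[u8]; 4] = [
        b"ACGTGTACAGTTGAC",
        b"A--GGTACACGTTAC",
        b"A-GTGTCACGTTGAC",
        b"ACGTGTACA-TTGAC",
    ];

    assert!(msa_is_consistent(&inputs, &rows));
    println!("MSA rows are consistent with the input sequences");
}
```

A check like this only confirms that the MSA is a valid gapped layout of the inputs; it says nothing about whether the alignment is optimal.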
