#b-tree #phylogenetic #canonical #matrix #distance #neighbor-joining #rapid-nj

bin+lib speedytree

Canonical and RapidNJ implementations of Neighbor-joining in Rust

1 unstable release

0.1.0 Mar 21, 2024

#2069 in Algorithms

Custom license

65KB
1.5K SLoC

speedytree

Speedytree is a Rust implementation of Neighbor-Joining for building phylogenetic trees from large Phylip distance matrices. There are two strategies: the Canonical algorithm (as QuickTree) and something in the spirit of RapidNJ but with B-trees.

You can read more about Neighbor-Joining here. The RapidNJ algorithm should be faster for very big problems at the cost of a larger memory overhead.

You may also find interesting the slides I made about this project.

Benchmark

You can run the benchmark with make benchmark. As an example, a small sth file is included, but you probably want to add your own. The alignments I used are included in the release with the binary executable file.

Testing

Speedytree has a test suite that can be run with cargo test. There are three types of tests:

  • Unit tests, which test individual functions.
  • Integration tests, which test the whole program with small matrices.
  • Property-based tests, which test the program with randomly additive binary trees. These tests rely on the fact that the program should output the same tree that it was given as input.

About the command line application

Installation

You can either download the binary from the releases page or build it yourself. To build it yourself you need to have Rust installed. Then you can run cargo build --release to build the binary. The binary will be located in target/release/speedytree.

Usage

Speedytree takes a Phylip distance matrix as input and outputs a Newick tree. It can be used like this:

speedytree < input.phy > output.nwk

Speedytree has a few options that can be used to tweak the output. You can see them by running speedytree --help. The most important options are:

  • -c to set the number of threads to use. By default it will use 1.
  • --naive to use the canonical implementation. This algorithm is equivalent to QuickTree, and it's fast in practice for small matrices.
  • --rapidnj to use the RapidNJ heuristics, but implemented with BTrees.
  • --hybrid to use a mix of the two algorithms.

Dependencies

~6–12MB
~133K SLoC