8 releases

0.2.3 May 7, 2025
0.2.2 May 2, 2025
0.1.3 Mar 7, 2025
0.1.1 Jan 27, 2025

#248 in Biology

452 downloads per month

Apache-2.0

3MB
9K SLoC

sketchlib.rust

Description

This is a reimplementation and extension of pp-sketchlib in Rust. This version is optimised for larger numbers of samples and, in particular, allows subsets of samples to be compared.

v0.2.0 is the first stable release. We intend to keep the file format unchanged after this point so sketch libraries will not need to be rebuilt.

Citation

No preprint or paper yet, but we rely on algorithms from:

bindash (written by XiaoFei Zhao):
Zhao, X. BinDash, software for fast genome distance estimation on a typical personal laptop.
Bioinformatics 35:671–673 (2019).
doi:10.1093/bioinformatics/bty651

ntHash (written by Hamid Mohamadi):
Mohamadi, H., Chu, J., Vandervalk, B. P. & Birol, I. ntHash: recursive nucleotide hashing.
Bioinformatics 32:3492–3494 (2016).
doi:10.1093/bioinformatics/btw397

Documentation

See https://docs.rs/sketchlib

Installation

Choose from:

  1. Download a binary from the releases.
  2. Use cargo install sketchlib to install the command-line tool, or cargo add sketchlib to add the library as a dependency of your own crate.
  3. Build from source.

For 2) or 3) you must have the Rust toolchain installed.

macOS users

We do not currently build binaries automatically for M1-M4 (arm64) Macs, so on those machines we recommend option 2) or 3) for best performance.
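
The recommendation above can be checked from a terminal. The following is a minimal sketch, not part of sketchlib: suggest_install is a hypothetical helper, and it assumes that uname -s reports Darwin and uname -m reports arm64 on Apple silicon Macs. It only tells you which of the numbered install options apply.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with sketchlib): map OS and CPU
# architecture to the numbered install options in this README.
suggest_install() {
    os="$1"
    arch="$2"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        # No automatically built arm64 Mac binary: use cargo or build from source.
        echo "options 2-3"
    else
        # A release binary may be available; check the releases for your platform.
        echo "options 1-3"
    fi
}

# Report the options for the current machine.
suggest_install "$(uname -s)" "$(uname -m)"
```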

If you get a message saying the binary isn't signed by Apple and can't be run, use the following command to bypass this:

xattr -d "com.apple.quarantine" ./sketchlib

Build from source

  1. Clone the repository with git clone.
  2. Run cargo install --path . from the repository root. To optimise the build for your machine's CPU, run RUSTFLAGS="-C target-cpu=native" cargo install --path . instead.
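
The two steps above can be sketched as a small script. This is an illustration under stated assumptions: build_steps is a hypothetical helper, REPO_URL is a placeholder rather than the project's confirmed repository URL, and the clone directory name sketchlib.rust is assumed. The commands are printed rather than run, so you can review them first.

```shell
#!/bin/sh
# Assumption: REPO_URL is a placeholder; set it to the project's repository URL.
REPO_URL="${REPO_URL:-<repository-url>}"

# Hypothetical helper: print the build-from-source commands.
# Pass "native" to add the CPU-specific RUSTFLAGS from step 2.
build_steps() {
    echo "git clone $REPO_URL sketchlib.rust"
    echo "cd sketchlib.rust"
    if [ "${1:-}" = "native" ]; then
        echo 'RUSTFLAGS="-C target-cpu=native" cargo install --path .'
    else
        echo "cargo install --path ."
    fi
}

# Show the commands for a build optimised for this machine.
build_steps native
```

A binary built with target-cpu=native may not run on machines with an older or different CPU, so use the plain form if you plan to copy the binary elsewhere.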

Dependencies

~11–25MB
~292K SLoC