45 releases

0.1.11 Sep 29, 2023
0.1.10 Sep 20, 2023
0.1.7 Aug 18, 2023
0.0.46 Aug 14, 2023
0.0.14 Sep 30, 2022

#57 in Biology


Used in 2 crates

MIT license

360KB
7.5K SLoC

bedrs


bedtools-like functionality for interval sets in Rust

Summary

This is an interval library written in Rust that takes advantage of the trait system, generics, and monomorphization.

It centers on two main traits, Coordinates and Container, which, when implemented on an arbitrary type, enable a wide range of genomic interval arithmetic.

Interval arithmetic can be thought of as set-theoretic operations (intersection, union, difference, complement, etc.) on intervals that carry associated chromosomes, strands, and other genomic markers.
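To make these set operations concrete, here is a minimal, std-only Rust sketch for two intervals on the same chromosome. It assumes half-open [start, end) coordinates (the BED convention). The Iv type and the intersect, union, and subtract functions are hypothetical illustrations, not part of the bedrs API.

```rust
/// Hypothetical half-open interval [start, end) on a single chromosome.
/// Not a bedrs type; used only to illustrate the set operations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Iv {
    start: u32,
    end: u32,
}

/// Intersection: the shared region, or None if the intervals do not overlap.
fn intersect(a: Iv, b: Iv) -> Option<Iv> {
    let start = a.start.max(b.start);
    let end = a.end.min(b.end);
    if start < end {
        Some(Iv { start, end })
    } else {
        None
    }
}

/// Union: one merged interval if the two overlap or touch, else None.
fn union(a: Iv, b: Iv) -> Option<Iv> {
    if a.start <= b.end && b.start <= a.end {
        Some(Iv {
            start: a.start.min(b.start),
            end: a.end.max(b.end),
        })
    } else {
        None
    }
}

/// Difference a - b: zero, one, or two pieces of `a` not covered by `b`.
fn subtract(a: Iv, b: Iv) -> Vec<Iv> {
    let mut pieces = Vec::new();
    // Left piece: the part of `a` that lies before `b` starts.
    if a.start < b.start {
        pieces.push(Iv { start: a.start, end: a.end.min(b.start) });
    }
    // Right piece: the part of `a` that lies after `b` ends.
    if b.end < a.end {
        pieces.push(Iv { start: a.start.max(b.end), end: a.end });
    }
    pieces
}
```

Subtracting an interval that sits strictly inside another splits it into two pieces, which is why subtract returns a Vec rather than a single interval.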

This library makes it straightforward to build these operations on arbitrary types and lets users tailor their own structures to minimize overhead.

Usage

The main benefit of this library is that it is trait-based: you can define your own types, and as long as they implement the Coordinates trait, they can use the rest of the library's functionality.
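The sketch below illustrates that trait-based pattern in plain Rust, without depending on bedrs. The Span trait, the Read and Peak types, and count_overlaps are all hypothetical. The trait asks implementors only for start() and end() and supplies overlaps() as a provided method (half-open coordinates assumed), so two unrelated user types get the same behavior for free.

```rust
/// Hypothetical, simplified stand-in for a coordinates trait:
/// implementors only provide the accessors.
trait Span {
    fn start(&self) -> u64;
    fn end(&self) -> u64;

    /// Provided method: every implementor gets `overlaps` for free.
    /// Assumes half-open [start, end) coordinates.
    fn overlaps<O: Span>(&self, other: &O) -> bool {
        self.start() < other.end() && other.start() < self.end()
    }
}

/// A user-defined type with its own field layout.
struct Read {
    pos: u64,
    len: u64,
}

impl Span for Read {
    fn start(&self) -> u64 {
        self.pos
    }
    fn end(&self) -> u64 {
        self.pos + self.len
    }
}

/// A second, unrelated user-defined type (tuple struct of start, end).
struct Peak(u64, u64);

impl Span for Peak {
    fn start(&self) -> u64 {
        self.0
    }
    fn end(&self) -> u64 {
        self.1
    }
}

/// Generic function: the compiler monomorphizes a separate copy
/// for each concrete (A, B) pair it is called with.
fn count_overlaps<A: Span, B: Span>(query: &A, targets: &[B]) -> usize {
    targets.iter().filter(|t| query.overlaps(*t)).count()
}
```

Because count_overlaps is generic rather than taking trait objects, each call compiles to code specialized for the concrete types, with no dynamic dispatch.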

For detailed usage and examples please review the documentation.

Coordinates Trait

The library centers around the Coordinates trait.

The ChromBounds and ValueBounds traits are the minimal requirements for the types used as the chromosome (C) and the interval values (T), respectively.
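bedrs's actual definitions of ChromBounds and ValueBounds are not reproduced here. As a hedged sketch of how such minimal-requirement traits are commonly built in Rust, the hypothetical MyChromBounds and MyValueBounds below bundle a few standard traits and use blanket implementations, so any type meeting the requirements qualifies automatically. The particular supertraits chosen are assumptions.

```rust
use std::fmt::Debug;
use std::ops::{Add, Sub};

/// Hypothetical chromosome bound: orderable, cloneable, has a default.
trait MyChromBounds: Ord + Clone + Default + Debug {}
impl<C: Ord + Clone + Default + Debug> MyChromBounds for C {}

/// Hypothetical value bound: cheap to copy, orderable, supports + and -.
trait MyValueBounds:
    Copy + Ord + Default + Debug + Add<Output = Self> + Sub<Output = Self>
{
}
impl<T> MyValueBounds for T where
    T: Copy + Ord + Default + Debug + Add<Output = T> + Sub<Output = T>
{
}

/// Hypothetical helper that works for any chromosome and value types
/// meeting the bounds: the length of [start, end), or T::default() if empty.
fn span_len<C: MyChromBounds, T: MyValueBounds>(_chr: &C, start: T, end: T) -> T {
    if end > start {
        end - start
    } else {
        T::default()
    }
}
```

With blanket implementations, users never write an impl of MyValueBounds for their own numeric types; satisfying the supertraits is enough.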

pub trait Coordinates<C, T>
where
    C: ChromBounds,
    T: ValueBounds,
{
    fn start(&self) -> T;
    fn end(&self) -> T;
    fn chr(&self) -> &C;
    fn update_start(&mut self, val: &T);
    fn update_end(&mut self, val: &T);
    fn update_chr(&mut self, val: &C);
    fn from(other: &Self) -> Self;
}

This means that to use your own interval type, you only need to implement the Coordinates trait for it; all of the library's functionality then becomes available.

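use bedrs::Coordinates;

// define a custom interval struct for testing
struct CustomInterval {
    left: usize,
    right: usize,
}

// C = usize (chromosome type), T = usize (interval value type)
impl Coordinates<usize, usize> for CustomInterval {
    fn start(&self) -> usize {
        self.left
    }
    fn end(&self) -> usize {
        self.right
    }
    // this type stores no chromosome, so always report chromosome 0
    fn chr(&self) -> &usize {
        &0
    }
    fn update_start(&mut self, val: &usize) {
        self.left = *val;
    }
    fn update_end(&mut self, val: &usize) {
        self.right = *val;
    }
    // no chromosome field to update, so this is a no-op
    fn update_chr(&mut self, _val: &usize) {}
    fn from(other: &Self) -> Self {
        Self {
            left: other.start(),
            right: other.end(),
        }
    }
}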
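The update_* methods and from() let generic code produce modified copies without knowing the concrete type. The sketch below uses a hypothetical, trimmed-down Coords trait that mirrors only the start/end accessors and mutators listed above (chromosome methods omitted), so it compiles without bedrs. The widen helper is likewise hypothetical.

```rust
/// Hypothetical mirror of the start/end accessors and mutators
/// described for Coordinates (chromosome methods omitted).
trait Coords {
    fn start(&self) -> usize;
    fn end(&self) -> usize;
    fn update_start(&mut self, val: &usize);
    fn update_end(&mut self, val: &usize);
    fn from(other: &Self) -> Self;
}

struct CustomInterval {
    left: usize,
    right: usize,
}

impl Coords for CustomInterval {
    fn start(&self) -> usize {
        self.left
    }
    fn end(&self) -> usize {
        self.right
    }
    fn update_start(&mut self, val: &usize) {
        self.left = *val;
    }
    fn update_end(&mut self, val: &usize) {
        self.right = *val;
    }
    fn from(other: &Self) -> Self {
        Self {
            left: other.start(),
            right: other.end(),
        }
    }
}

/// Hypothetical helper: returns a copy of `iv` widened by `pad` on both
/// sides, clamping the start at zero. Works for any `Coords` type.
fn widen<I: Coords>(iv: &I, pad: usize) -> I {
    // Fully qualified call avoids any ambiguity with std's `From::from`.
    let mut out = <I as Coords>::from(iv);
    out.update_start(&iv.start().saturating_sub(pad));
    out.update_end(&(iv.end() + pad));
    out
}
```

The original interval is left untouched; widen only mutates the copy obtained through from().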

Interval Types

However, the library also provides some base interval types, which you can use as a reference or directly for your use case.

Base Interval

This is a simple single-interval type with only a start and an end. It still implements the chr() method, but returns the default value of its generic type.

use bedrs::{Overlap, Interval};

let a = Interval::new(10, 20);
let b = Interval::new(15, 25);
assert!(a.overlaps(&b));

Genomic Interval

This is the bread and butter of genomic arithmetic. It is a 3-attribute struct of [chr, start, stop].

use bedrs::{Overlap, GenomicInterval};

// Initializing two intervals on the same Chr
let a = GenomicInterval::new(1, 10, 20);
let b = GenomicInterval::new(1, 15, 25);
assert!(a.overlaps(&b));

// Initializing two intervals on different Chr
let a = GenomicInterval::new(1, 10, 20);
let b = GenomicInterval::new(2, 15, 25);
assert!(!a.overlaps(&b));
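Conceptually, a chromosome-aware overlap test adds a chromosome comparison to the coordinate check. Here is a std-only sketch with a hypothetical GInterval type that is generic over the chromosome label, so the same code works with numeric IDs or names such as "chr1". It assumes half-open coordinates and is not the bedrs implementation.

```rust
/// Hypothetical [chr, start, end) interval, generic over the chromosome label.
/// Not the bedrs GenomicInterval; used only to illustrate the overlap rule.
#[derive(Debug, Clone)]
struct GInterval<C> {
    chr: C,
    start: u64,
    end: u64,
}

impl<C: PartialEq> GInterval<C> {
    fn new(chr: C, start: u64, end: u64) -> Self {
        Self { chr, start, end }
    }

    /// Overlap requires the same chromosome AND intersecting
    /// half-open coordinate ranges.
    fn overlaps(&self, other: &Self) -> bool {
        self.chr == other.chr && self.start < other.end && other.start < self.end
    }
}
```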

Stranded Genomic Interval

This is another version of the genomic interval that includes strand information. It is a 4-attribute struct of [chr, start, stop, strand].

use bedrs::{Overlap, Strand, StrandedGenomicInterval};

// Initializing three intervals on the same Chr with strands
let a = StrandedGenomicInterval::new(1, 10, 20, Strand::Forward);
let b = StrandedGenomicInterval::new(1, 15, 25, Strand::Forward);
let c = StrandedGenomicInterval::new(1, 15, 25, Strand::Reverse);

// `a` overlaps both `b` and `c` when strand is ignored
assert!(a.overlaps(&b));
assert!(a.overlaps(&c));

// Only `a` and `b` overlap on the same strand
assert!(a.stranded_overlaps(&b));
assert!(!a.stranded_overlaps(&c));
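The difference between overlaps() and stranded_overlaps() can be sketched the same way. The Strand enum and SInterval type below are hypothetical stand-ins that model only Forward and Reverse; how bedrs handles any other strand values is not covered here. Half-open coordinates are assumed.

```rust
/// Hypothetical strand enum (only Forward and Reverse modeled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strand {
    Forward,
    Reverse,
}

/// Hypothetical 4-field stranded interval: [chr, start, end) plus strand.
#[derive(Debug, Clone, Copy)]
struct SInterval {
    chr: u32,
    start: u32,
    end: u32,
    strand: Strand,
}

impl SInterval {
    fn new(chr: u32, start: u32, end: u32, strand: Strand) -> Self {
        Self { chr, start, end, strand }
    }

    /// Strand-agnostic overlap: same chromosome, intersecting coordinates.
    fn overlaps(&self, other: &Self) -> bool {
        self.chr == other.chr && self.start < other.end && other.start < self.end
    }

    /// Stranded overlap: the strand-agnostic test plus matching strands.
    fn stranded_overlaps(&self, other: &Self) -> bool {
        self.overlaps(other) && self.strand == other.strand
    }
}
```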

Other Work

This library is heavily inspired by other interval libraries in Rust, listed below:

It was also motivated by the following interval toolkits in C++ and C, respectively:

Dependencies

~1–1.9MB
~39K SLoC