10 stable releases (3 major)

4.9.0 Oct 12, 2024
4.0.0 Jul 12, 2024
3.9.0 Jun 21, 2024
2.0.2 May 14, 2024
1.0.0 Apr 21, 2022


Used in 3 crates (via bio-seq)

MIT license


bio-seq-derive

bio-seq-derive is a procedural macro crate that provides the Codec derive macro for the bio-seq library. It allows users to define custom bit-packed alphabets from an enum. The bit representation of the symbols is derived from the enum discriminants.
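To make the idea of discriminants as bit patterns concrete, here is a minimal hand-written sketch in plain Rust. It does not use bio-seq and is not the code the derive macro generates. The `Base` enum, the `WIDTH` constant, and the `pack` function are hypothetical names, and placing the first symbol in the least significant bits is an assumption made for illustration.

```rust
/// Hypothetical 2-bit alphabet: each discriminant is the symbol's bit pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Base {
    A = 0b00,
    C = 0b01,
    G = 0b10,
    T = 0b11,
}

/// Bits per symbol in this sketch.
const WIDTH: u32 = 2;

/// Pack symbols into one u64, first symbol in the least significant bits
/// (an assumed ordering). Panics if the sequence needs more than 64 bits.
pub fn pack(seq: &[Base]) -> u64 {
    assert!(seq.len() as u32 * WIDTH <= 64, "sequence too long for one word");
    seq.iter()
        .enumerate()
        .fold(0u64, |word, (i, &b)| word | ((b as u64) << (i as u32 * WIDTH)))
}
```

With this layout, `pack(&[Base::A, Base::C, Base::G, Base::T])` yields `0b11_10_01_00`: each symbol occupies two bits, and the enum discriminant is stored unchanged.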

This crate also provides the dna!() and iupac!() macros that are reexported by bio-seq for declaring static sequences at compile time.

You probably don't want to depend on this crate directly. Depend on bio-seq instead, which re-exports these macros.

Please refer to the bio-seq documentation for a complete guide on defining custom alphabets.

Features

  • width attribute: Specify the number of bits used to represent each variant in the custom alphabet. When omitted, an optimal width is chosen by default.
  • alt attribute: Define alternate bit representations for the same variant.
  • display attribute: Set a custom character representation for a variant.
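The following plain-Rust sketch illustrates what the alt and display attributes express, written out by hand. It is an assumption about the behaviour these attributes describe, not the macro's actual expansion. The `Sym` enum and its `from_bits`, `to_bits`, and `to_char` methods are hypothetical names introduced for this example.

```rust
/// Hypothetical three-symbol alphabet stored in 2 bits per symbol.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Sym {
    A = 0b00,
    B = 0b01,
    Stop = 0b10,
}

impl Sym {
    /// Decode a bit pattern. Here 0b11 plays the role of an `alt`
    /// encoding: a second pattern that decodes to the same variant `B`.
    pub fn from_bits(bits: u8) -> Option<Sym> {
        match bits {
            0b00 => Some(Sym::A),
            0b01 | 0b11 => Some(Sym::B),
            0b10 => Some(Sym::Stop),
            _ => None, // does not fit in 2 bits
        }
    }

    /// Canonical encoding: the enum discriminant.
    pub fn to_bits(self) -> u8 {
        self as u8
    }

    /// Character form. '*' plays the role of a `display` override
    /// for a variant whose name is not a single character.
    pub fn to_char(self) -> char {
        match self {
            Sym::A => 'A',
            Sym::B => 'B',
            Sym::Stop => '*',
        }
    }
}
```

In this sketch decoding is many-to-one while encoding is one-to-one: `Sym::from_bits(0b11)` gives `Sym::B`, whose canonical encoding is `0b01`.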

Usage

To derive a custom encoding, use the Codec derive macro as reexported in the bio-seq prelude:

use bio_seq::prelude::*;

Codec enums can be annotated with #[repr(u8)] so that variants cast conveniently to u8.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, Codec)]
#[width(6)]
pub enum Amino {
    #[alt(0b110110, 0b010110, 0b100110)]
    A = 0b000110, // GCA
    #[alt(0b111011)]
    C = 0b011011, // TGC
    #[alt(0b110010)]
    D = 0b010010, // GAC
    #[alt(0b100010)]
    E = 0b000010, // GAA
    #[alt(0b111111)]
    F = 0b011111, // TTC
    #[alt(0b101010, 0b011010, 0b111010)]
    G = 0b001010, // GGA
    #[alt(0b110001)]
    H = 0b010001, // CAC
    #[alt(0b011100, 0b111100)]
    I = 0b001100, // ATA
    #[alt(0b100000)]
    K = 0b000000, // AAA
    #[alt(0b001111, 0b101111, 0b111101, 0b011101, 0b101101)]
    L = 0b001101, // CTA
    M = 0b101100, // ATG
    #[alt(0b110000)]
    N = 0b010000, // AAC
    #[alt(0b010101, 0b100101, 0b110101)]
    P = 0b000101, // CCA
    #[alt(0b100001)]
    Q = 0b000001, // CAA
    #[alt(0b101000, 0b111001, 0b011001, 0b001001, 0b101001)]
    R = 0b001000, // AGA
    #[alt(0b110111, 0b010111, 0b000111, 0b100111, 0b111000)]
    S = 0b011000, // AGC
    #[alt(0b110100, 0b010100, 0b100100)]
    T = 0b000100, // ACA
    #[alt(0b011110, 0b111110, 0b101110)]
    V = 0b001110, // GTA
    W = 0b101011, // TGG
    #[alt(0b110011)]
    Y = 0b010011, // TAC
    #[display('*')]
    #[alt(0b001011, 0b100011)]
    X = 0b000011, // TAA (stop)
}
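The codon comments in the example line up with the discriminants if each base takes two bits (A = 00, C = 01, G = 10, T = 11) and the first base of the codon sits in the lowest two bits. That layout is inferred from the comments above rather than stated in the text, so treat the following as a sketch for checking the arithmetic. The `base_bits` and `codon_bits` helpers are hypothetical and not part of bio-seq.

```rust
/// 2-bit code for a nucleotide, assuming A=00, C=01, G=10, T=11.
fn base_bits(b: u8) -> Option<u8> {
    match b {
        b'A' => Some(0b00),
        b'C' => Some(0b01),
        b'G' => Some(0b10),
        b'T' => Some(0b11),
        _ => None,
    }
}

/// Pack a three-letter codon into 6 bits, with the first base in the
/// lowest two bits (assumed ordering). Returns None for any other input.
pub fn codon_bits(codon: &str) -> Option<u8> {
    let bytes = codon.as_bytes();
    if bytes.len() != 3 {
        return None;
    }
    let mut out = 0u8;
    for (i, &b) in bytes.iter().enumerate() {
        out |= base_bits(b)? << (2 * i);
    }
    Some(out)
}
```

Under these assumptions `codon_bits("GCA")` is `Some(0b000110)`, matching the discriminant of `A`, and `codon_bits("ATG")` is `Some(0b101100)`, matching `M`. The alt patterns then cover the remaining codons that translate to the same amino acid, so all 64 six-bit values are assigned.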
