#bioinformatics #genomics #plink #genotype #snps

bed-reader

Read and write the PLINK BED format, simply and efficiently

14 releases

Uses new Rust 2021

new 0.2.23 Jun 30, 2022
0.2.21 Jun 22, 2022
0.2.14 May 28, 2022

Apache-2.0

2.5MB
6K SLoC

Rust 4.5K SLoC // 0.1% comments Python 1.5K SLoC // 0.4% comments INI 6 SLoC

Features

  • Fast and multi-threaded
  • Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • The Python-facing APIs for this library are used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.
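
For readers new to the format, the following is a minimal sketch of how a PLINK 1 .bed file packs genotypes, written with the standard library only. It is not bed-reader's implementation. The helper names `decode_snp` and `strip_header` are hypothetical, and the choice to report counts of the first allele is an assumption made for illustration.

```rust
/// The three bytes that open a PLINK 1 .bed file in SNP-major mode.
const BED_MAGIC: [u8; 3] = [0x6c, 0x1b, 0x01];

/// Check the header and return the bytes that hold the genotype data.
/// Hypothetical helper for illustration; not part of bed-reader's API.
fn strip_header(bytes: &[u8]) -> Option<&[u8]> {
    bytes.strip_prefix(&BED_MAGIC)
}

/// Decode one SNP's packed genotypes into counts of the first allele.
/// Hypothetical helper for illustration; not part of bed-reader's API.
fn decode_snp(packed: &[u8], n_individuals: usize) -> Vec<f64> {
    (0..n_individuals)
        .map(|i| {
            // Four genotypes per byte, lowest-order bit pair first.
            let code = (packed[i / 4] >> (2 * (i % 4))) & 0b11;
            match code {
                0b00 => 2.0,   // homozygous for the first allele
                0b10 => 1.0,   // heterozygous
                0b11 => 0.0,   // homozygous for the second allele
                _ => f64::NAN, // 0b01 marks a missing genotype
            }
        })
        .collect()
}
```

Each SNP occupies a whole number of bytes, so a file with 5 individuals uses 2 bytes per SNP and leaves the unused high bits of the second byte as padding.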

Examples

Read all genotype data from a .bed file.

use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};

let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;

assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
// The snippet runs as the body of a function that returns Result<(), BedErrorPlus>.
use bed_reader::BedErrorPlus;
Ok::<(), BedErrorPlus>(())
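
The example compares with `assert_eq_nan` rather than `assert_eq!` because missing genotypes are read as NaN, and NaN never compares equal to itself. The sketch below shows the idea on plain nested vectors. The function `eq_nan` is a hypothetical stand-in for illustration, not the library's implementation.

```rust
/// Compare two row-major matrices, treating NaN as equal to NaN.
/// Hypothetical stand-in for illustration; not bed-reader's implementation.
fn eq_nan(a: &[Vec<f64>], b: &[Vec<f64>]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(row_a, row_b)| {
            row_a.len() == row_b.len()
                && row_a
                    .iter()
                    .zip(row_b)
                    // Equal values match; so do two missing values.
                    .all(|(x, y)| x == y || (x.is_nan() && y.is_nan()))
        })
}
```

With ordinary `==`, two matrices that both hold NaN in the same position compare as different, which is why a NaN-aware check is needed for data with missing values.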

Read every second individual (sample) and SNPs (variants) 20 through 29. The range 20..30 excludes its upper bound.

use bed_reader::{Bed, ReadOptions, sample_bed_file};
use ndarray::s;

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (50, 10));
// The snippet runs as the body of a function that returns Result<(), BedErrorPlus>.
use bed_reader::BedErrorPlus;
Ok::<(), BedErrorPlus>(())
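
To see why the result has shape (50, 10), the sketch below reproduces the index arithmetic with the standard library only. It assumes the sample file holds 100 individuals, as the (100, 6) shape in the chromosome example suggests. The helper names are hypothetical and not part of bed-reader.

```rust
/// Positions chosen by an "every `step`-th" slice such as `s![..;2]`.
/// Panics if `step` is zero.
fn every_nth(len: usize, step: usize) -> Vec<usize> {
    (0..len).step_by(step).collect()
}

/// Positions chosen by a half-open range such as `20..30`.
fn half_open(range: std::ops::Range<usize>) -> Vec<usize> {
    range.collect()
}

/// Shape of the matrix a read would return for the two selections:
/// one row per selected individual, one column per selected SNP.
fn selected_shape(iid: &[usize], sid: &[usize]) -> (usize, usize) {
    (iid.len(), sid.len())
}
```

Calling `every_nth(100, 2)` yields positions 0, 2, 4 and so on, and `half_open(20..30)` yields 20 through 29, so the selection covers 50 individuals and 10 SNPs.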

List the first 5 individual (sample) IDs, the first 5 SNP (variant) IDs, and every unique chromosome. Then, read every genotype value for the SNPs on chromosome 5.

use std::collections::HashSet;
use ndarray::s;
use bed_reader::{Bed, ReadOptions, sample_bed_file};

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
println!("{:?}", bed.iid()?.slice(s![..5])); // Outputs ndarray: ["iid_0", "iid_1", "iid_2", "iid_3", "iid_4"]
println!("{:?}", bed.sid()?.slice(s![..5])); // Outputs ndarray: ["sid_0", "sid_1", "sid_2", "sid_3", "sid_4"]
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Outputs (HashSet order is arbitrary and may differ between runs): {"12", "10", "4", "8", "19", "21", "9", "15", "6", "16", "13", "7", "17", "18", "1", "22", "11", "2", "20", "3", "5", "14"}
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (100, 6));
// The snippet runs as the body of a function that returns Result<(), BedErrorPlus>.
use bed_reader::BedErrorPlus;
Ok::<(), BedErrorPlus>(())
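
The chromosome filter above is a boolean mask: one true or false per SNP, with true marking the columns to read. The sketch below shows the same idea on a plain slice of labels. The helper names and the sample labels are hypothetical and not part of bed-reader.

```rust
/// Build a mask that is true wherever the chromosome label equals `wanted`.
fn chromosome_mask(chromosomes: &[&str], wanted: &str) -> Vec<bool> {
    chromosomes.iter().map(|c| *c == wanted).collect()
}

/// Turn a boolean mask into the list of selected positions.
fn mask_to_positions(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &keep)| if keep { Some(i) } else { None })
        .collect()
}
```

Because the comparison is exact string equality, a label such as "15" does not match "5". The number of true entries in the mask is the number of columns in the result, which is 6 in the example above.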
