#bioinformatics #genomics #plink #genotype #snps

bed-reader

Read and write the PLINK BED format, simply and efficiently

22 releases

0.2.34 May 2, 2023
0.2.33 May 1, 2023
0.2.29 Nov 1, 2022
0.2.28 Oct 20, 2022
0.2.14 May 28, 2022

Apache-2.0

2.5MB
6K SLoC

Rust 4.5K SLoC // 0.1% comments Python 1.5K SLoC // 0.4% comments Batch 8 SLoC INI 6 SLoC

Contains (Zip file, 13KB) some_missing.properties.npz

Features

  • Fast and multi-threaded
  • Supports many indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • The Python-facing API for this library is used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.

Examples

Read all genotype data from a .bed file.

use ndarray as nd;
use bed_reader::{Bed, ReadOptions, assert_eq_nan, sample_bed_file};

let file_name = sample_bed_file("small.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder().f64().read(&mut bed)?;

assert_eq_nan(
    &val,
    &nd::array![
        [1.0, 0.0, f64::NAN, 0.0],
        [2.0, 0.0, f64::NAN, 2.0],
        [0.0, 1.0, 2.0, 0.0]
    ],
);
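
The values 0.0, 1.0, 2.0 and NaN in the array above are allele counts, with NaN marking a missing genotype. As a rough illustration of where they come from, the sketch below decodes one byte of PLINK 1 BED data, which packs four genotypes at two bits each. The function name `decode_bed_byte` is hypothetical, and the mapping assumes copies of allele 1 are counted; it is not taken from bed-reader's own implementation.

```rust
/// Decode one byte of PLINK 1 BED genotype data into four values.
/// Hypothetical helper: assumes allele 1 (A1) is counted and that
/// missing genotypes should be reported as NaN.
fn decode_bed_byte(byte: u8) -> [f64; 4] {
    let mut out = [0.0; 4];
    for (i, slot) in out.iter_mut().enumerate() {
        // Genotypes are stored two bits each, lowest-order bits first.
        let code = (byte >> (2 * i)) & 0b11;
        *slot = match code {
            0b00 => 2.0,      // homozygous for allele 1
            0b01 => f64::NAN, // missing genotype
            0b10 => 1.0,      // heterozygous
            _ => 0.0,         // 0b11: homozygous for allele 2
        };
    }
    out
}
```

Calling `decode_bed_byte(0b11_10_01_00)` walks the four two-bit codes from the low end of the byte upward, so the first genotype comes from the rightmost pair of bits.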

Read every second individual (sample) and SNPs (variants) 20 through 29 (the half-open range 20..30).

use ndarray::s;
use bed_reader::{Bed, ReadOptions, sample_bed_file};

let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;
let val = ReadOptions::builder()
    .iid_index(s![..;2])
    .sid_index(20..30)
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (50, 10));
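
To see why the result has shape (50, 10), it can help to count the selected positions by hand. The sketch below uses only the standard library and a hypothetical helper, `every_nth`; it assumes the file holds 100 individuals, which is not stated for this example but is consistent with the asserted shape.

```rust
/// Positions kept by a stepped slice such as `s![..;2]`:
/// every `step`-th index starting from 0.
/// Hypothetical helper for illustration only.
fn every_nth(count: usize, step: usize) -> Vec<usize> {
    (0..count).step_by(step).collect()
}

/// Number of positions in the half-open range `start..end`.
fn range_len(start: usize, end: usize) -> usize {
    (start..end).len()
}
```

With 100 individuals, `every_nth(100, 2)` keeps indices 0, 2, 4, and so on up to 98, and `range_len(20, 30)` counts indices 20 through 29, since the upper bound of a Rust range is exclusive.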

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every genomic value in chromosome 5.

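use std::collections::HashSet;
use ndarray::s;
use bed_reader::{Bed, ReadOptions, sample_bed_file};

// Same sample file as in the previous example.
let file_name = sample_bed_file("some_missing.bed")?;
let mut bed = Bed::new(file_name)?;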
println!("{:?}", bed.iid()?.slice(s![..5])); // Outputs ndarray: ["iid_0", "iid_1", "iid_2", "iid_3", "iid_4"]
println!("{:?}", bed.sid()?.slice(s![..5])); // Outputs ndarray: ["sid_0", "sid_1", "sid_2", "sid_3", "sid_4"]
println!("{:?}", bed.chromosome()?.iter().collect::<HashSet<_>>());
// Outputs: {"12", "10", "4", "8", "19", "21", "9", "15", "6", "16", "13", "7", "17", "18", "1", "22", "11", "2", "20", "3", "5", "14"}
let val = ReadOptions::builder()
    .sid_index(bed.chromosome()?.map(|elem| elem == "5"))
    .f64()
    .read(&mut bed)?;

assert!(val.dim() == (100, 6));
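
The chromosome filter above builds a boolean mask, one entry per SNP, and passes it as the SNP index. A minimal standard-library sketch of the same idea follows; `positions_on_chromosome` is a hypothetical helper, not part of bed-reader, and it returns the selected positions rather than a mask.

```rust
/// Return the positions whose chromosome label equals `target` exactly.
/// Hypothetical helper for illustration only.
fn positions_on_chromosome(chromosomes: &[&str], target: &str) -> Vec<usize> {
    chromosomes
        .iter()
        .enumerate()
        // Keep the index only when the label matches the target string.
        .filter_map(|(i, &chrom)| if chrom == target { Some(i) } else { None })
        .collect()
}
```

Because chromosome labels are compared as strings, "5" does not match "15" or "05"; the comparison is exact, as with `elem == "5"` in the example above.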

Dependencies

~20–30MB
~422K SLoC