#nom #bioinformatics #gtf #parser #ensembl #file-io

gtftools

A barebones GTF toolkit with fast nom-based IO

6 releases

0.1.9 Jul 19, 2023
0.1.8 Jul 19, 2023
0.1.5 Nov 17, 2022
0.1.4 Sep 27, 2022



Used in 3 crates

MIT license

27KB
627 lines

gtftools

A crate for parsing and querying GTF files in the Ensembl format.

The parser achieves throughput close to that of wc -l.
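A quick refresher on the format: each GTF data line has nine tab-separated columns (seqname, source, feature, start, end, score, strand, frame, attributes), and Ensembl writes the attributes column as key "value"; pairs. The std-only sketch below splits one line into those parts. It is only an illustration under simplifying assumptions. The GtfLine type and parse_gtf_line function are hypothetical, and quoted values are assumed not to contain ';'. gtftools itself uses a nom-based parser and its own record type.

```rust
/// Hypothetical, simplified view of one GTF line (not the gtftools record type).
#[derive(Debug)]
struct GtfLine<'a> {
    seqname: &'a str,
    feature: &'a str,
    start: u64,
    end: u64,
    strand: &'a str,
    attributes: Vec<(&'a str, &'a str)>,
}

/// Splits a GTF data line into its nine tab-separated columns and the
/// `key "value";` attribute pairs. Returns None for malformed lines.
/// Assumption: no ';' appears inside quoted attribute values.
fn parse_gtf_line(line: &str) -> Option<GtfLine<'_>> {
    let cols: Vec<&str> = line.trim_end().split('\t').collect();
    if cols.len() != 9 {
        return None;
    }
    let attributes = cols[8]
        .split(';')
        .map(str::trim)
        .filter(|field| !field.is_empty())
        .filter_map(|field| {
            // Each attribute is `key "value"`; split at the first space
            let (key, value) = field.split_once(' ')?;
            Some((key, value.trim_matches('"')))
        })
        .collect();
    Some(GtfLine {
        seqname: cols[0],
        feature: cols[2],
        start: cols[3].parse().ok()?,
        end: cols[4].parse().ok()?,
        strand: cols[6],
        attributes,
    })
}
```

Columns the sketch does not need (source, score, frame) are simply not stored; the attribute values are borrowed from the input line, so no allocation happens beyond the column and attribute vectors.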

Usage

GtfReader is meant to be used as an iterator over GTF records, and it accepts any reader implementing std::io::BufRead.
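Because only BufRead is required, the same code path can serve plain files, decompression streams, in-memory buffers or stdin. As a minimal std-only illustration of that pattern (count_data_lines is a hypothetical helper, not part of gtftools), a function generic over R: BufRead can be called with a BufReader<File> or a byte slice alike:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of gtftools): counts data lines from any
/// BufRead source, skipping '#' header/comment lines and blank lines.
fn count_data_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // Propagate I/O or UTF-8 errors instead of ignoring them
        let line = line?;
        if !line.starts_with('#') && !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

For example, count_data_lines(BufReader::new(File::open(path)?)) and count_data_lines(text.as_bytes()) both compile, since &[u8] also implements BufRead.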

From File

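use std::{fs::File, io::BufReader};
use gtftools::GtfReader;

// Open the file and wrap it in a buffered reader
let handle = File::open("data/example.gtf")
  .map(BufReader::new)
  .unwrap();

// Iterate over records; filter_map(|x| x.ok()) skips any record that fails to parse
let num_records = GtfReader::from_bufread(handle)
  .filter_map(|x| x.ok())
  .count();

// data/example.gtf contains 10 records
assert_eq!(num_records, 10);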
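Note that filter_map(|x| x.ok()) silently drops records that fail to parse, so a malformed file simply yields a smaller count. If a bad line should abort processing instead, the error can be propagated. A small generic sketch (count_strict is a hypothetical helper) that works for any iterator of Result items, which the .ok() call above implies GtfReader yields:

```rust
/// Hypothetical helper: counts items from an iterator of Results,
/// stopping at and returning the first error instead of discarding it.
fn count_strict<T, E, I>(iter: I) -> Result<usize, E>
where
    I: IntoIterator<Item = Result<T, E>>,
{
    // try_fold short-circuits as soon as an item is Err
    iter.into_iter().try_fold(0usize, |n, item| item.map(|_| n + 1))
}
```

With gtftools this should read count_strict(GtfReader::from_bufread(handle)). When the records themselves are needed, collecting into a Result<Vec<_>, _> gives the same fail-fast behaviour.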

From Gzip File

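use std::{fs::File, io::BufReader};
use flate2::read::MultiGzDecoder;
use gtftools::GtfReader;

// Decompress with MultiGzDecoder (which also handles concatenated,
// multi-member gzip streams), then buffer the decompressed bytes
let handle = File::open("data/example.gtf.gz")
  .map(MultiGzDecoder::new)
  .map(BufReader::new)
  .unwrap();

// Parsing is identical to the uncompressed case; unparseable records are skipped
let num_records = GtfReader::from_bufread(handle)
  .filter_map(|x| x.ok())
  .count();

assert_eq!(num_records, 10);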
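When inputs may or may not be compressed, one option is to check for the gzip magic bytes (0x1f 0x8b) before choosing a reader. The std-only sketch below (looks_gzipped is a hypothetical helper) peeks with BufRead::fill_buf, so nothing is consumed and the same reader can be passed on afterwards:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns true if the stream starts with the gzip
/// magic bytes 0x1f 0x8b. fill_buf only peeks, so no bytes are consumed.
/// Caveat: a reader that buffers fewer than 2 bytes yields false.
fn looks_gzipped<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let buf = reader.fill_buf()?;
    Ok(buf.len() >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
}
```

If it returns true, the BufReader<File> can be wrapped in MultiGzDecoder and a new BufReader as in the gzip example; otherwise it can go to GtfReader::from_bufread directly. Returning both branches as Box<dyn BufRead> keeps the downstream code identical, assuming from_bufread is generic over BufRead as the examples suggest.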

Dependencies

~1.8–2.8MB
~56K SLoC