16 releases

0.5.3 Mar 1, 2024
0.5.1 Jun 13, 2023
0.5.0 Aug 23, 2022
0.3.1 Jun 20, 2022
0.2.6 Nov 16, 2021

MIT license

21KB
380 lines

kseq

kseq is a simple FASTA/FASTQ (fastx) parser library for Rust. Its main function is to iterate over the records of fastx files, similar to kseq in C. It reuses a shared buffer to read and store records, which keeps parsing fast. It accepts a plain or gz fastx file, io::stdin, or a fofn (file-of-file-names) file that lists multiple plain or gz fastx files, one per line.
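The shared-buffer idea can be illustrated with a small, standard-library-only sketch. This is not kseq's implementation: the FastaReader type, its fields and its advance method are hypothetical names made up for this illustration, and it handles FASTA only. It shows how one set of String buffers can be cleared and refilled for every record instead of allocating new storage each time.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical reader (NOT kseq's actual type): it owns a few buffers
/// that are reused for every record.
struct FastaReader<R: BufRead> {
    reader: R,
    line: String,   // scratch buffer for the line currently being read
    header: String, // header of the current record, without the leading '>'
    seq: String,    // sequence of the current record, line breaks removed
    pending: bool,  // true if `line` already holds the next header line
}

impl<R: BufRead> FastaReader<R> {
    fn new(reader: R) -> Self {
        FastaReader {
            reader,
            line: String::new(),
            header: String::new(),
            seq: String::new(),
            pending: false,
        }
    }

    /// Load the next record into the shared buffers.
    /// Returns Ok(true) if a record was read and Ok(false) at EOF.
    fn advance(&mut self) -> io::Result<bool> {
        if !self.pending {
            // Read until a header line is found. This sketch silently skips
            // anything before it; a real parser would report a format error.
            loop {
                self.line.clear();
                if self.reader.read_line(&mut self.line)? == 0 {
                    return Ok(false);
                }
                if self.line.starts_with('>') {
                    break;
                }
            }
        }
        self.pending = false;

        // clear() keeps the allocated capacity, so the buffers are reused.
        self.header.clear();
        self.header.push_str(self.line[1..].trim_end());
        self.seq.clear();

        // Append sequence lines until the next header or EOF.
        loop {
            self.line.clear();
            if self.reader.read_line(&mut self.line)? == 0 {
                break;
            }
            if self.line.starts_with('>') {
                self.pending = true;
                break;
            }
            self.seq.push_str(self.line.trim_end());
        }
        Ok(true)
    }
}

fn main() -> io::Result<()> {
    let data = ">r1 first\nACGT\nTT\n>r2\nGG\n";
    let mut fx = FastaReader::new(Cursor::new(data));
    while fx.advance()? {
        println!("{}\t{}", fx.header, fx.seq);
    }
    Ok(())
}
```

The trade-off of this design is that each record is only valid until the next call to advance, which is why it is exposed through a loop rather than as owned values.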

Using kseq is simple: call parse_path to parse a path or parse_reader to parse a reader, then call the iter_record method in a loop to get each record.

  • parse_path This function takes a path that implements AsRef<std::path::Path> as input. The path can be a fastx file, - for io::stdin, or a fofn file. It returns a Result type:

    • Ok(T): A struct T with the iter_record method.
    • Err(E): An error E covering missing input, failure to open or read the input, malformed fastx format, or an invalid path or file.
  • parse_reader This function takes a reader that implements std::io::Read as input. It returns a Result type:

    • Ok(T): A struct T with the iter_record method.
    • Err(E): An error E covering missing input, failure to open or read the input, malformed fastx format, or an invalid path or file.
  • iter_record This method is meant to be called in a loop. It returns a Result<Option<Record>> type:

    • Ok(Some(Record)): A struct Record with methods:

      • head -> &str: get sequence id/identifier
      • seq -> &str: get sequence
      • des -> &str: get sequence description/comment
      • sep -> &str: get separator
      • qual -> &str: get quality scores
      • len -> usize: get sequence length

      Note: des, sep and qual return "" if the Record doesn't have these attributes.

    • Ok(None): Stream has reached EOF.

    • Err(ParseError): A ParseError, such as an IO, TruncateFile, InvalidFasta or InvalidFastq error.
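The three outcomes listed above (a record, end of stream, or an error) map naturally onto a while let loop combined with the ? operator, which avoids unwrap. The sketch below uses only the standard library so that it runs on its own: DemoRecords, DemoError and total_len are hypothetical stand-ins invented for this illustration, not kseq types, and the stand-in iter_record only imitates the Result<Option<...>> return shape described above.

```rust
/// Hypothetical stand-in for a parse error (NOT kseq's ParseError).
#[derive(Debug, PartialEq)]
enum DemoError {
    Truncated,
}

/// Hypothetical stand-in for the struct returned by parse_path/parse_reader.
struct DemoRecords {
    seqs: Vec<&'static str>,
    next: usize,
    fail_at: Option<usize>, // index at which to simulate a parse error
}

impl DemoRecords {
    fn new(seqs: Vec<&'static str>, fail_at: Option<usize>) -> Self {
        DemoRecords { seqs, next: 0, fail_at }
    }

    /// Imitates the return shape only: Ok(Some(..)) for a record,
    /// Ok(None) at the end of the stream, Err(..) on a parse error.
    fn iter_record(&mut self) -> Result<Option<&'static str>, DemoError> {
        if self.fail_at == Some(self.next) {
            return Err(DemoError::Truncated);
        }
        let item = self.seqs.get(self.next).copied();
        if item.is_some() {
            self.next += 1;
        }
        Ok(item)
    }
}

/// Sum the sequence lengths, propagating any error to the caller with `?`.
fn total_len(records: &mut DemoRecords) -> Result<usize, DemoError> {
    let mut total = 0;
    // `?` returns early on Err; the loop ends normally on Ok(None).
    while let Some(seq) = records.iter_record()? {
        total += seq.len();
    }
    Ok(total)
}

fn main() {
    let mut ok = DemoRecords::new(vec!["ACGT", "GG"], None);
    match total_len(&mut ok) {
        Ok(n) => println!("total bases: {}", n),
        Err(e) => eprintln!("error: {:?}", e),
    }

    let mut bad = DemoRecords::new(vec!["ACGT", "GG"], Some(1));
    match total_len(&mut bad) {
        Ok(n) => println!("total bases: {}", n),
        Err(e) => eprintln!("error: {:?}", e),
    }
}
```

With the real crate, the same loop shape should apply to the value returned by parse_path or parse_reader, with the caller deciding whether to report or propagate the error.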

Example

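use std::env::args;
use kseq::parse_path;
// For the reader-based alternative below, also import:
// use std::fs::File;
// use kseq::parse_reader;

fn main() {
	// First command-line argument: a fastx file, "-" for stdin, or a fofn file.
	let path: String = args().nth(1).expect("missing input path argument");
	let mut records = parse_path(path).unwrap();
	// let mut records = parse_reader(File::open(path).unwrap()).unwrap();
	// iter_record returns Ok(Some(record)) until EOF, then Ok(None).
	while let Some(record) = records.iter_record().unwrap() {
		println!("head:{} des:{} seq:{} qual:{} len:{}",
			record.head(), record.des(), record.seq(),
			record.qual(), record.len());
	}
}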

Installation

cargo add kseq

Benchmarking

cargo bench
