3 unstable releases

Version | Released |
---|---|
0.2.0 | Apr 10, 2024 |
0.1.1 | Oct 8, 2023 |
0.1.0 | Oct 7, 2023 |
#1524 in Parser implementations
Used in nafcodec-py
69KB, 1.5K SLoC
nafcodec
Rust coder/decoder for Nucleotide Archive Format (NAF) files.
Overview
Nucleotide Archive Format (NAF) is a file format proposed by Kryukov et al.[1] in 2019 for storing compressed nucleotide or protein sequences, combining 4-bit encoding and Zstandard compression. NAF files can be compressed and decompressed using the original C implementation.
This crate provides a from-scratch Rust implementation of a NAF decoder, using nom for parsing the binary format and zstd for handling Zstandard decompression. It provides a complete API for iterating over the contents of a NAF file.
This is the Rust version; a Python package is available as well.
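To give an idea of what the 4-bit encoding mentioned above involves, here is a minimal sketch that packs two nucleotides into each byte. The code table (`encode_base`, `decode_base`) is an illustrative assumption and is not claimed to match the table in the NAF specification; the helper names `pack` and `unpack` are hypothetical and are not part of this crate. In a real archive the packed bytes would then go through Zstandard compression, which is not shown here.

```rust
/// Map a nucleotide to a 4-bit code.
/// ASSUMPTION: illustrative table (one bit per base), not the NAF one.
fn encode_base(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0b0001),
        b'C' => Some(0b0010),
        b'G' => Some(0b0100),
        b'T' => Some(0b1000),
        b'N' => Some(0b1111), // any base: union of the four bits
        _ => None,
    }
}

/// Inverse of `encode_base` for the same illustrative table.
fn decode_base(code: u8) -> Option<u8> {
    match code {
        0b0001 => Some(b'A'),
        0b0010 => Some(b'C'),
        0b0100 => Some(b'G'),
        0b1000 => Some(b'T'),
        0b1111 => Some(b'N'),
        _ => None,
    }
}

/// Pack two 4-bit codes per byte, first base in the low nibble.
/// Returns `None` if the sequence contains an unsupported symbol.
fn pack(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity((seq.len() + 1) / 2);
    for pair in seq.chunks(2) {
        let lo = encode_base(pair[0])?;
        // An odd-length sequence leaves the last high nibble empty.
        let hi = match pair.get(1) {
            Some(&b) => encode_base(b)?,
            None => 0,
        };
        out.push(lo | (hi << 4));
    }
    Some(out)
}

/// Unpack `len` bases; the length is stored separately because the
/// last byte may hold only one base.
fn unpack(packed: &[u8], len: usize) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        let byte = *packed.get(i / 2)?;
        let code = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
        out.push(decode_base(code)?);
    }
    Some(out)
}
```

Packing halves the size of the sequence before the general-purpose compressor sees it, which is why the sequence length has to be stored alongside the packed bytes.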
Features
- streaming decoder: The decoder is implemented using several readers, each accessing a different region of the compressed file, which makes it possible to stream records without having to decode full blocks.
- optional decoding: The decoder can skip the decoding of certain fields, for example ignoring quality strings when they are not needed.
- flexible encoder: The encoder is implemented using an abstract storage interface for temporary data, which allows sequences to be kept either in memory or inside a temporary folder.
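The "abstract storage interface" idea from the last feature can be sketched with a small trait. Everything below (`ScratchStorage`, `InMemory`, `InDirectory`, `spool`) is a hypothetical illustration using only the standard library; it is not the crate's actual API, whose trait and type names may differ.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Hypothetical abstraction over where temporary encoder data lives.
trait ScratchStorage {
    /// A buffer that can be written once, rewound, and read back.
    type Buffer: Read + Write + Seek;
    fn create_buffer(&self, name: &str) -> io::Result<Self::Buffer>;
}

/// Keeps temporary data in memory.
struct InMemory;

impl ScratchStorage for InMemory {
    type Buffer = Cursor<Vec<u8>>;
    fn create_buffer(&self, _name: &str) -> io::Result<Self::Buffer> {
        Ok(Cursor::new(Vec::new()))
    }
}

/// Keeps temporary data as files inside an existing directory.
struct InDirectory {
    root: PathBuf,
}

impl ScratchStorage for InDirectory {
    type Buffer = File;
    fn create_buffer(&self, name: &str) -> io::Result<Self::Buffer> {
        OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open(self.root.join(name))
    }
}

/// Code written against the trait does not care which backend is used:
/// write the data, rewind, and read it back.
fn spool<S: ScratchStorage>(storage: &S, data: &[u8]) -> io::Result<Vec<u8>> {
    let mut buffer = storage.create_buffer("sequence.tmp")?;
    buffer.write_all(data)?;
    buffer.seek(SeekFrom::Start(0))?;
    let mut out = Vec::new();
    buffer.read_to_end(&mut out)?;
    Ok(out)
}
```

With this shape, switching from memory to disk is a matter of passing `&InDirectory { root }` instead of `&InMemory` to the same function, which is the trade-off the feature describes: memory is faster, a folder copes with inputs larger than RAM.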
Usage
Use a Decoder to iterate over the contents of a Nucleotide Archive Format file, reading from any BufRead + Seek implementor:

```rust
let mut decoder = nafcodec::Decoder::from_path("../data/LuxC.naf")
    .expect("failed to open nucleotide archive");

for result in decoder {
    let record = result.unwrap();
    // .. do something with the record .. //
}
```
All fields of the obtained Record are optional, and actually depend on the kind of data that was compressed.
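Because every field may be absent, code consuming records has to decide what to do in each case. The sketch below uses a stand-in `SketchRecord` struct rather than the crate's own `Record`: only the `quality` field is named in this document, so the `id` and `sequence` fields, their `String` types, and the `render` helper are assumptions made for illustration.

```rust
/// Stand-in for a decoded record.
/// ASSUMPTION: field names and types other than `quality` are hypothetical.
struct SketchRecord {
    id: Option<String>,
    sequence: Option<String>,
    quality: Option<String>,
}

/// Render a record as FASTQ when quality is present, as FASTA when only
/// the sequence is present, and as nothing when there is no sequence.
fn render(record: &SketchRecord) -> Option<String> {
    // A missing identifier gets a placeholder instead of failing.
    let id = record.id.as_deref().unwrap_or("unnamed");
    // Without a sequence there is nothing meaningful to write.
    let seq = record.sequence.as_deref()?;
    Some(match record.quality.as_deref() {
        Some(qual) => format!("@{}\n{}\n+\n{}\n", id, seq, qual),
        None => format!(">{}\n{}\n", id, seq),
    })
}
```

Matching on the `Option` values keeps the choice of output format tied to what the archive actually contains, instead of unwrapping fields that may not be there.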
The decoder can be configured through a DecoderBuilder to ignore some fields to make decompression faster, even if they are present in the source archive:

```rust
let mut decoder = nafcodec::DecoderBuilder::new()
    .quality(false)
    .with_path("../data/phix.naf")
    .expect("failed to open nucleotide archive");

// the archive contains quality strings...
assert!(decoder.header().flags().test(nafcodec::Flag::Quality));

// ... but we configured the decoder to ignore them
for result in decoder {
    let record = result.unwrap();
    assert!(record.quality.is_none())
}
```
Feedback
Issue Tracker
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
Changelog
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
License
This library is provided under the open-source MIT license. The NAF specification is in the public domain.
This project is in no way affiliated with, sponsored, or otherwise endorsed by the original NAF authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
References
- Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi. "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences". Bioinformatics, Volume 35, Issue 19, October 2019, Pages 3826–3828. doi:10.1093/bioinformatics/btz144
Dependencies
~5–14MB
~184K SLoC