12 releases (6 breaking)
| Version | Released |
|---|---|
| 0.7.0 | Mar 18, 2026 |
| 0.5.2 | Mar 3, 2026 |
| 0.5.1 | Dec 10, 2025 |
| 0.5.0 | Nov 18, 2025 |
| 0.2.0 | Jul 30, 2023 |
oxbow
The core Rust library for oxbow.
Warning: oxbow is under active development. APIs are not yet stable and are subject to change.
Installation
To use oxbow in your Rust project, add oxbow to your Cargo.toml or run:
cargo add oxbow
Development
Ensure you have Rust installed on your system. You can install Rust using rustup.
Building the project
The oxbow Rust crate alone can be built using cargo.
cd oxbow
cargo build  # add --release for an optimized (non-debug) build
Linting and formatting
We use the standard Rust toolchain for linting and formatting Rust code.
Clippy is a Rust linter:
cargo clippy
The following command formats all source files of the current crate using rustfmt:
cargo fmt
Running tests
To run tests on Rust code, we use cargo:
cargo test
lib.rs:
oxbow
oxbow reads genomic data formats 🧬 as Apache Arrow 🏹.
With the oxbow Rust library, you can serialize native formats into Arrow IPC, stream larger-than-memory files as Arrow RecordBatches with zero-copy over FFI, and more!
⚠️ The Rust API is under active development and is not yet stable. The API may change in future releases.
Features
- 🚀 Supports commonly used file formats from the htslib/GA4GH and the UCSC ecosystems.
- 🔍 Support for compression, indexing, column projection, and genomic range querying.
- 🔧 Support for nested fields and complex, typed schemas (e.g., SAM tags, VCF INFO and FORMAT fields, AutoSql, etc.).
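To make "genomic range querying" concrete, here is a minimal, std-only sketch of what an overlap query means on interval records. The `Interval` type and the `parse_region`, `overlaps`, and `query` functions are hypothetical names chosen for illustration and are not part of the oxbow API. The sketch assumes region strings are 1-based and inclusive (for example `chr1:1001-2000`) and that records use 0-based, half-open coordinates as in BED.

```rust
/// A genomic interval in 0-based, half-open coordinates (the BED convention).
/// Hypothetical type for illustration; not an oxbow type.
#[derive(Debug, Clone, PartialEq)]
pub struct Interval {
    pub chrom: String,
    pub start: u64,
    pub end: u64,
}

/// Parse a region string such as "chr1:1001-2000" (assumed 1-based, inclusive)
/// into a 0-based, half-open interval. Returns None on malformed input.
pub fn parse_region(s: &str) -> Option<Interval> {
    // Split on the last ':' so chromosome names containing ':' still work.
    let (chrom, range) = s.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    // Tolerate thousands separators such as "1,001".
    let start: u64 = start.replace(',', "").parse().ok()?;
    let end: u64 = end.replace(',', "").parse().ok()?;
    if chrom.is_empty() || start == 0 || end < start {
        return None;
    }
    Some(Interval {
        chrom: chrom.to_string(),
        start: start - 1, // 1-based inclusive start -> 0-based start
        end,              // 1-based inclusive end == 0-based exclusive end
    })
}

/// True when two half-open intervals on the same chromosome share at least one base.
pub fn overlaps(a: &Interval, b: &Interval) -> bool {
    a.chrom == b.chrom && a.start < b.end && b.start < a.end
}

/// Keep only the records that overlap the query region (a plain linear scan).
pub fn query<'a>(records: &'a [Interval], region: &Interval) -> Vec<&'a Interval> {
    records.iter().filter(|r| overlaps(r, region)).collect()
}
```

This sketch scans every record. With an indexed file, a reader can instead use the index to skip blocks that cannot overlap the region, but the overlap test applied to each candidate record is the same.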
Scanners
The scanners are the main interface for reading files. Each scanner is a parser for a specific
format and provides scanning methods that return an iterator implementing the
arrow::record_batch::RecordBatchReader trait.
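The batch-at-a-time pattern described above can be pictured with a std-only sketch. `BatchIter` is a hypothetical stand-in, not an oxbow type: it yields plain `Vec<String>` batches of lines, whereas real scanners yield Arrow RecordBatches through `arrow::record_batch::RecordBatchReader`. The point is only that the consumer pulls one bounded batch at a time, so the whole file never has to fit in memory.

```rust
use std::io::BufRead;

/// Hypothetical iterator that groups the lines of a reader into fixed-size batches.
/// Each batch stands in for what would be an Arrow RecordBatch in a real scanner.
pub struct BatchIter<R: BufRead> {
    lines: std::io::Lines<R>,
    batch_size: usize,
}

impl<R: BufRead> BatchIter<R> {
    /// Wrap a buffered reader; `batch_size` is the maximum number of lines per batch.
    pub fn new(reader: R, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self {
            lines: reader.lines(),
            batch_size,
        }
    }
}

impl<R: BufRead> Iterator for BatchIter<R> {
    type Item = std::io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut batch = Vec::with_capacity(self.batch_size);
        // Pull lines until the batch is full or the input is exhausted.
        while batch.len() < self.batch_size {
            match self.lines.next() {
                Some(Ok(line)) => batch.push(line),
                // Surface I/O errors to the caller instead of hiding them.
                Some(Err(e)) => return Some(Err(e)),
                None => break,
            }
        }
        // An empty batch means the input is exhausted.
        if batch.is_empty() {
            None
        } else {
            Some(Ok(batch))
        }
    }
}
```

A caller would loop with `for batch in BatchIter::new(reader, 1024) { ... }` and handle each `Result` in turn. The final batch may be shorter than `batch_size`.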
Sequence formats
Alignment formats
- sam: Scan SAM files as Arrow RecordBatches.
- bam: Scan BAM files as Arrow RecordBatches.
- cram: Scan CRAM files as Arrow RecordBatches.
Variant formats
Interval feature formats
- bed: Scan BED files as Arrow RecordBatches.
- gtf: Scan GTF files as Arrow RecordBatches.
- gff: Scan GFF files as Arrow RecordBatches.
UCSC Big Binary Indexed (BBI) formats
- bigbed: Scan BigBed files as Arrow RecordBatches.
- bigwig: Scan BigWig files as Arrow RecordBatches.
- BBI zoom: Scan zoom level summary statistics from BigWig/BigBed as Arrow RecordBatches.
License
Licensed under MIT or Apache-2.0.