#sam #bam #bioinformatics #gff #vcf

cyanea-io

File format parsing for the Cyanea bioinformatics ecosystem

1 unstable release

0.1.0 Feb 21, 2026

Apache-2.0

1MB
23K SLoC

Unified file format parsing for bioinformatics. Each parser is behind a feature flag to keep the dependency tree minimal.

What's Inside

  • CSV -- metadata extraction, preview
  • VCF -- variant parsing into cyanea_omics::Variant, streaming stats
  • BED -- BED3-BED6 records, interval conversion, statistics
  • BEDPE -- paired-end intervals with inter/intra-chromosomal stats
  • GFF3 -- hierarchical Gene/Transcript/Exon assembly with coordinate conversion
  • GTF -- GFF2 format with gene_id/transcript_id hierarchy
  • SAM -- full record parsing, flag helpers, paired-end stats, pileup generation
  • BAM -- BGZF-compressed binary alignment parsing
  • CRAM -- reference-based alignment format via noodles
  • BCF -- BCF2.1 binary VCF parsing
  • BLAST -- tabular output (-outfmt 6/7) parsing
  • MAF -- Multiple Alignment Format (UCSC/LAST/minimap2)
  • GenBank -- flat file parsing (multi-record, features table)
  • bigWig/bigBed -- Kent binary formats with B+ tree and R-tree index
  • Parquet -- columnar storage for variants, intervals, and expression matrices with predicate pushdown
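The BED and GFF3 entries above both mention coordinate conversion. BED intervals are 0-based and half-open, while GFF3 and GTF intervals are 1-based and closed, so moving between them shifts only the start coordinate. The sketch below illustrates that arithmetic using the standard library only; the type and method names are hypothetical and are not taken from cyanea-io, whose own conversion API may look different.

```rust
/// A 0-based, half-open interval as used by BED: [start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ZeroBasedHalfOpen {
    start: u64,
    end: u64,
}

/// A 1-based, closed interval as used by GFF3/GTF: [start, end].
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OneBasedClosed {
    start: u64,
    end: u64,
}

impl ZeroBasedHalfOpen {
    /// Only the start moves: the exclusive 0-based end equals the inclusive 1-based end.
    fn to_one_based(self) -> OneBasedClosed {
        OneBasedClosed { start: self.start + 1, end: self.end }
    }

    /// Number of bases covered (0 if the interval is empty or inverted).
    fn len(self) -> u64 {
        self.end.saturating_sub(self.start)
    }
}

impl OneBasedClosed {
    /// Returns None when start is 0, which is not a valid 1-based coordinate.
    fn to_zero_based(self) -> Option<ZeroBasedHalfOpen> {
        let start = self.start.checked_sub(1)?;
        Some(ZeroBasedHalfOpen { start, end: self.end })
    }

    /// Number of bases covered; written to avoid underflow for empty intervals.
    fn len(self) -> u64 {
        (self.end + 1).saturating_sub(self.start)
    }
}

fn main() {
    let bed = ZeroBasedHalfOpen { start: 99, end: 200 };
    let gff = bed.to_one_based();
    println!("{:?} -> {:?}", bed, gff);

    // The conversion preserves length and round-trips.
    assert_eq!(bed.len(), gff.len());
    assert_eq!(gff.to_zero_based(), Some(bed));
}
```

Keeping the two conventions as separate types makes it harder to mix them up by accident, which is the usual source of off-by-one errors when joining BED regions against GFF3 annotations.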

Quick Start

# Cargo.toml
[dependencies]
cyanea-io = { version = "0.1", features = ["vcf", "bed", "sam"] }

// src/main.rs
use cyanea_io::{vcf::vcf_stats, bed::parse_bed, sam::parse_sam};

fn main() {
    // Summary statistics for a VCF file.
    let stats = vcf_stats("variants.vcf").unwrap();
    println!("{} variants, {} SNVs", stats.variant_count, stats.snv_count);

    // Parse BED intervals and SAM alignments from disk.
    let regions = parse_bed("regions.bed").unwrap();
    let alignments = parse_sam("reads.sam").unwrap();
}
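To make the two counters in the Quick Start concrete, here is a minimal standard-library sketch of how a variant count and an SNV count could be derived from VCF text. It is not the crate's implementation: `SketchStats` and `sketch_vcf_stats` are hypothetical names, and the rule used here (one variant per data line, an SNV when REF and every ALT allele are a single base) is an assumption that may differ from how `vcf_stats` classifies records.

```rust
/// Hypothetical stand-in for the statistics struct; only the two field
/// names are borrowed from the Quick Start snippet.
#[derive(Debug, Default, PartialEq, Eq)]
struct SketchStats {
    variant_count: usize,
    snv_count: usize,
}

/// Count data lines and SNVs in VCF text, one line at a time.
fn sketch_vcf_stats(vcf_text: &str) -> SketchStats {
    let mut stats = SketchStats::default();
    for line in vcf_text.lines() {
        // Skip blank lines, meta lines ("##...") and the column header ("#CHROM...").
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
        let mut cols = line.split('\t');
        let (reference, alt) = match (cols.nth(3), cols.next()) {
            (Some(r), Some(a)) => (r, a),
            _ => continue, // malformed record: fewer than five columns
        };
        stats.variant_count += 1;

        // Assumption: an allele is a "single base" if it is one character
        // and not the missing (".") or spanning-deletion ("*") marker.
        let is_base = |allele: &str| allele.len() == 1 && allele != "." && allele != "*";
        if is_base(reference) && alt.split(',').all(is_base) {
            stats.snv_count += 1;
        }
    }
    stats
}

fn main() {
    let vcf = "##fileformat=VCFv4.3\n\
               #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n\
               chr1\t100\t.\tA\tG\t50\tPASS\t.\n\
               chr1\t200\t.\tAT\tA\t50\tPASS\t.\n\
               chr2\t300\t.\tC\tT,G\t50\tPASS\t.\n";
    let stats = sketch_vcf_stats(vcf);
    println!("{} variants, {} SNVs", stats.variant_count, stats.snv_count);
}
```

Because the loop looks at one line at a time and keeps only two counters, memory use stays constant regardless of file size, which is the general idea behind streaming statistics.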

Feature Flags

| Flag | Default | Description |
| --- | --- | --- |
| csv | Yes | CSV parsing (csv, serde, serde_json) |
| vcf | No | VCF variant parsing (requires cyanea-omics) |
| bed | No | BED interval parsing (requires cyanea-omics) |
| gff | No | GFF3 gene structure parsing (requires cyanea-omics) |
| gtf | No | GTF (GFF2) parsing (requires cyanea-omics) |
| sam | No | SAM text alignment parsing |
| bam | No | BAM binary parsing (implies sam, adds flate2) |
| cram | No | CRAM format (implies sam, adds noodles) |
| bcf | No | BCF binary VCF (implies vcf, adds flate2) |
| blast | No | BLAST tabular output parsing |
| maf | No | MAF alignment format parsing |
| genbank | No | GenBank flat file parsing |
| bigwig | No | bigWig/bigBed binary format (adds flate2) |
| parquet | No | Apache Parquet (implies vcf + bed, adds arrow/parquet) |
| variant-calling | No | Variant calling (implies sam + vcf) |
| parallel | No | Rayon parallelism |
| wasm | No | WASM target marker |
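Several flags imply others, so the set of features that ends up enabled can be larger than the set requested. The sketch below encodes only the "implies" relationships listed in the table and computes the resulting closure. It is an illustration of the table, not a reading of the crate's actual Cargo.toml; `resolve_features` is a hypothetical helper, and the default-on csv flag is deliberately left out.

```rust
use std::collections::BTreeSet;

/// Direct implications as stated in the Feature Flags table.
const IMPLIES: &[(&str, &[&str])] = &[
    ("bam", &["sam"]),
    ("cram", &["sam"]),
    ("bcf", &["vcf"]),
    ("parquet", &["vcf", "bed"]),
    ("variant-calling", &["sam", "vcf"]),
];

/// Expand a list of requested flags into every flag they switch on.
fn resolve_features(requested: &[&str]) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    let mut stack: Vec<&str> = requested.to_vec();
    while let Some(flag) = stack.pop() {
        // `insert` returns false if the flag was already handled.
        if !enabled.insert(flag.to_string()) {
            continue;
        }
        if let Some((_, implied)) = IMPLIES.iter().find(|(name, _)| *name == flag) {
            stack.extend(implied.iter().copied());
        }
    }
    enabled
}

fn main() {
    for request in [&["bam"][..], &["parquet"][..], &["variant-calling", "bcf"][..]] {
        println!("{:?} -> {:?}", request, resolve_features(request));
    }
}
```

In practice Cargo performs this resolution itself; the point of the sketch is only that requesting bam or cram is enough to get the sam module as well.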

Modules

| Module | Feature | Description |
| --- | --- | --- |
| csv | csv | CSV metadata and preview |
| vcf | vcf | VCF parsing and statistics |
| vcf_header | vcf | Structured VCF 4.3 header construction/parsing |
| vcf_ops | vcf | Normalization, multi-allelic split/join, filtering, set ops |
| indexed_vcf | vcf | Random-access VCF via tabix index (noodles) |
| bed | bed | BED3-BED6 record parsing |
| bedpe | bed | BEDPE paired-end intervals |
| bedgraph | bed | bedGraph/Wiggle signal tracks |
| gff | gff | GFF3 hierarchical parsing |
| gtf | gtf | GTF (GFF2) parsing |
| sam | sam | SAM records, flags, paired stats |
| pileup | sam | Pileup generation, mpileup output |
| bgzf | bam | BGZF block decompression, virtual offsets |
| bam | bam | BAM binary alignment parsing |
| bam_ops | bam | Sort, merge, mark duplicates, flagstat, depth |
| indexed_bam | bam | Random-access BAM via BAI/CSI index (noodles) |
| cram | cram | CRAM format via noodles |
| bcf | bcf | BCF2.1 binary VCF reader |
| bcf_write | bcf | BCF2 binary VCF writer |
| variant_call | variant-calling | Bayesian genotype caller from pileup |
| blast | blast | BLAST tabular output (-outfmt 6/7) |
| blast_xml | blast | BLAST XML output (-outfmt 5) |
| maf | maf | Multiple Alignment Format |
| genbank | genbank | GenBank flat file parsing |
| embl | genbank | EMBL/ENA format parsing |
| stockholm | genbank | Stockholm MSA format (Pfam/Rfam/HMMER) |
| clustal | genbank | ClustalW/Omega format |
| phylip | genbank | PHYLIP interleaved/sequential format |
| pir | genbank | PIR/NBRF protein format |
| abi | genbank | ABI Sanger chromatogram binary |
| gfa | genbank | GFA v1 sequence graph format |
| bigwig | bigwig | bigWig/bigBed Kent binary format |
| parquet | parquet | Apache Parquet columnar format |
| fetch | fetch | URL builders for NCBI/UniProt/KEGG/htsget/refget |
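The bgzf module entry mentions virtual offsets. In the BGZF layout defined by the SAM/BAM specification, a virtual offset packs two numbers into one 64-bit value: the byte offset of a compressed block in the file (upper 48 bits) and the position inside that block's uncompressed data (lower 16 bits). The sketch below shows the packing with the standard library only; `VirtualOffset` and its methods are hypothetical names and are not claimed to match the types exposed by cyanea-io.

```rust
/// A BGZF virtual file offset: (compressed block offset << 16) | offset within block.
/// Deriving Ord on the packed value orders offsets by position in the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct VirtualOffset(u64);

impl VirtualOffset {
    /// The compressed offset must fit in 48 bits.
    const MAX_COMPRESSED: u64 = (1 << 48) - 1;

    /// Pack a block start and an in-block position; None if the block offset is too large.
    fn new(compressed: u64, uncompressed: u16) -> Option<Self> {
        if compressed > Self::MAX_COMPRESSED {
            return None;
        }
        Some(Self((compressed << 16) | u64::from(uncompressed)))
    }

    /// Byte offset of the start of the BGZF block in the compressed file.
    fn compressed(self) -> u64 {
        self.0 >> 16
    }

    /// Byte offset within the block's uncompressed data.
    fn uncompressed(self) -> u16 {
        (self.0 & 0xFFFF) as u16
    }
}

fn main() {
    let a = VirtualOffset::new(1000, 42).expect("offset fits in 48 bits");
    let b = VirtualOffset::new(1000, 43).expect("offset fits in 48 bits");
    println!("{:?}: block at {}, +{} within block", a, a.compressed(), a.uncompressed());

    // Later positions in the same block compare greater.
    assert!(a < b);
}
```

A reader that wants to jump to a virtual offset seeks to `compressed()` in the file, decompresses that one block, and then skips `uncompressed()` bytes, which is what makes index-based random access over a compressed file possible.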
