Latest version: 0.3.4-beta.9 (Oct 27, 2023), 3 releases
Exon is an analysis toolkit for life-science applications. It features:
- Support for many file formats from bioinformatics, proteomics, and others
- Local filesystem and object storage support
- Arrow FFI primitives for multi-language support
- SQL-based access to bioinformatics data, with general DML and some DDL support
Please note that Exon was recently extracted from a larger library, so please be patient while we clean up after the split. If you have a comment or question in the meantime, please file an issue.
Installation
Exon is available via crates.io. To add it to your project, run:
```shell
cargo add exon
```
Usage
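Exon is designed to be used as a library. For example, to read a FASTA file into a DataFusion `DataFrame` (this example also assumes the `tokio` crate, with its `macros` and `rt-multi-thread` features, as a dependency):
```rust
use datafusion::prelude::*;
use exon::context::ExonSessionExt;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new_exon();
    let df = ctx.read_fasta("test-data/datasources/fasta/test.fasta", None).await?;
    df.show().await?; // Print the records as a table.
    Ok(())
}
```
Please see the Rust docs for more information.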
File Formats
Format | Compression(s) | Inferred Extension(s) |
---|---|---|
BAM | - | .bam |
BCF | - | .bcf |
BED | gz, zstd | .bed |
FASTA | gz, zstd | .fasta, .fa, .fna |
FASTQ | gz, zstd | .fastq, .fq |
GENBANK | gz, zstd | .gbk, .genbank, .gb |
GFF | gz, zstd | .gff |
GTF | gz, zstd | .gtf |
HMMDOMTAB | gz, zstd | .hmmdomtab |
MZML | gz, zstd | .mzml[^2] |
SAM | - | .sam |
VCF | gz[^1] | .vcf |
[^1]: Uses bgzip, not plain gzip.
[^2]: The `.mzML` extension also works.
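The "Inferred Extension(s)" column lists the suffixes associated with each format. As a rough illustration only, the std-only sketch below maps a file name to a format and compression using the extensions and codecs from the table. `infer_format` and `Compression` are hypothetical names, not Exon's API, and the `.zst` suffix for zstd is an assumption, since the table names the codec but not its file suffix.

```rust
/// Compression codecs from the table's "Compression(s)" column.
#[derive(Debug, PartialEq)]
enum Compression {
    Uncompressed,
    Gzip,
    Zstd,
}

/// Hypothetical helper, not Exon's API: map a file name to a (format, compression)
/// pair. Returns `None` if the extension is unknown or the codec is not listed
/// for that format.
fn infer_format(path: &str) -> Option<(&'static str, Compression)> {
    // Lowercasing lets `.mzML` match `.mzml` (footnote 2); note it also makes
    // every other extension case-insensitive, which may differ from Exon.
    let lower = path.to_ascii_lowercase();
    // Peel off a compression suffix first; `.zst` for zstd is an assumption.
    let (stem, compression) = if let Some(s) = lower.strip_suffix(".gz") {
        (s, Compression::Gzip)
    } else if let Some(s) = lower.strip_suffix(".zst") {
        (s, Compression::Zstd)
    } else {
        (lower.as_str(), Compression::Uncompressed)
    };
    let (_, ext) = stem.rsplit_once('.')?;
    let format = match ext {
        "bam" => "BAM",
        "bcf" => "BCF",
        "bed" => "BED",
        "fasta" | "fa" | "fna" => "FASTA",
        "fastq" | "fq" => "FASTQ",
        "gbk" | "genbank" | "gb" => "GENBANK",
        "gff" => "GFF",
        "gtf" => "GTF",
        "hmmdomtab" => "HMMDOMTAB",
        "mzml" => "MZML",
        "sam" => "SAM",
        "vcf" => "VCF",
        _ => return None,
    };
    // BAM, BCF and SAM list no compression; VCF lists gz (bgzip) only.
    let allowed = match format {
        "BAM" | "BCF" | "SAM" => compression == Compression::Uncompressed,
        "VCF" => compression != Compression::Zstd,
        _ => true,
    };
    allowed.then_some((format, compression))
}
```

Checking the codec against the format after matching the extension means, for example, that `calls.vcf.zst` and `aln.bam.gz` are rejected while `reads.fq.gz` resolves to gzip-compressed FASTQ.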
Related Projects
Settings
Exon uses the following settings:
Setting | Default | Description |
---|---|---|
exon.vcf_parse_info | true | Parse VCF INFO fields. If false, INFO fields are returned as a single string. |
exon.vcf_parse_formats | true | Parse VCF FORMAT fields. If false, FORMAT fields are returned as a single string. |
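To make the difference between the two `exon.vcf_parse_info` modes concrete, here is a std-only sketch that returns an INFO value either as the raw string or split into key/value entries. `InfoColumn` and `info_column` are hypothetical names for illustration; Exon's own reader is not shown, and this sketch keeps every value as a string instead of applying the types declared in the VCF header.

```rust
use std::collections::BTreeMap;

/// How an INFO value is surfaced, mirroring the two modes of the setting.
#[derive(Debug, PartialEq)]
enum InfoColumn {
    /// `exon.vcf_parse_info = false`: the raw field as one string.
    Raw(String),
    /// `exon.vcf_parse_info = true`: one entry per key; flags have no value.
    Parsed(BTreeMap<String, Option<String>>),
}

/// Hypothetical sketch, not Exon's parser: split `KEY=VALUE;FLAG;...` entries.
fn info_column(raw: &str, vcf_parse_info: bool) -> InfoColumn {
    if !vcf_parse_info {
        return InfoColumn::Raw(raw.to_string());
    }
    let mut fields = BTreeMap::new();
    // "." is the VCF placeholder for an empty INFO column.
    if raw != "." {
        for entry in raw.split(';').filter(|e| !e.is_empty()) {
            match entry.split_once('=') {
                Some((key, value)) => fields.insert(key.to_string(), Some(value.to_string())),
                // A bare key such as `DB` is a flag with no value.
                None => fields.insert(entry.to_string(), None),
            };
        }
    }
    InfoColumn::Parsed(fields)
}
```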
You can update the settings by running a SQL `SET` statement:
```sql
SET <setting> = <value>;
```
For example, to disable parsing of VCF INFO fields:
```sql
SET exon.vcf_parse_info = false;
```
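A `SET` statement changes a session option. The sketch below mirrors that with a hypothetical `VcfSettings` struct holding the two settings and their defaults from the table; it is not Exon's configuration type. For brevity it accepts only the uppercase `SET <setting> = <value>` form with boolean values. In a real program the statement would presumably be sent through the DataFusion session, for example with `SessionContext::sql`.

```rust
/// Hypothetical mirror of the two VCF settings; not Exon's config type.
#[derive(Debug, PartialEq)]
struct VcfSettings {
    parse_info: bool,
    parse_formats: bool,
}

impl Default for VcfSettings {
    /// Both settings default to `true`, as in the table.
    fn default() -> Self {
        Self { parse_info: true, parse_formats: true }
    }
}

impl VcfSettings {
    /// Apply a statement of the form `SET <setting> = <value>;`.
    fn apply_set(&mut self, stmt: &str) -> Result<(), String> {
        let body = stmt.trim().trim_end_matches(';');
        let rest = body.strip_prefix("SET ").ok_or("expected a SET statement")?;
        let (key, value) = rest.split_once('=').ok_or("expected `<setting> = <value>`")?;
        let value = value.trim();
        let flag: bool = value.parse().map_err(|_| format!("not a boolean: {value}"))?;
        match key.trim() {
            "exon.vcf_parse_info" => self.parse_info = flag,
            "exon.vcf_parse_formats" => self.parse_formats = flag,
            other => return Err(format!("unknown setting: {other}")),
        }
        Ok(())
    }
}
```

Unknown setting names and non-boolean values are reported as errors rather than silently ignored, so a typo in a `SET` statement does not leave the defaults in place unnoticed.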
Benchmarks
Please see the benchmarks README for more information.