86 releases (20 breaking)

new 0.21.0 May 17, 2024
0.19.1 Apr 23, 2024
0.15.0 Mar 27, 2024
0.5.5 Dec 18, 2023
0.2.6 Jul 31, 2023

#105 in Biology

1,968 downloads per month
Used in 3 crates

Apache-2.0

3MB
25K SLoC

Exon

Exon is an execution engine designed to work with bioinformatics data. It features:

  • SQL-based access to bioinformatics data, with general DML and some DDL support
  • Support for many file formats from bioinformatics, proteomics, and others
  • Local filesystem and object storage support
  • Arrow FFI primitives for multi-language support

Installation

Exon is available via crates.io. To install, run:

cargo add exon

Documentation

  • Rust documentation is available here.
  • General documentation is available here.

Benchmarks

Please see the benchmarks README for more information.


lib.rs:

Exon is a library to facilitate open-ended analysis of scientific data, ease the application of ML models, and provide a common data interface for science and engineering teams.

Overview

The main user-facing interface is DataFusion's SessionContext together with the ExonSessionExt extension trait, which adds a number of convenience methods for loading data from various sources.
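For readers unfamiliar with the extension-trait pattern, the following is a minimal, standard-library-only sketch of the idea. It does not use Exon or DataFusion: ToySession, ToySessionExt, register_table, and register_fasta are hypothetical names invented for illustration, and the real SessionContext and ExonSessionExt have different (async) methods and signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a session type owned by another crate.
/// It maps a table name to a (format, location) pair.
#[derive(Default)]
pub struct ToySession {
    tables: HashMap<String, (String, String)>,
}

impl ToySession {
    /// Generic "core" method: register a table with an explicit format.
    pub fn register_table(&mut self, name: &str, format: &str, location: &str) {
        self.tables
            .insert(name.to_string(), (format.to_string(), location.to_string()));
    }

    /// Look up a registered table as (format, location).
    pub fn table(&self, name: &str) -> Option<(&str, &str)> {
        self.tables
            .get(name)
            .map(|(format, location)| (format.as_str(), location.as_str()))
    }
}

/// Hypothetical extension trait: adds domain-specific helpers to a type
/// without modifying the type itself.
pub trait ToySessionExt {
    fn register_fasta(&mut self, name: &str, path: &str);
}

impl ToySessionExt for ToySession {
    fn register_fasta(&mut self, name: &str, path: &str) {
        // The helper only fills in the format and delegates to the core method.
        self.register_table(name, "fasta", path);
    }
}
```

Once the trait is in scope, the helper can be called as if it were a method of the session type itself, for example `session.register_fasta("genome", "genome.fasta")`.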

See the read_* methods on ExonSessionExt for details, for example read_fasta or read_gff. There is also a read_inferred_exon_table method, which attempts to infer the data type and compression from the file extension.
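To make extension-based inference concrete, here is a minimal sketch using only the Rust standard library. It is not Exon's implementation: the FileKind and Compression enums, the infer_from_path function, and the specific extensions recognized are all assumptions made for this example.

```rust
use std::path::Path;

/// Hypothetical compression kinds; not Exon's own type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Compression {
    None,
    Gzip,
    Zstd,
}

/// Hypothetical file-format kinds; not Exon's own type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FileKind {
    Fasta,
    Gff,
}

/// Infer (format, compression) from a path such as `genome.fasta.gz`.
/// Returns `None` when the extension is not recognized.
pub fn infer_from_path(path: &str) -> Option<(FileKind, Compression)> {
    let file_name = Path::new(path).file_name()?.to_str()?;
    let lower = file_name.to_ascii_lowercase();

    // Peel off a trailing compression suffix, if there is one.
    let (stem, compression) = if let Some(s) = lower.strip_suffix(".gz") {
        (s, Compression::Gzip)
    } else if let Some(s) = lower.strip_suffix(".zst") {
        (s, Compression::Zstd)
    } else {
        (lower.as_str(), Compression::None)
    };

    // The remaining extension selects the format.
    let ext = Path::new(stem).extension()?.to_str()?;
    let kind = match ext {
        "fasta" | "fa" | "fna" => FileKind::Fasta,
        "gff" | "gff3" => FileKind::Gff,
        _ => return None,
    };
    Some((kind, compression))
}
```

Stripping the compression suffix first means a name like `genome.fasta.gz` resolves to the same format as `genome.fasta`, and the matching is case-insensitive because the file name is lowercased up front.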

To support those methods, Exon implements a number of DataFusion traits that serve as a good base for scientific data work. See the datasources module for more information.

Dependencies

~80MB
~1.5M SLoC