#data-fusion #biology #arrow

exon

A platform for scientific data processing and analysis

107 releases (30 breaking)

0.32.4 Dec 21, 2024
0.32.2 Sep 4, 2024
0.29.1 Jul 16, 2024
0.15.0 Mar 27, 2024
0.2.6 Jul 31, 2023

Apache-2.0

Exon is a library to facilitate open-ended analysis of scientific data, ease the application of ML models, and provide a common data interface for science and engineering teams.

Overview

The main user-facing interface is DataFusion's SessionContext combined with the ExonSessionExt extension trait, which adds a number of convenience methods for loading data from various sources.
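
For readers unfamiliar with the extension-trait pattern, the following is a minimal, self-contained sketch of the idea. It does not use Exon or DataFusion: ToySession, ToySessionExt, and register_fasta are hypothetical names invented for illustration, and the real ExonSessionExt methods have their own signatures (see the Rust documentation).

```rust
/// Stand-in for a session type owned by another crate.
/// Hypothetical: this is NOT DataFusion's SessionContext.
struct ToySession {
    registered: Vec<String>,
}

/// Extension trait: declares extra, domain-specific methods that can be
/// attached to `ToySession` without modifying the type itself.
/// Hypothetical: this is NOT Exon's ExonSessionExt.
trait ToySessionExt {
    /// Record a FASTA source and return how many sources are registered.
    fn register_fasta(&mut self, path: &str) -> usize;
}

impl ToySessionExt for ToySession {
    fn register_fasta(&mut self, path: &str) -> usize {
        self.registered.push(format!("fasta:{}", path));
        self.registered.len()
    }
}
```

Once the trait is in scope, the new method is called like any inherent method, for example session.register_fasta("a.fasta"). This is why the extension trait has to be imported alongside the session type before its methods become available.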

See the read_* methods on ExonSessionExt, such as read_fasta and read_gff, for more information. For ease of use, there is also a read_inferred_exon_table method that attempts to infer the data type and compression from the file extension.
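
To make the extension-based inference concrete, here is a rough sketch of how a file name such as reads.fasta.gz could be split into a data type and a compression codec using only the standard library. This is an assumption-laden illustration, not Exon's implementation: FileKind, Compression, infer_from_extension, and the list of recognized extensions are all hypothetical.

```rust
use std::path::Path;

/// Hypothetical data types for illustration; Exon supports many more.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileKind {
    Fasta,
    Gff,
}

/// Hypothetical compression codecs for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Compression {
    Uncompressed,
    Gzip,
    Zstd,
}

/// Return the extension that sits just inside the outermost one,
/// e.g. "fasta" for "reads.fasta.gz".
fn inner_extension(path: &Path) -> Option<String> {
    let stem = Path::new(path.file_stem()?);
    Some(stem.extension()?.to_str()?.to_ascii_lowercase())
}

/// Infer (kind, compression) from a file name by peeling extensions
/// from the right. Returns `None` when the data type is not recognized.
fn infer_from_extension(path: &str) -> Option<(FileKind, Compression)> {
    let path = Path::new(path);
    let last = path.extension()?.to_str()?.to_ascii_lowercase();

    // The outermost extension may name a codec rather than a data type.
    let compression = match last.as_str() {
        "gz" => Compression::Gzip,
        "zst" => Compression::Zstd,
        _ => Compression::Uncompressed,
    };

    // If a codec was found, the data type is one extension further in.
    let kind_ext = if compression == Compression::Uncompressed {
        last
    } else {
        inner_extension(path)?
    };

    let kind = match kind_ext.as_str() {
        "fasta" | "fa" | "fna" => FileKind::Fasta,
        "gff" | "gff3" => FileKind::Gff,
        _ => return None,
    };
    Some((kind, compression))
}
```

In this sketch, a name with a codec extension but no recognizable data type (for example archive.gz) yields None rather than a guess, which leaves the caller free to fall back to an explicit read_* method.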

To support those methods, Exon implements a number of DataFusion traits that serve as a good base for scientific data work. See the datasources module for more information.


Exon

Exon is an execution engine designed to work with bioinformatics data. It features:

  • SQL-based access to bioinformatics data, with general DML and some DDL support
  • Support for many file formats from bioinformatics, proteomics, and other fields
  • Local filesystem and object storage support
  • Arrow FFI primitives for multi-language support

Installation

Exon is available via crates.io. To install, run:

cargo add exon

Documentation

  • Rust documentation is available here.
  • General documentation is available here.

Benchmarks

Please see the benchmarks README for more information.
