3 releases

Uses old Rust 2015

0.1.2 Jun 27, 2018
0.1.1 Jun 20, 2018
0.1.0 Jun 20, 2018

#722 in Machine learning

46 downloads per month

MIT/Apache

38KB
891 lines

openml-rust

A Rust interface to OpenML.

The aim of this crate is to give Rust code access to machine learning data hosted by OpenML. Machine learning algorithms developed in Rust can then be applied to state-of-the-art data sets, and their performance compared to existing implementations in a reproducible way.

Example

extern crate openml;

use openml::prelude::*;
use openml::{PredictiveAccuracy, SupervisedClassification};
use openml::baseline::NaiveBayesClassifier;

fn main() {
    // Load "Supervised Classification on iris" task (https://www.openml.org/t/59)
    let task = SupervisedClassification::from_openml(59).unwrap();

    println!("Task: {}", task.name());

    // run the task
    let result: PredictiveAccuracy<_> = task.run(|train, test| {
        // train classifier: it is built by collecting (features, label) pairs
        let nbc: NaiveBayesClassifier<u8> = train
            .map(|(x, y)| (x, y))
            .collect();

        // test classifier: predict a label for every test sample
        let y_out: Vec<_> = test
            .map(|x| nbc.predict(x))
            .collect();

        Box::new(y_out.into_iter())
    });

    println!("Classification Accuracy: {}", result.result());
}
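
The example above hands a closure to the task, which receives training and test data and returns predictions. To make that contract concrete, here is a minimal standalone sketch of what a closure-based runner could look like. It does not use the openml crate and is not its implementation: `ToyTask`, its `run` signature, the `dist` helper, and the single fixed split are all assumptions made for illustration.

```rust
// Hypothetical stand-in for a task: one fixed train/test split of labelled rows.
// None of these names come from the openml crate.
struct ToyTask {
    train: Vec<(Vec<f64>, u8)>,
    test: Vec<(Vec<f64>, u8)>,
}

impl ToyTask {
    /// Passes the training pairs and the unlabelled test rows to `flow`,
    /// then scores the returned predictions against the held-back labels.
    fn run<F>(&self, flow: F) -> f64
    where
        F: Fn(&[(Vec<f64>, u8)], &[Vec<f64>]) -> Vec<u8>,
    {
        let test_x: Vec<Vec<f64>> = self.test.iter().map(|(x, _)| x.clone()).collect();
        let predictions = flow(&self.train, &test_x);
        let correct = predictions
            .iter()
            .zip(self.test.iter())
            .filter(|(p, t)| **p == t.1)
            .count();
        correct as f64 / self.test.len() as f64
    }
}

/// Squared Euclidean distance between two feature vectors.
fn dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(p, q)| (p - q).powi(2)).sum()
}

fn main() {
    let task = ToyTask {
        train: vec![(vec![0.0], 0), (vec![0.2], 0), (vec![1.0], 1), (vec![1.2], 1)],
        test: vec![(vec![0.1], 0), (vec![1.1], 1)],
    };

    // The closure "trains" by keeping the training data and predicts with a
    // 1-nearest-neighbour rule.
    let accuracy = task.run(|train, test| {
        test.iter()
            .map(|x| {
                train
                    .iter()
                    .min_by(|a, b| dist(&a.0, x).partial_cmp(&dist(&b.0, x)).unwrap())
                    .map(|(_, y)| *y)
                    .unwrap()
            })
            .collect()
    });

    println!("Classification accuracy: {}", accuracy);
}
```

The test labels never reach the closure, so the runner alone decides how predictions are scored. That is the main appeal of a closure-based design: the measure can change without touching the learning code.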

Goals

  • get data sets
  • get tasks
  • get splits
  • task types
    • Supervised Classification
    • Supervised Regression
    • Learning Curve
    • Clustering
  • run tasks
    • [ ] Learner/Predictor trait for use with tasks
    • Task runner takes a closure for learning and prediction
    • Data type strategy:
      • a: burden the ML model with figuring out how to deal with dynamic types
      • b: cast everything to f64
      • c: make type casting part of the feature extraction pipeline
      • Generics allow type selection at compile time
  • make openml.org optional (manual construction of tasks)
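
The list above names a Learner/Predictor trait as an open item and states that generics allow type selection at compile time. The following sketch shows one possible shape for such traits. It is an assumption for illustration only, not the crate's API: the `Learner`, `Predictor`, `MajorityClass`, `MajorityModel`, and `accuracy` names are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical trait: fits a model on labelled rows.
trait Learner<X, Y> {
    type Model: Predictor<X, Y>;
    fn fit(&self, data: &[(X, Y)]) -> Self::Model;
}

/// Hypothetical trait: predicts a label for one row.
trait Predictor<X, Y> {
    fn predict(&self, x: &X) -> Y;
}

/// Baseline that ignores the features and predicts the most frequent label.
struct MajorityClass;

struct MajorityModel<Y> {
    label: Y,
}

impl<X, Y: Copy + Eq + Hash> Learner<X, Y> for MajorityClass {
    type Model = MajorityModel<Y>;

    fn fit(&self, data: &[(X, Y)]) -> MajorityModel<Y> {
        let mut counts: HashMap<Y, usize> = HashMap::new();
        for (_, y) in data {
            *counts.entry(*y).or_insert(0) += 1;
        }
        let label = counts
            .into_iter()
            .max_by_key(|&(_, n)| n)
            .map(|(y, _)| y)
            .expect("training data must not be empty");
        MajorityModel { label }
    }
}

impl<X, Y: Copy> Predictor<X, Y> for MajorityModel<Y> {
    fn predict(&self, _x: &X) -> Y {
        self.label
    }
}

/// Generic over feature type X and label type Y; both are fixed at compile time.
fn accuracy<X, Y: PartialEq, L: Learner<X, Y>>(
    learner: &L,
    train: &[(X, Y)],
    test: &[(X, Y)],
) -> f64 {
    let model = learner.fit(train);
    let correct = test.iter().filter(|(x, y)| model.predict(x) == *y).count();
    correct as f64 / test.len() as f64
}

fn main() {
    // Same learner, two different (feature, label) type combinations.
    let train: Vec<(f64, u8)> = vec![(0.1, 1), (0.4, 1), (0.9, 0)];
    let test: Vec<(f64, u8)> = vec![(0.2, 1), (0.8, 0)];
    println!("u8 labels: {}", accuracy(&MajorityClass, &train, &test));

    let train: Vec<([u8; 2], &str)> = vec![([1, 0], "a"), ([0, 1], "b"), ([1, 1], "b")];
    let test: Vec<([u8; 2], &str)> = vec![([0, 0], "b")];
    println!("str labels: {}", accuracy(&MajorityClass, &train, &test));
}
```

With this shape no dynamic type handling is needed at run time, since the compiler resolves the feature and label types for every call. The trade-off is that data sets with mixed column types still need a conversion step before they fit a single `X`.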

Future Maybe-Goals

  • flow support
  • run support
  • full OpenML API support
  • authentication
  • more tasks
    • Supervised Datastream Classification
    • Machine Learning Challenge
    • Survival Analysis
    • Subgroup Discovery

Non-Goals

  • implementations of machine learning algorithms

Dependencies

~9–19MB
~239K SLoC