#pre-processor #machine-learning

greenglas

Data Preprocessing library for Machine Learning

2 unstable releases

0.3.0 Nov 21, 2021
0.2.0 Jun 16, 2017

#895 in Machine learning

MIT/Apache

15KB
186 lines

Greenglas • Join the chat at https://gitter.im/spearow/juice Greenglas Build Status Crates.io dependency status License

Greenglas tries to provide a smart and customizable pipeline for preprocessing data for machine learning tasks. Clean preprocessing methods for the most common type of data, makes preprocessing easy. Greenglas offers a pipeline of Modifiers and Transformers to turn non-numeric data into a safe and consistent numeric output in the form of Coaster's SharedTensor. For putting your preprocessed data to use, you might like to use the Machine Learning Framework Leaf.

For more information see the Documentation.

Architecture

Greenglas exposes several standard data types, which might need a numeric transformation in order to be processed by a Machine Learning Algorithm such as Neural Nets.

Data Types can be modified through Modifiers. This provides a coherent interface, allowing for custom modifiers. You can read more about custom modifiers further down. First, an example of a Data Type modification:

let mut data_type = Image { value: ... }
data_type = data_type.set((ModifierOne(param1, param2), ModifierTwo(anotherParam));
image.set(Resize(20, 20))

After one, none or many modifications through Modifiers, the Data Type can then finally be transformed into a SharedTensor (numeric Vector). Taking data_type from the above example:

// the Vector secures the correct shape and capacity of the final SharedTensor
let final_tensor = data_type.transform(vec![20, 20, 3]).unwrap();

Transformable Data Types

These are the data types that greenglas is currently addressing. For most of them are basic Modifiers and Transformers already specified.

  • Missing: NULL data
  • Label: labeled data such as ID's, Categories, etc.
  • Word: a String of arbitrary lengths
  • Image
  • Audio

Modifiers

All Modifiers implement the Modifier trait from rust-modifier. As all Transformable Data Types implement the Set trait of the same library, one can easily write custom modifiers as well. Quick Example:

extern crate greenglas;

use greenglas::Image;
use greenglas::modifier::Modifier;

struct CustomModifier(usize)

impl Modifier<Image> for CustomModifier {
    fn modify(self, image: &mut Image) {
        image.value = some_extern_image_manipulation_fn(self.0);
    }
}

Contributing

Want to contribute? Awesome! We have instructions to help you get started contributing code or documentation.

Development happens in a mostly async collaboration culture and happens on the Gitter Channels or via github issues. Or you reach out to the Maintainer(s). e.g. {@drahnr, }.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as below, without any additional terms or conditions.

License

Licensed under either of

at your option.

Dependencies

~6.5MB
~94K SLoC