Machine Learning in Rust
Learn the Rust programming language by implementing classic machine learning algorithms. This project is implemented from scratch without relying on any third-party libraries, serving as a bootstrap machine learning library.
❗❗❗: Actively seeking code reviews; suggestions on fixing bugs or refactoring code are welcome. Please feel free to share your ideas. Happy to accept advice!
Basics
- NdArray module: as the name suggests, it implements broadcast, matrix operations, permute, etc. in arbitrary dimensions. SIMD is used in matrix multiplication thanks to Rust's auto-vectorization.
- Dataset module: supports customized data loading, reformatting, normalize, shuffle and Dataloader. Several popular dataset preprocessing recipes are available.
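To make the broadcast rule concrete, here is a minimal sketch of NumPy-style shape broadcasting. Shapes are aligned from the trailing dimension, and missing leading dimensions count as 1. Each pair of sizes must either be equal or include a 1. The sketch assumes the NdArray module follows this common convention. broadcast_shape is a hypothetical helper, not part of mlinrust.

```rust
/// Hypothetical helper (not part of mlinrust): result shape of broadcasting
/// two shapes under NumPy-style rules, or None if they are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    // Walk from the trailing dimension; missing leading dimensions count as 1.
    for i in 0..ndim {
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[ndim - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            // Sizes differ and neither is 1: the shapes cannot be broadcast.
            _ => return None,
        };
    }
    Some(out)
}
```

For example, shapes [3, 1, 4] and [2, 1] broadcast to [3, 2, 4], while [2, 3] and [4, 3] are incompatible.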
Algorithms
- Decision Tree, supporting both classification and regression tasks. Split criteria such as gini or entropy are provided.
- Logistic Regression, supporting regularization (Lasso, Ridge and Linf).
- Linear Regression, same as logistic regression, but for regression tasks.
- Naive Bayes, able to handle discrete or continuous feature values.
- SVM, with a linear kernel, optimized with SGD and hinge loss.
- nn module, containing linear (MLP) and some activation functions, which can be freely stacked and optimized by gradient backpropagation.
- KNN, supporting both KdTree and vanilla BruteForceSearch.
- KMeans, clustering data with an unsupervised learning approach.
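For reference, the sketch below gives the textbook definitions behind the decision tree's gini and entropy options. Both are computed from per-class sample counts. It also computes the gain of a split: the parent's impurity minus the size-weighted impurity of the children. The function names are hypothetical, and this is not mlinrust's implementation.

```rust
/// Gini impurity: 1 - sum_k p_k^2, where p_k is the share of class k.
fn gini(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let n = total as f64;
    1.0 - counts.iter().map(|&c| (c as f64 / n).powi(2)).sum::<f64>()
}

/// Shannon entropy in bits: -sum_k p_k * log2(p_k), skipping empty classes.
fn entropy(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let n = total as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical helper: impurity of the parent minus the size-weighted
/// impurity of the children, usable with either measure above.
fn impurity_gain(parent: &[usize], children: &[&[usize]], impurity: fn(&[usize]) -> f64) -> f64 {
    let n: usize = parent.iter().sum();
    if n == 0 {
        return 0.0;
    }
    let weighted: f64 = children
        .iter()
        .map(|&child| {
            let m: usize = child.iter().sum();
            m as f64 / n as f64 * impurity(child)
        })
        .sum();
    impurity(parent) - weighted
}
```

Consider a perfect split of class counts [5, 5] into [5, 0] and [0, 5]. Its gain is 0.5 under Gini impurity and 1.0 bit under entropy.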
Start
Let's use the KNN algorithm to solve a classification task. More examples can be found in the examples directory.

Create some synthetic data for testing
    use std::collections::HashMap;

    let features = vec![
        vec![0.6, 0.7, 0.8],
        vec![0.7, 0.8, 0.9],
        vec![0.1, 0.2, 0.3],
    ];
    let labels = vec![0, 0, 1];
    // a binary classification task: 0 is the "large" label, 1 is the "small" label
    let mut label_map = HashMap::new();
    label_map.insert(0, "large".to_string());
    label_map.insert(1, "small".to_string());

Convert the data into a Dataset
    use mlinrust::dataset::Dataset;

    let dataset = Dataset::new(features, labels, Some(label_map));

Split the dataset into train and valid sets and normalize them with standard normalization
    use mlinrust::dataset::utils::{normalize_dataset, ScalerType};

    // [2.0, 1.0] is the split fraction, 0 is the seed
    let mut temp = dataset.split_dataset(vec![2.0, 1.0], 0);
    let (mut train_dataset, mut valid_dataset) = (temp.remove(0), temp.remove(0));
    normalize_dataset(&mut train_dataset, ScalerType::Standard);
    normalize_dataset(&mut valid_dataset, ScalerType::Standard);
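ScalerType::Standard presumably means standard (z-score) scaling. Assuming so, each feature value x is replaced by (x - mean) / std, computed per column. The stand-alone sketch below works on plain Vec<Vec<f32>> rows and returns the per-column statistics. It uses the population standard deviation. standard_normalize is a hypothetical name, not mlinrust's normalize_dataset.

```rust
/// Hypothetical z-score scaler (not mlinrust's normalize_dataset): rescales
/// each feature column in place to zero mean and unit population standard
/// deviation, assuming all rows have the same length. Returns (means, stds).
fn standard_normalize(rows: &mut [Vec<f32>]) -> (Vec<f32>, Vec<f32>) {
    if rows.is_empty() {
        return (Vec::new(), Vec::new());
    }
    let n = rows.len() as f32;
    let dim = rows[0].len();
    let mut means = vec![0.0f32; dim];
    let mut stds = vec![0.0f32; dim];
    // Column means.
    for row in rows.iter() {
        for (m, &x) in means.iter_mut().zip(row) {
            *m += x / n;
        }
    }
    // Column variances, then standard deviations.
    for row in rows.iter() {
        for ((s, &x), &m) in stds.iter_mut().zip(row).zip(&means) {
            *s += (x - m).powi(2) / n;
        }
    }
    for s in stds.iter_mut() {
        // A constant column keeps std = 1.0 to avoid dividing by zero.
        *s = if *s > 0.0 { (*s).sqrt() } else { 1.0 };
    }
    // Rescale every value in place.
    for row in rows.iter_mut() {
        for ((x, &m), &s) in row.iter_mut().zip(&means).zip(&stds) {
            *x = (*x - m) / s;
        }
    }
    (means, stds)
}
```

Returning the statistics lets another split be rescaled with the same means and standard deviations if desired.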

Build and train our KNN model using KdTree
    use mlinrust::model::knn::{KNNAlg, KNNModel, KNNWeighting};

    // KdTree is one implementation of KNN; 1 is k, the number of neighbours;
    // the weighting decides how the neighbours' predictions are ensembled;
    // train_dataset is the data the KNN model is built from;
    // Some(2) is the p parameter of the Minkowski distance (p = 2 is Euclidean)
    let model = KNNModel::new(KNNAlg::KdTree, 1, Some(KNNWeighting::Distance), train_dataset, Some(2));
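The numeric parameters above can be illustrated with a brute-force sketch. It shows the Minkowski distance of order p (Some(2) corresponds to Euclidean distance) and a distance-weighted vote among the k nearest training rows. The sketch assumes KNNWeighting::Distance means inverse-distance weighting, as with scikit-learn's weights="distance"; this is not confirmed here. minkowski and predict are hypothetical names. The sketch does a linear scan rather than a KdTree search.

```rust
use std::collections::HashMap;

/// Minkowski distance of order p: (sum_i |a_i - b_i|^p)^(1/p).
fn minkowski(a: &[f32], b: &[f32], p: i32) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powi(p))
        .sum::<f32>()
        .powf(1.0 / p as f32)
}

/// Hypothetical brute-force KNN (not mlinrust's API): finds the k nearest
/// training rows and lets each vote for its label with weight 1 / distance
/// (assumed weighting). If the nearest row is an exact match, its label wins.
fn predict(train: &[(Vec<f32>, usize)], query: &[f32], k: usize, p: i32) -> Option<usize> {
    let mut dists: Vec<(f32, usize)> = train
        .iter()
        .map(|(x, label)| (minkowski(x, query, p), *label))
        .collect();
    // Sort by distance and keep the k nearest neighbours.
    dists.sort_by(|a, b| a.0.total_cmp(&b.0));
    dists.truncate(k);
    // Distance 0 would mean infinite weight, so return that label directly.
    if let Some(&(d, label)) = dists.first() {
        if d == 0.0 {
            return Some(label);
        }
    }
    // Accumulate inverse-distance weights per label.
    let mut votes: HashMap<usize, f32> = HashMap::new();
    for &(d, label) in &dists {
        *votes.entry(label).or_insert(0.0) += 1.0 / d;
    }
    votes
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(label, _)| label)
}
```

Inverse-distance weighting lets a very close neighbour outvote several distant ones. With k = 1, as in the call above, the weighting has no effect on the prediction.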

Evaluate the model
    use mlinrust::utils::evaluate;

    let (correct, acc) = evaluate(&valid_dataset, &model);
    println!("evaluate results\ncorrect {correct} / total {}, acc = {acc:.5}", valid_dataset.len());
Todo
- Model weight serialization for saving and loading
- Boosting/bagging
- Multi-threaded matrix multiplication
- Code refactoring; comments from senior developers are sincerely requested
Reference
- scikit-learn
- The book Machine Learning (机器学习, nicknamed the "Watermelon Book", 西瓜书) by Prof. Zhihua Zhou
Thanks
The Rust community. I received a lot of help from the rust-lang Discord.
License
Under the GPLv3 license. Commercial use is strictly prohibited.