4 stable releases

Uses new Rust 2024

1.2.0	Mar 14, 2025
1.1.0	Mar 13, 2025
0.3.0	~~Sep 30, 2024~~
0.2.0	~~Sep 30, 2024~~
0.1.0	~~Apr 2, 2022~~


GPL-3.0 license


rs-stats

A comprehensive statistical library written in Rust, providing powerful tools for probability, distributions, and hypothesis testing.

rs-stats offers a broad range of statistical functionality implemented in pure Rust. It's designed to be intuitive, efficient, and reliable for both simple and complex statistical analysis. The library aims to provide a comprehensive set of tools for data scientists, researchers, and developers working with statistical models.

Features

Probability Functions
- Error functions (erf, erfc)
- Cumulative distribution functions
- Probability density functions
- Z-scores
- Basic statistics (mean, variance, standard deviation, standard error)
Statistical Distributions
- Normal (Gaussian) distribution
- Binomial distribution
- Exponential distribution
- Poisson distribution
- Uniform distribution
Regression Analysis
- Linear Regression (fit, predict, confidence intervals)
- Multiple Linear Regression (multiple predictor variables)
- Model statistics (R², adjusted R², standard error)
- Model persistence (save/load models in JSON or binary format)
- Decision trees (regression and classification)
Hypothesis Testing
- ANOVA (Analysis of Variance)
- Chi-square tests (independence and goodness of fit)
- T-tests (one-sample, two-sample, paired)

Installation

Add rs-stats to your Cargo.toml:

[dependencies]
rs-stats = "1.2.0"

Or use cargo add:

cargo add rs-stats

Usage Examples

Basic Statistical Functions

use rs_stats::prob::{average, variance, population_std_dev, std_err};

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    
    let mean = average(&data);
    let var = variance(&data);
    let std_dev = population_std_dev(&data);
    let std_error = std_err(&data);
    
    println!("Mean: {}", mean);
    println!("Variance: {}", var);
    println!("Standard Deviation: {}", std_dev);
    println!("Standard Error: {}", std_error);
}
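The quantities printed above follow the textbook definitions. The self-contained sketch below computes them in plain Rust without the crate, so the formulas are visible. It assumes a sample variance with an n − 1 denominator, a population standard deviation with an n denominator, and a standard error of s/√n built from the sample standard deviation. This README does not say which denominator rs-stats' `variance` or `std_err` uses, so check the API docs before comparing numbers. All function names here are hypothetical, including `z_score` for the Z-scores listed under Features.

```rust
/// Arithmetic mean.
fn mean(data: &[f64]) -> f64 {
    data.iter().sum::<f64>() / data.len() as f64
}

/// Sum of squared deviations from the mean.
fn sum_sq_dev(data: &[f64]) -> f64 {
    let m = mean(data);
    data.iter().map(|x| (x - m).powi(2)).sum()
}

/// Sample variance (assumed n - 1 denominator).
fn sample_variance(data: &[f64]) -> f64 {
    sum_sq_dev(data) / (data.len() as f64 - 1.0)
}

/// Population standard deviation (n denominator).
fn population_std_dev(data: &[f64]) -> f64 {
    (sum_sq_dev(data) / data.len() as f64).sqrt()
}

/// Standard error of the mean, assumed to use the sample standard deviation.
fn std_err(data: &[f64]) -> f64 {
    sample_variance(data).sqrt() / (data.len() as f64).sqrt()
}

/// Z-score of x relative to a mean `mu` and standard deviation `sigma`.
fn z_score(x: f64, mu: f64, sigma: f64) -> f64 {
    (x - mu) / sigma
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0];
    let m = mean(&data);
    let sd = population_std_dev(&data);
    println!("Mean: {}", m);
    println!("Sample variance: {}", sample_variance(&data));
    println!("Population std dev: {}", sd);
    println!("Standard error: {}", std_err(&data));
    println!("z-score of 5.0: {}", z_score(5.0, m, sd));
}
```

For this data the mean is 3, the sample variance is 2.5 and the population standard deviation is √2, so a library that reports a variance of 2.0 is using the population form.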

Working with Distributions

use rs_stats::distributions::normal_distribution::{normal_pdf, normal_cdf, normal_inverse_cdf};

fn main() {
    // Standard normal distribution (mean=0, std_dev=1)
    let x = 1.96;
    
    // Probability density at x
    let density = normal_pdf(x, 0.0, 1.0);
    println!("PDF at {}: {}", x, density);
    
    // Cumulative probability P(X ≤ x)
    let cumulative = normal_cdf(x, 0.0, 1.0);
    println!("CDF at {}: {}", x, cumulative);
    
    // Inverse CDF (quantile function)
    let p = 0.975;
    let quantile = normal_inverse_cdf(p, 0.0, 1.0);
    println!("{}th percentile: {}", p * 100.0, quantile);
}
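The normal CDF is commonly written in terms of the error function listed under Features: Φ(x) = ½[1 + erf((x − μ)/(σ√2))]. The self-contained sketch below shows that relationship using the Abramowitz and Stegun 7.1.26 polynomial approximation of erf (absolute error at most about 1.5×10⁻⁷). This approximation is a swapped-in technique chosen for brevity, not necessarily what rs-stats uses internally. The function names are hypothetical, and the inverse CDF is omitted.

```rust
use std::f64::consts::{PI, SQRT_2};

/// erf(x) via Abramowitz & Stegun formula 7.1.26 (|error| <= ~1.5e-7).
fn erf(x: f64) -> f64 {
    // erf is an odd function: approximate on |x|, then restore the sign.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let p = 0.3275911;
    let (a1, a2, a3, a4, a5) = (
        0.254829592,
        -0.284496736,
        1.421413741,
        -1.453152027,
        1.061405429,
    );
    let t = 1.0 / (1.0 + p * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    let poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// Normal probability density with the given mean and standard deviation.
fn normal_pdf(x: f64, mean: f64, std_dev: f64) -> f64 {
    let z = (x - mean) / std_dev;
    (-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt())
}

/// Normal CDF expressed through erf.
fn normal_cdf(x: f64, mean: f64, std_dev: f64) -> f64 {
    0.5 * (1.0 + erf((x - mean) / (std_dev * SQRT_2)))
}

fn main() {
    let x = 1.96;
    println!("PDF at {}: {:.5}", x, normal_pdf(x, 0.0, 1.0));
    println!("CDF at {}: {:.5}", x, normal_cdf(x, 0.0, 1.0));
}
```

Because Φ(1.96) ≈ 0.975, the 97.5th percentile printed by the inverse-CDF call in the example above should come out close to 1.96.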

Hypothesis Testing

use rs_stats::hypothesis_tests::t_test::{one_sample_t_test, two_sample_t_test};
use rs_stats::hypothesis_tests::chi_square_test::chi_square_independence;
use rs_stats::hypothesis_tests::anova::one_way_anova;

fn main() {
    // One-sample t-test
    let sample = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let result = one_sample_t_test(&sample, 5.0);
    println!("One-sample t-test p-value: {}", result.p_value);
    
    // Two-sample t-test
    let sample1 = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let sample2 = vec![4.8, 4.9, 5.0, 4.7, 4.9];
    let result = two_sample_t_test(&sample1, &sample2);
    println!("Two-sample t-test p-value: {}", result.p_value);
    
    // ANOVA
    let groups = vec![
        vec![5.1, 5.2, 4.9, 5.0, 5.3],
        vec![4.8, 4.9, 5.0, 4.7, 4.9],
        vec![5.2, 5.3, 5.1, 5.4, 5.2],
    ];
    let result = one_way_anova(&groups);
    println!("ANOVA p-value: {}", result.p_value);
    
    // Chi-square test of independence
    let observed = vec![
        vec![45, 55],
        vec![60, 40],
    ];
    let result = chi_square_independence(&observed);
    println!("Chi-square independence test p-value: {}", result.p_value);
}
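To sanity-check the p-values printed above, it can help to compute the underlying test statistics directly. The plain-Rust sketch below uses hypothetical helper names and computes three statistics on the same data. The first is the one-sample t statistic t = (x̄ − μ₀)/(s/√n). The second is the one-way ANOVA F statistic, the between-group mean square divided by the within-group mean square. The third is the Pearson chi-square statistic for a contingency table, computed without Yates' continuity correction. Turning these into p-values requires the t, F and chi-square CDFs, which are left to the library. This README does not say whether rs-stats applies a continuity correction to 2×2 tables.

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// One-sample t statistic (n - 1 degrees of freedom).
fn t_statistic(sample: &[f64], mu0: f64) -> f64 {
    let n = sample.len() as f64;
    let m = mean(sample);
    let s2 = sample.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0);
    (m - mu0) / (s2.sqrt() / n.sqrt())
}

/// One-way ANOVA F statistic: MS_between / MS_within.
fn anova_f(groups: &[Vec<f64>]) -> f64 {
    let k = groups.len() as f64;
    let n_total: usize = groups.iter().map(|g| g.len()).sum();
    let grand = groups.iter().flatten().sum::<f64>() / n_total as f64;
    let mut ss_between = 0.0;
    let mut ss_within = 0.0;
    for g in groups {
        let m = mean(g);
        ss_between += g.len() as f64 * (m - grand).powi(2);
        ss_within += g.iter().map(|x| (x - m).powi(2)).sum::<f64>();
    }
    (ss_between / (k - 1.0)) / (ss_within / (n_total as f64 - k))
}

/// Pearson chi-square statistic for an r x c table (no continuity correction).
fn chi_square_stat(observed: &[Vec<u32>]) -> f64 {
    let row_totals: Vec<f64> = observed
        .iter()
        .map(|r| r.iter().sum::<u32>() as f64)
        .collect();
    let cols = observed[0].len();
    let col_totals: Vec<f64> = (0..cols)
        .map(|j| observed.iter().map(|r| r[j] as f64).sum::<f64>())
        .collect();
    let total: f64 = row_totals.iter().sum();
    let mut chi2 = 0.0;
    for (i, row) in observed.iter().enumerate() {
        for (j, &o) in row.iter().enumerate() {
            // Expected count under independence: row total * column total / grand total
            let expected = row_totals[i] * col_totals[j] / total;
            chi2 += (o as f64 - expected).powi(2) / expected;
        }
    }
    chi2
}

fn main() {
    let sample = [5.1, 5.2, 4.9, 5.0, 5.3];
    println!("t = {:.4} (df = 4)", t_statistic(&sample, 5.0));

    let groups = vec![
        vec![5.1, 5.2, 4.9, 5.0, 5.3],
        vec![4.8, 4.9, 5.0, 4.7, 4.9],
        vec![5.2, 5.3, 5.1, 5.4, 5.2],
    ];
    println!("F = {:.4} (df = 2, 12)", anova_f(&groups));

    let observed: Vec<Vec<u32>> = vec![vec![45, 55], vec![60, 40]];
    println!("chi-square = {:.4} (df = 1)", chi_square_stat(&observed));
}
```

For the example data these work out to t = √2 ≈ 1.4142, F ≈ 10.8627 and χ² ≈ 4.5113. Those values give a quick cross-check on the degrees of freedom and corrections a library applies.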

Regression Analysis

use rs_stats::regression::linear_regression::LinearRegression;
use rs_stats::regression::multiple_linear_regression::MultipleLinearRegression;

fn main() {
    // Simple Linear Regression
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];
    
    let mut model = LinearRegression::new();
    model.fit(&x, &y).unwrap();
    
    println!("Slope: {}", model.slope);
    println!("Intercept: {}", model.intercept);
    println!("R-squared: {}", model.r_squared);
    
    // Predict new values
    let prediction = model.predict(6.0);
    println!("Prediction for x=6: {}", prediction);
    
    // Calculate confidence interval (95%)
    if let Some((lower, upper)) = model.confidence_interval(6.0, 0.95) {
        println!("95% confidence interval: ({}, {})", lower, upper);
    }
    
    // Multiple Linear Regression
    let x_multi = vec![
        vec![1.0, 2.0], // observation 1: x1=1.0, x2=2.0
        vec![2.0, 1.0], // observation 2: x1=2.0, x2=1.0
        vec![3.0, 3.0], // observation 3: x1=3.0, x2=3.0
        vec![4.0, 2.0], // observation 4: x1=4.0, x2=2.0
    ];
    let y_multi = vec![9.0, 8.0, 16.0, 15.0]; // these points satisfy y = 1 + 2*x1 + 3*x2 exactly
    
    let mut multi_model = MultipleLinearRegression::new();
    multi_model.fit(&x_multi, &y_multi).unwrap();
    
    println!("Coefficients: {:?}", multi_model.coefficients);
    println!("R-squared: {}", multi_model.r_squared);
    println!("Adjusted R-squared: {}", multi_model.adjusted_r_squared);
    
    // Predict with multiple variables
    let new_observation = vec![5.0, 4.0];
    let prediction = multi_model.predict(&new_observation);
    println!("Prediction for new observation: {}", prediction);
    
    // Save model to file
    multi_model.save("model.json").unwrap();
    
    // Load model from file
    let loaded_model = MultipleLinearRegression::load("model.json").unwrap();
    println!("Loaded coefficients: {:?}", loaded_model.coefficients);
}
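Simple linear regression by ordinary least squares has a closed form. The slope is b = Sxy/Sxx, the intercept is a = ȳ − b·x̄, and R² = 1 − SS_res/SS_tot. The sketch below computes these in plain Rust for the same data as the simple-regression example above. The `SimpleFit` struct and `fit_ols` function are hypothetical illustrations, not the internals of rs-stats' `LinearRegression`, and confidence intervals are omitted.

```rust
/// Result of an ordinary least-squares fit of y = intercept + slope * x.
#[derive(Debug)]
struct SimpleFit {
    slope: f64,
    intercept: f64,
    r_squared: f64,
}

impl SimpleFit {
    fn predict(&self, x: f64) -> f64 {
        self.intercept + self.slope * x
    }
}

/// Returns None if the inputs differ in length, have fewer than two points,
/// or x has zero variance (the slope would be undefined).
fn fit_ols(x: &[f64], y: &[f64]) -> Option<SimpleFit> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let x_mean = x.iter().sum::<f64>() / n;
    let y_mean = y.iter().sum::<f64>() / n;

    // Centered cross-products and sums of squares.
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (xi, yi) in x.iter().zip(y) {
        let dx = xi - x_mean;
        let dy = yi - y_mean;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    let intercept = y_mean - slope * x_mean;

    // Residual sum of squares for R^2 = 1 - SS_res / SS_tot.
    let ss_res: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| (yi - (intercept + slope * xi)).powi(2))
        .sum();
    // Convention used here: a constant y is fitted perfectly, so report R^2 = 1.
    let r_squared = if syy == 0.0 { 1.0 } else { 1.0 - ss_res / syy };
    Some(SimpleFit { slope, intercept, r_squared })
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [2.0, 4.0, 6.0, 8.0, 10.0];
    let fit = fit_ols(&x, &y).expect("x must vary");
    println!("{:?}", fit);
    println!("Prediction for x=6: {}", fit.predict(6.0));
}
```

Since y = 2x exactly, a correct fit gives a slope of 2, an intercept of 0, R² = 1 and a prediction of 12 at x = 6.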

Decision Trees

use rs_stats::regression::decision_tree::{DecisionTree, TreeType, SplitCriterion};

fn main() {
    // Example 1: Regression tree for predicting patient recovery time
    let mut recovery_time_tree = DecisionTree::<f64, f64>::new(
        TreeType::Regression,
        SplitCriterion::Mse,
        5,   // max_depth
        2,   // min_samples_split
        1,   // min_samples_leaf
    );

    // Training data: [age, treatment_intensity, bmi, comorbidity_score, initial_severity]
    let patient_features = vec![
        vec![45.0, 3.0, 28.5, 2.0, 7.0],  // Patient 1: 45 years, treatment intensity 3, BMI 28.5, etc.
        vec![62.0, 4.0, 31.2, 3.0, 8.0],  // Patient 2
        vec![38.0, 2.0, 24.3, 1.0, 5.0],  // Patient 3
        // ... more patients
    ];
    let recovery_days = vec![14.0, 28.0, 10.0];  // Recovery time in days

    // Train the model to predict recovery time
    recovery_time_tree.fit(&patient_features, &recovery_days);

    // Make predictions for a new patient
    let new_patient = vec![
        vec![55.0, 3.0, 27.0, 2.0, 6.0],  // New patient characteristics
    ];
    let predicted_recovery_days = recovery_time_tree.predict(&new_patient);
    println!("Predicted recovery time (days): {:?}", predicted_recovery_days);

    // Example 2: Classification tree for diabetes risk assessment
    let mut diabetes_risk_tree = DecisionTree::<u8, f64>::new(
        TreeType::Classification,
        SplitCriterion::Gini,
        4,   // max_depth
        2,   // min_samples_split
        1,   // min_samples_leaf
    );

    // Training data: [glucose_level, bmi, blood_pressure, age, family_history]
    let medical_features = vec![
        vec![85.0, 22.0, 120.0, 35.0, 0.0],   // Patient 1: glucose 85 mg/dL, BMI 22, BP 120, etc.
        vec![140.0, 31.0, 145.0, 52.0, 1.0],  // Patient 2
        vec![165.0, 34.0, 155.0, 48.0, 1.0],  // Patient 3
        // ... more patients
    ];
    let diabetes_status = vec![0, 1, 1];  // 0: No diabetes, 1: Diabetes

    // Train the classifier
    diabetes_risk_tree.fit(&medical_features, &diabetes_status);

    // Print tree structure and summary
    println!("Tree Structure:\n{}", diabetes_risk_tree.tree_structure());
    println!("Tree Summary:\n{}", diabetes_risk_tree.summary());

    // Feature importance: which medical measurements are most predictive
    let importance = diabetes_risk_tree.feature_importances();
    println!("Feature Importance: {:?}", importance);
}

The Decision Tree implementation supports:

- Regression and classification tasks
- Multiple split criteria (MSE and MAE for regression; Gini and Entropy for classification)
- Generic types with appropriate trait bounds
- Parallel processing for better performance
- Tree visualization and interpretation tools
- Feature importance calculation
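The split criteria listed above score how mixed a node is, and a tree chooses the split that most reduces the weighted impurity of the child nodes. The sketch below shows the standard formulas on a single node: Gini impurity (1 − Σp²), entropy in bits (−Σp·log₂p) and MSE around the node mean. It also computes the weighted impurity decrease for a candidate split. It is an illustration with hypothetical function names, not rs-stats' internal code, and MAE is omitted.

```rust
use std::collections::HashMap;

/// Class proportions in a (non-empty) node.
fn proportions(labels: &[u8]) -> Vec<f64> {
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for &l in labels {
        *counts.entry(l).or_insert(0) += 1;
    }
    let n = labels.len() as f64;
    counts.values().map(|&c| c as f64 / n).collect()
}

/// Gini impurity: 1 - sum(p_k^2).
fn gini(labels: &[u8]) -> f64 {
    1.0 - proportions(labels).iter().map(|p| p * p).sum::<f64>()
}

/// Entropy in bits: -sum(p_k * log2(p_k)).
fn entropy(labels: &[u8]) -> f64 {
    -proportions(labels).iter().map(|p| p * p.log2()).sum::<f64>()
}

/// Mean squared error of a (non-empty) node around its mean: the regression impurity.
fn mse(values: &[f64]) -> f64 {
    let n = values.len() as f64;
    let m = values.iter().sum::<f64>() / n;
    values.iter().map(|v| (v - m).powi(2)).sum::<f64>() / n
}

/// Impurity decrease when splitting `parent` into `left` and `right`,
/// weighting each child's impurity by its share of the samples.
fn impurity_decrease(parent: &[u8], left: &[u8], right: &[u8], f: fn(&[u8]) -> f64) -> f64 {
    let n = parent.len() as f64;
    f(parent) - (left.len() as f64 / n) * f(left) - (right.len() as f64 / n) * f(right)
}

fn main() {
    let labels = [0u8, 1, 1];
    println!("Gini: {:.4}", gini(&labels));
    println!("Entropy: {:.4} bits", entropy(&labels));
    println!("MSE of recovery days: {:.4}", mse(&[14.0, 28.0, 10.0]));
    // Separating the class-0 patient from the two class-1 patients yields pure children.
    println!("Gini decrease: {:.4}", impurity_decrease(&labels, &[0], &[1, 1], gini));
}
```

With the three diabetes labels from the example (0, 1, 1), the Gini impurity is 4/9 ≈ 0.4444. A split that isolates the single class-0 patient removes all of that impurity.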

Documentation

For detailed API documentation, run:

cargo doc --open

Testing

The library includes a comprehensive test suite. Run the tests with:

cargo test

Contributing

Contributions are welcome! Here's how you can contribute:

- Fork the repository
- Create a feature branch: git checkout -b feature/my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin feature/my-new-feature
- Submit a pull request

Before submitting your PR, please make sure:

- All tests pass
- Code follows the project's style and conventions
- New features include appropriate documentation and tests

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

- The Rust community for their excellent documentation and support
- Contributors to the project
- Various statistical references and research papers that informed the implementations
