4 stable releases
Uses new Rust 2024
1.2.0 | Mar 14, 2025 |
---|---|
1.1.0 | Mar 13, 2025 |
0.3.0 | |
0.2.0 | |
0.1.0 | |
#87 in Machine learning
494 downloads per month
220KB, 4K SLoC
rs-stats
A comprehensive statistical library written in Rust, providing powerful tools for probability, distributions, and hypothesis testing.
rs-stats implements its statistical functionality in pure Rust and aims to be intuitive, efficient, and reliable for both simple and complex analyses. It targets data scientists, researchers, and developers working with statistical models.
Features
- Probability Functions
  - Error functions (erf, erfc)
  - Cumulative distribution functions
  - Probability density functions
  - Z-scores
  - Basic statistics (mean, variance, standard deviation, standard error)
- Statistical Distributions
  - Normal (Gaussian) distribution
  - Binomial distribution
  - Exponential distribution
  - Poisson distribution
  - Uniform distribution
- Regression Analysis
  - Linear Regression (fit, predict, confidence intervals)
  - Multiple Linear Regression (multiple predictor variables)
  - Model statistics (R², adjusted R², standard error)
  - Model persistence (save/load models in JSON or binary format)
- Hypothesis Testing
  - ANOVA (Analysis of Variance)
  - Chi-square tests (independence and goodness of fit)
  - T-tests (one-sample, two-sample, paired)
Installation
Add rs-stats to your Cargo.toml:
[dependencies]
rs-stats = "1.2.0"
Or use cargo add:
cargo add rs-stats
Usage Examples
Basic Statistical Functions
use rs_stats::prob::{average, variance, population_std_dev, std_err};
fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let mean = average(&data);
    let var = variance(&data);
    let std_dev = population_std_dev(&data);
    let std_error = std_err(&data);
    println!("Mean: {}", mean);
    println!("Variance: {}", var);
    println!("Standard Deviation: {}", std_dev);
    println!("Standard Error: {}", std_error);
}
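To make the quantities in the example above concrete, here is a minimal std-only sketch of what mean, variance, and standard error compute. It does not use rs-stats: the function names are hypothetical, and the source does not say whether the library's `variance` divides by n or by n - 1, so both variants are shown.

```rust
/// Arithmetic mean; `None` for empty input. (Hypothetical helper, not the rs-stats API.)
fn mean(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    Some(data.iter().sum::<f64>() / data.len() as f64)
}

/// Sum of squared deviations from the mean.
fn sum_of_squares(data: &[f64]) -> Option<f64> {
    let m = mean(data)?;
    Some(data.iter().map(|x| (x - m).powi(2)).sum())
}

/// Population variance: divides by n.
fn population_variance(data: &[f64]) -> Option<f64> {
    Some(sum_of_squares(data)? / data.len() as f64)
}

/// Sample variance: divides by n - 1, so it needs at least two values.
fn sample_variance(data: &[f64]) -> Option<f64> {
    if data.len() < 2 {
        return None;
    }
    Some(sum_of_squares(data)? / (data.len() - 1) as f64)
}

/// Standard error of the mean: sample standard deviation / sqrt(n).
fn standard_error(data: &[f64]) -> Option<f64> {
    Some(sample_variance(data)?.sqrt() / (data.len() as f64).sqrt())
}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0, 5.0];
    println!("mean: {:?}", mean(&data));
    println!("population variance: {:?}", population_variance(&data));
    println!("sample variance: {:?}", sample_variance(&data));
    println!("standard error: {:?}", standard_error(&data));
}
```

Returning `Option` is one way to handle empty or too-short input without panicking; the library itself may handle these cases differently.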
Working with Distributions
use rs_stats::distributions::normal_distribution::{normal_pdf, normal_cdf, normal_inverse_cdf};
fn main() {
    // Standard normal distribution (mean=0, std_dev=1)
    let x = 1.96;
    // Probability density at x
    let density = normal_pdf(x, 0.0, 1.0);
    println!("PDF at {}: {}", x, density);
    // Cumulative probability P(X ≤ x)
    let cumulative = normal_cdf(x, 0.0, 1.0);
    println!("CDF at {}: {}", x, cumulative);
    // Inverse CDF (quantile function)
    let p = 0.975;
    let quantile = normal_inverse_cdf(p, 0.0, 1.0);
    println!("{}th percentile: {}", p * 100.0, quantile);
}
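For readers who want to see how the normal PDF and CDF relate to the error function listed under Features, the following is a std-only sketch. It is an assumption-laden illustration, not the library's implementation: the function names are hypothetical, and `erf_approx` uses the Abramowitz and Stegun 7.1.26 polynomial approximation (absolute error about 1.5e-7), which may differ from whatever rs-stats uses internally.

```rust
use std::f64::consts::{PI, SQRT_2};

/// Error function via the Abramowitz-Stegun 7.1.26 approximation.
/// (Hypothetical helper; chosen for brevity, not taken from rs-stats.)
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Density of N(mean, std_dev^2) at x.
fn normal_pdf_sketch(x: f64, mean: f64, std_dev: f64) -> f64 {
    let z = (x - mean) / std_dev;
    (-0.5 * z * z).exp() / (std_dev * (2.0 * PI).sqrt())
}

/// P(X <= x) for X ~ N(mean, std_dev^2), expressed through erf.
fn normal_cdf_sketch(x: f64, mean: f64, std_dev: f64) -> f64 {
    let z = (x - mean) / std_dev;
    0.5 * (1.0 + erf_approx(z / SQRT_2))
}

fn main() {
    let x = 1.96;
    println!("pdf({}) = {}", x, normal_pdf_sketch(x, 0.0, 1.0));
    println!("cdf({}) = {}", x, normal_cdf_sketch(x, 0.0, 1.0));
}
```

The sketch assumes `std_dev > 0` and does no input validation.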
Hypothesis Testing
use rs_stats::hypothesis_tests::t_test::{one_sample_t_test, two_sample_t_test};
use rs_stats::hypothesis_tests::chi_square_test::{chi_square_goodness_of_fit, chi_square_independence};
use rs_stats::hypothesis_tests::anova::one_way_anova;
fn main() {
    // One-sample t-test
    let sample = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let result = one_sample_t_test(&sample, 5.0);
    println!("One-sample t-test p-value: {}", result.p_value);
    // Two-sample t-test
    let sample1 = vec![5.1, 5.2, 4.9, 5.0, 5.3];
    let sample2 = vec![4.8, 4.9, 5.0, 4.7, 4.9];
    let result = two_sample_t_test(&sample1, &sample2);
    println!("Two-sample t-test p-value: {}", result.p_value);
    // ANOVA
    let groups = vec![
        vec![5.1, 5.2, 4.9, 5.0, 5.3],
        vec![4.8, 4.9, 5.0, 4.7, 4.9],
        vec![5.2, 5.3, 5.1, 5.4, 5.2],
    ];
    let result = one_way_anova(&groups);
    println!("ANOVA p-value: {}", result.p_value);
    // Chi-square test of independence
    let observed = vec![
        vec![45, 55],
        vec![60, 40],
    ];
    let result = chi_square_independence(&observed);
    println!("Chi-square independence test p-value: {}", result.p_value);
}
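As background for the chi-square test of independence above, this std-only sketch computes the test statistic and degrees of freedom for a contingency table. The function name and return type are hypothetical and are not the rs-stats API; turning the statistic into a p-value requires the chi-square CDF, which is left out here.

```rust
/// Pearson chi-square statistic and degrees of freedom for an r x c table.
/// Returns `None` for ragged, too-small, or degenerate tables.
/// (Hypothetical helper; not part of rs-stats.)
fn chi_square_statistic(observed: &[Vec<u32>]) -> Option<(f64, usize)> {
    let rows = observed.len();
    let cols = observed.first()?.len();
    if rows < 2 || cols < 2 || observed.iter().any(|r| r.len() != cols) {
        return None;
    }
    let row_totals: Vec<f64> = observed
        .iter()
        .map(|r| r.iter().map(|&v| v as f64).sum::<f64>())
        .collect();
    let col_totals: Vec<f64> = (0..cols)
        .map(|j| observed.iter().map(|r| r[j] as f64).sum::<f64>())
        .collect();
    let total: f64 = row_totals.iter().sum();
    if total == 0.0 {
        return None;
    }
    let mut statistic = 0.0;
    for i in 0..rows {
        for j in 0..cols {
            // Expected count under independence: row total * column total / grand total.
            let expected = row_totals[i] * col_totals[j] / total;
            if expected == 0.0 {
                return None;
            }
            let diff = observed[i][j] as f64 - expected;
            statistic += diff * diff / expected;
        }
    }
    Some((statistic, (rows - 1) * (cols - 1)))
}

fn main() {
    let observed = vec![vec![45, 55], vec![60, 40]];
    if let Some((statistic, df)) = chi_square_statistic(&observed) {
        println!("chi-square = {:.4}, df = {}", statistic, df);
    }
}
```

For the 2x2 table used in the example, every expected count is either 52.5 or 47.5, so each cell contributes a squared deviation of 7.5² divided by its expected count.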
Regression Analysis
use rs_stats::regression::linear_regression::LinearRegression;
use rs_stats::regression::multiple_linear_regression::MultipleLinearRegression;
fn main() {
    // Simple Linear Regression
    let x = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let y = vec![2.0, 4.0, 6.0, 8.0, 10.0];
    let mut model = LinearRegression::new();
    model.fit(&x, &y).unwrap();
    println!("Slope: {}", model.slope);
    println!("Intercept: {}", model.intercept);
    println!("R-squared: {}", model.r_squared);
    // Predict new values
    let prediction = model.predict(6.0);
    println!("Prediction for x=6: {}", prediction);
    // Calculate confidence interval (95%)
    if let Some((lower, upper)) = model.confidence_interval(6.0, 0.95) {
        println!("95% confidence interval: ({}, {})", lower, upper);
    }
    // Multiple Linear Regression
    let x_multi = vec![
        vec![1.0, 2.0], // observation 1: x1=1.0, x2=2.0
        vec![2.0, 1.0], // observation 2: x1=2.0, x2=1.0
        vec![3.0, 3.0], // observation 3: x1=3.0, x2=3.0
        vec![4.0, 2.0], // observation 4: x1=4.0, x2=2.0
    ];
    let y_multi = vec![9.0, 8.0, 16.0, 15.0];
    let mut multi_model = MultipleLinearRegression::new();
    multi_model.fit(&x_multi, &y_multi).unwrap();
    println!("Coefficients: {:?}", multi_model.coefficients);
    println!("R-squared: {}", multi_model.r_squared);
    println!("Adjusted R-squared: {}", multi_model.adjusted_r_squared);
    // Predict with multiple variables
    let new_observation = vec![5.0, 4.0];
    let prediction = multi_model.predict(&new_observation);
    println!("Prediction for new observation: {}", prediction);
    // Save model to file
    multi_model.save("model.json").unwrap();
    // Load model from file
    let loaded_model = MultipleLinearRegression::load("model.json").unwrap();
}
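The slope, intercept, and R² reported for simple linear regression follow from the closed-form least-squares solution. The sketch below shows that calculation with the standard library only; `fit_line` is a hypothetical name, and the code is an illustration of ordinary least squares rather than the library's own implementation.

```rust
/// Ordinary least squares for y = intercept + slope * x.
/// Returns (slope, intercept, r_squared), or `None` if the inputs are
/// mismatched, too short, or x has no variation.
/// (Hypothetical helper; not part of rs-stats.)
fn fit_line(x: &[f64], y: &[f64]) -> Option<(f64, f64, f64)> {
    let n = x.len();
    if n < 2 || n != y.len() {
        return None;
    }
    let nf = n as f64;
    let mean_x = x.iter().sum::<f64>() / nf;
    let mean_y = y.iter().sum::<f64>() / nf;
    // Centered sums of squares and cross-products.
    let sxx: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let syy: f64 = y.iter().map(|yi| (yi - mean_y).powi(2)).sum();
    let sxy: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| (xi - mean_x) * (yi - mean_y))
        .sum();
    if sxx == 0.0 {
        return None;
    }
    let slope = sxy / sxx;
    let intercept = mean_y - slope * mean_x;
    // For simple regression, R^2 equals the squared correlation coefficient.
    // A constant y is fit perfectly by a horizontal line, so report 1.0.
    let r_squared = if syy == 0.0 { 1.0 } else { sxy * sxy / (sxx * syy) };
    Some((slope, intercept, r_squared))
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0, 5.0];
    let y = [2.0, 4.0, 6.0, 8.0, 10.0];
    if let Some((slope, intercept, r_squared)) = fit_line(&x, &y) {
        println!("slope = {}, intercept = {}, R² = {}", slope, intercept, r_squared);
        println!("prediction for x = 6: {}", intercept + slope * 6.0);
    }
}
```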
Decision Trees
use rs_stats::regression::decision_tree::{DecisionTree, TreeType, SplitCriterion};
fn main() {
    // Example 1: Regression Tree for Patient Recovery Time Prediction
    let mut recovery_time_tree = DecisionTree::<f64, f64>::new(
        TreeType::Regression,
        SplitCriterion::Mse,
        5, // max_depth
        2, // min_samples_split
        1  // min_samples_leaf
    );
    // Training data: [age, treatment_intensity, bmi, comorbidity_score, initial_severity]
    let patient_features = vec![
        vec![45.0, 3.0, 28.5, 2.0, 7.0], // Patient 1: 45 years, treatment intensity 3, BMI 28.5, etc.
        vec![62.0, 4.0, 31.2, 3.0, 8.0], // Patient 2
        vec![38.0, 2.0, 24.3, 1.0, 5.0], // Patient 3
        // ... more patients
    ];
    let recovery_days = vec![14.0, 28.0, 10.0]; // Recovery time in days
    // Train the model to predict recovery time
    recovery_time_tree.fit(&patient_features, &recovery_days);
    // Make predictions for a new patient
    let new_patient = vec![
        vec![55.0, 3.0, 27.0, 2.0, 6.0], // New patient characteristics
    ];
    let predicted_recovery_days = recovery_time_tree.predict(&new_patient);

    // Example 2: Classification Tree for Diabetes Risk Assessment
    let mut diabetes_risk_tree = DecisionTree::<u8, f64>::new(
        TreeType::Classification,
        SplitCriterion::Gini,
        4, // max_depth
        2, // min_samples_split
        1  // min_samples_leaf
    );
    // Training data: [glucose_level, bmi, blood_pressure, age, family_history]
    let medical_features = vec![
        vec![85.0, 22.0, 120.0, 35.0, 0.0],  // Patient 1: glucose 85 mg/dL, BMI 22, BP 120, etc.
        vec![140.0, 31.0, 145.0, 52.0, 1.0], // Patient 2
        vec![165.0, 34.0, 155.0, 48.0, 1.0], // Patient 3
        // ... more patients
    ];
    let diabetes_status = vec![0, 1, 1]; // 0: No diabetes, 1: Diabetes
    // Train the classifier
    diabetes_risk_tree.fit(&medical_features, &diabetes_status);
    // Print tree structure and summary
    println!("Tree Structure:\n{}", diabetes_risk_tree.tree_structure());
    println!("Tree Summary:\n{}", diabetes_risk_tree.summary());
    // Feature importance - which medical measurements are most predictive
    let importance = diabetes_risk_tree.feature_importances();
    println!("Feature Importance: {:?}", importance);
}
The Decision Tree implementation supports:
- Both regression and classification tasks
- Multiple split criteria (MSE, MAE for regression; Gini, Entropy for classification)
- Generic types with appropriate trait bounds
- Parallel processing for optimal performance
- Tree visualization and interpretation tools
- Feature importance calculation
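The split criteria named in the list above can be illustrated with a short std-only sketch that scores a single node's labels or targets. The function names are hypothetical, and how rs-stats evaluates and combines these scores across candidate splits is not described in the source, so treat this only as a definition-level illustration.

```rust
use std::collections::HashMap;

/// Class proportions for a set of labels. (Hypothetical helper; not part of rs-stats.)
fn class_proportions(labels: &[u8]) -> Vec<f64> {
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for &label in labels {
        *counts.entry(label).or_insert(0) += 1;
    }
    let n = labels.len() as f64;
    counts.values().map(|&c| c as f64 / n).collect()
}

/// Gini impurity: 1 - sum(p_k^2). Zero for a pure (or empty) node.
fn gini_impurity(labels: &[u8]) -> f64 {
    if labels.is_empty() {
        return 0.0;
    }
    1.0 - class_proportions(labels).iter().map(|p| p * p).sum::<f64>()
}

/// Shannon entropy in bits: -sum(p_k * log2(p_k)).
fn entropy(labels: &[u8]) -> f64 {
    class_proportions(labels)
        .iter()
        .map(|p| -p * p.log2())
        .sum()
}

/// Mean squared error of targets around their mean (regression criterion).
fn mse(targets: &[f64]) -> f64 {
    if targets.is_empty() {
        return 0.0;
    }
    let n = targets.len() as f64;
    let mean = targets.iter().sum::<f64>() / n;
    targets.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / n
}

fn main() {
    let diabetes_status = [0u8, 1, 1];
    let recovery_days = [14.0, 28.0, 10.0];
    println!("Gini impurity: {}", gini_impurity(&diabetes_status));
    println!("Entropy (bits): {}", entropy(&diabetes_status));
    println!("MSE: {}", mse(&recovery_days));
}
```

A tree builder would typically compare the size-weighted score of the two child nodes against the parent's score and keep the split with the largest reduction.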
Documentation
For detailed API documentation, run:
cargo doc --open
Testing
The library includes a comprehensive test suite. Run the tests with:
cargo test
Contributing
Contributions are welcome! Here's how you can contribute:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin feature/my-new-feature
- Submit a pull request
Before submitting your PR, please make sure:
- All tests pass
- Code follows the project's style and conventions
- New features include appropriate documentation and tests
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- The Rust community for their excellent documentation and support
- Contributors to the project
- Various statistical references and research papers that informed the implementations