single-algebra 🧮
A linear algebra and machine learning utility library for Rust, providing efficient matrix operations, dimensionality reduction, and statistical analysis tools.
Features 🚀
- Efficient Matrix Operations: Support for both dense and sparse matrices (CSR/CSC formats)
- Dimensionality Reduction: PCA implementations for both dense and sparse matrices
- SVD Implementations: Multiple SVD backends including LAPACK and Faer
- Statistical Analysis: Comprehensive statistical operations with batch processing support
- Similarity Measures: Collection of distance/similarity metrics for high-dimensional data
- Masking Support: Selective data processing with boolean masks
- Parallel Processing: Efficient multi-threaded implementations using Rayon
- Feature-Rich: Configurable through feature flags for specific needs
Matrix Operations 📊
- SVD Decomposition: Choose between parallel, LAPACK, or Faer implementations
- Sparse Matrix Support: Comprehensive operations for CSR and CSC sparse matrix formats
- Masked Operations: Selective data processing with boolean masks
- Batch Processing: Statistical operations grouped by batch identifiers
- Normalization: Row and column normalization with customizable targets
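The sparse support above operates on the standard nalgebra_sparse types. As a minimal sketch (plain nalgebra_sparse construction, not single-algebra's own API), a matrix is usually assembled in COO form and then converted to CSR or CSC:
use nalgebra_sparse::{CooMatrix, CscMatrix, CsrMatrix};
// Assemble a small 3x4 matrix in triplet (COO) form, then convert it to
// the compressed formats that the crate's sparse traits work with.
let mut coo = CooMatrix::<f64>::new(3, 4);
coo.push(0, 1, 1.0);
coo.push(1, 3, 2.0);
coo.push(2, 0, 3.0);
// CSR favours row-wise traversal, CSC favours column-wise traversal.
let csr: CsrMatrix<f64> = (&coo).into();
let csc: CscMatrix<f64> = (&coo).into();
assert_eq!((csr.nnz(), csc.nnz()), (3, 3));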
Dimensionality Reduction ⬇️
- PCA Framework: Flexible implementation with customizable SVD backends
- Dense Matrix PCA: Optimized implementation for dense matrices
- Sparse Matrix PCA: Memory-efficient PCA for sparse matrices
- Masked Sparse PCA: Apply PCA on selected features only
- Incremental Processing: Support for large datasets that don't fit in memory
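As a rough illustration of what the centering option in the PCA builders (see Usage Examples below) does, here is the centering step written out with plain ndarray; this is a conceptual sketch, not the crate's internal code:
use ndarray::{array, Axis};
// Subtract each column's mean so the data has zero column means before
// the SVD is taken.
let data = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
let means = data.mean_axis(Axis(0)).unwrap();
let centered = &data - &means;
assert!(centered
    .mean_axis(Axis(0))
    .unwrap()
    .iter()
    .all(|m| m.abs() < 1e-12));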
Similarity Measures 📏
- Cosine Similarity: Measure similarity based on the cosine of the angle between vectors
- Euclidean Similarity: Similarity based on Euclidean distance
- Pearson Similarity: Measure linear correlation between vectors
- Manhattan Similarity: Similarity based on Manhattan distance
- Jaccard Similarity: Measure similarity as intersection over union
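For example, the Jaccard measure listed above reduces to intersection over union. A minimal sketch with standard-library sets, illustrating the definition only (not the crate's API):
use std::collections::HashSet;
// Jaccard similarity of two index sets: |A ∩ B| / |A ∪ B|.
let a: HashSet<usize> = [1, 2, 3, 4].into_iter().collect();
let b: HashSet<usize> = [3, 4, 5].into_iter().collect();
let intersection = a.intersection(&b).count() as f64;
let union = a.union(&b).count() as f64;
let jaccard = intersection / union; // 2 / 5 = 0.4
assert!((jaccard - 0.4).abs() < 1e-12);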
Statistical Analysis 📈
- Basic Statistics: Mean, variance, sum, min/max operations
- Batch Statistics: Compute statistics grouped by batch identifiers
- Matrix Variance: Efficient variance calculations for matrices
- Nonzero Counting: Count non-zero elements in sparse matrices
- Masked Statistics: Compute statistics on selected rows/columns only
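As a reference for what these operations compute, the dense, unmasked versions can be written directly with ndarray; the crate's traits extend the same statistics to sparse, batched, and masked inputs (sketch only, making no assumptions about the crate's API):
use ndarray::{array, Axis};
// Column-wise statistics on a small dense matrix.
let data = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
let col_means = data.mean_axis(Axis(0)).unwrap(); // [3.0, 4.0]
let col_vars = data.var_axis(Axis(0), 0.0);       // population variance per column
let col_sums = data.sum_axis(Axis(0));            // [9.0, 12.0]
assert_eq!(col_sums, array![9.0, 12.0]);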
Installation
Add this to your Cargo.toml:
[dependencies]
single-algebra = "0.5.0"
Feature Flags
Enable optional features based on your needs:
[dependencies]
single-algebra = { version = "0.5.0", features = ["lapack", "faer"] }
Available features:
- smartcore: Enable integration with the SmartCore machine learning library
- lapack: Use the LAPACK backend for linear algebra operations
- faer: Use the Faer backend for linear algebra operations
- simba: Enable SIMD optimizations via simba
Usage Examples
Basic PCA with LAPACK Backend
use ndarray::array;
use single_algebra::dimred::pca::dense::{PCABuilder, LapackSVD};
// Create a sample matrix
let data = array![[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]];
// Build PCA with LAPACK backend
let mut pca = PCABuilder::new(LapackSVD)
.n_components(2)
.center(true)
.scale(false)
.build();
// Fit and transform data
pca.fit(data.view()).unwrap();
let transformed = pca.transform(data.view()).unwrap();
// Access results
let components = pca.components().unwrap();
let explained_variance = pca.explained_variance_ratio().unwrap();
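The fit/transform split follows the scikit-learn estimator pattern acknowledged below: fit learns the principal components from the data, transform projects data onto them, and components() and explained_variance_ratio() expose the fitted results afterwards.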
Sparse Matrix Operations
use nalgebra_sparse::{CooMatrix, CsrMatrix};
use single_algebra::sparse::MatrixSum;
// Create a sparse matrix
let mut coo = CooMatrix::new(3, 3);
coo.push(0, 0, 1.0);
coo.push(1, 1, 2.0);
coo.push(2, 2, 3.0);
let csr: CsrMatrix<f64> = (&coo).into();
// Calculate column sums
let col_sums: Vec<f64> = csr.sum_col().unwrap();
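Building on the csr and col_sums values above, the result can be cross-checked with a plain triplet scan (assuming sum_col returns one sum per column, as the snippet suggests):
// Accumulate column sums directly from the stored triplets.
let mut expected = vec![0.0_f64; csr.ncols()];
for (_row, col, value) in csr.triplet_iter() {
    expected[col] += *value;
}
assert_eq!(col_sums, expected);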
Batch Processing
use nalgebra_sparse::CsrMatrix;
use single_algebra::sparse::BatchMatrixMean;
// Sample data with batch identifiers
let matrix = create_sparse_matrix();
let batches = vec!["batch1", "batch1", "batch2", "batch2", "batch3"];
// Calculate mean per batch
let batch_means = matrix.mean_batch_col(&batches).unwrap();
// Access results for a specific batch
let batch1_means = batch_means.get("batch1").unwrap();
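For clarity, per-batch column means are conventionally the column means restricted to the rows of each batch. A plain-Rust sketch of that definition (an illustration of the semantics only; it makes no claims about the crate's return types):
use std::collections::HashMap;
// Four rows with two columns, tagged with batch identifiers.
let rows = vec![
    ("batch1", vec![1.0, 2.0]),
    ("batch1", vec![3.0, 4.0]),
    ("batch2", vec![5.0, 6.0]),
    ("batch2", vec![7.0, 8.0]),
];
// Accumulate per-batch column sums and row counts.
let mut sums: HashMap<&str, (Vec<f64>, usize)> = HashMap::new();
for (batch, values) in &rows {
    let entry = sums.entry(*batch).or_insert((vec![0.0; values.len()], 0));
    for (s, v) in entry.0.iter_mut().zip(values) {
        *s += *v;
    }
    entry.1 += 1;
}
// Divide by the per-batch row counts to get per-batch column means.
let means: HashMap<&str, Vec<f64>> = sums
    .into_iter()
    .map(|(batch, (sum, n))| (batch, sum.iter().map(|s| s / n as f64).collect::<Vec<f64>>()))
    .collect();
assert_eq!(means["batch1"], vec![2.0, 3.0]);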
Similarity Measures
use ndarray::Array1;
use single_algebra::similarity::{SimilarityMeasure, CosineSimilarity};
let a = Array1::from_vec(vec![1.0, 2.0, 3.0]);
let b = Array1::from_vec(vec![4.0, 5.0, 6.0]);
let cosine = CosineSimilarity;
let similarity = cosine.calculate(a.view(), b.view());
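For reference, the same quantity can be computed directly from its definition with ndarray (assuming calculate returns the standard cosine value; treat this as a sanity check, not a specification of the API):
// cos(a, b) = dot(a, b) / (|a| * |b|).
let manual = a.dot(&b) / (a.dot(&a).sqrt() * b.dot(&b).sqrt());
assert!((manual - 0.9746).abs() < 1e-3); // ≈ 0.9746 for the vectors above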
Performance Considerations
- For large matrices, consider using sparse representations (CSR/CSC)
- Enable the appropriate backend (lapack or faer) based on your needs
- Use masked operations when working with subsets of data
- Batch processing can significantly improve performance for grouped operations
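To make the first point concrete, here is a back-of-envelope estimate of dense versus CSR storage (assuming 8-byte values and 8-byte indices; actual overheads vary by crate and platform):
// Dense f64 storage for an n x m matrix.
fn dense_bytes(n: usize, m: usize) -> usize {
    n * m * 8
}
// CSR storage: values + column indices + row offsets.
fn csr_bytes(n: usize, nnz: usize) -> usize {
    nnz * 8 + nnz * 8 + (n + 1) * 8
}
// A 20_000 x 2_000 matrix at 1% density is well over an order of
// magnitude smaller in CSR form.
let (n, m) = (20_000, 2_000);
let nnz = n * m / 100;
assert!(csr_bytes(n, nnz) * 10 < dense_bytes(n, m));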
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the BSD 3-Clause License - see the LICENSE.md file for details.
Acknowledgments
- The LAPACK integration is built upon the nalgebra-lapack crate
- Some components are inspired by scikit-learn's implementations
- The Faer backend leverages the high-performance faer crate