single-statistics

A specialized Rust library for statistical analysis of single-cell data, part of the single-rust ecosystem

1 unstable release

Uses new Rust 2024

new 0.2.0 Apr 29, 2025

#187 in Machine learning

Custom license

58KB
998 lines

Overview

single-statistics provides statistical methods for the biological analysis of single-cell data, focusing on differential expression analysis, marker gene identification, and related statistical tests. The crate builds on the foundations provided by single-algebra and implements biologically relevant statistical approaches optimized for sparse single-cell data.
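To illustrate why sparse storage matters for these statistics, here is a minimal sketch that computes a gene's mean and sample variance from its stored (nonzero) values alone. The function name `sparse_mean_var` and its signature are hypothetical and are not taken from this crate; the sketch only shows the general idea of accounting for implicit zeros without materializing them.

```rust
/// Mean and unbiased sample variance of one gene across `n_cells` cells,
/// given only its stored (nonzero) values.
/// Hypothetical helper for illustration; not part of the single-statistics API.
fn sparse_mean_var(nonzero: &[f64], n_cells: usize) -> Option<(f64, f64)> {
    // A sample variance needs at least two observations.
    if n_cells < 2 || nonzero.len() > n_cells {
        return None;
    }
    let n = n_cells as f64;
    let mean = nonzero.iter().sum::<f64>() / n;

    // Squared deviations of the stored entries.
    let ss_stored: f64 = nonzero.iter().map(|v| (v - mean).powi(2)).sum();

    // Every implicit zero deviates from the mean by exactly `mean`.
    let n_zero = (n_cells - nonzero.len()) as f64;
    let ss_total = ss_stored + n_zero * mean * mean;

    Some((mean, ss_total / (n - 1.0)))
}
```

For a gene with stored values 2.0 and 4.0 in a population of four cells, the two remaining cells count as zeros, so the mean is 1.5 rather than 3.0.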

Features

  • Differential Expression Analysis

    • Parametric tests (Student's t-test, Welch's t-test)
    • Non-parametric tests (Mann-Whitney U test)
    • Effect size calculations
    • Parallel implementation for performance
  • Multiple Testing Correction

    • Bonferroni correction
    • Benjamini-Hochberg (FDR)
    • Benjamini-Yekutieli
    • Holm-Bonferroni
    • Storey's q-value
  • Statistical Framework

    • Generic interfaces for statistical tests
    • Support for sparse matrix representations
    • Type-safe operations via traits
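As a rough illustration of what a multiple testing correction does, the following sketch implements the standard Benjamini-Hochberg step-up adjustment using only the standard library. The function `benjamini_hochberg` is a hypothetical stand-alone helper written for this example; the crate's own function names and signatures may differ.

```rust
/// Benjamini-Hochberg adjusted p-values, returned in the input order.
/// Hypothetical stand-alone sketch; not the single-statistics API.
fn benjamini_hochberg(p_values: &[f64]) -> Vec<f64> {
    let m = p_values.len();

    // Indices sorted by ascending p-value.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by(|&a, &b| p_values[a].total_cmp(&p_values[b]));

    let mut adjusted = vec![0.0; m];
    let mut running_min = 1.0_f64;

    // Walk from the largest p-value to the smallest so that adjusted
    // values never increase as the rank decreases.
    for (rank0, &idx) in order.iter().enumerate().rev() {
        let rank = (rank0 + 1) as f64;
        let candidate = p_values[idx] * m as f64 / rank;
        running_min = running_min.min(candidate);
        adjusted[idx] = running_min;
    }
    adjusted
}
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values monotone and capped at 1.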

Getting Started

Add the crate to your Cargo.toml:

[dependencies]
single-statistics = "0.2.0"

Example Usage

use nalgebra_sparse::CsrMatrix;
use single_statistics::testing::{Alternative, MatrixStatTests, TestMethod, TTestType};

fn main() -> anyhow::Result<()> {
    // Create or load your expression matrix (genes x cells)
    let expression_matrix: CsrMatrix<f64> = todo!("load or construct your matrix");

    // Define groups (e.g., cell types, conditions)
    let group_ids = vec![0, 0, 0, 1, 1, 1];

    // Run differential expression analysis
    let results = expression_matrix.differential_expression(
        &group_ids,
        TestMethod::TTest(TTestType::Welch)
    )?;

    // Get significantly differentially expressed genes
    let significant_genes = results.significant_indices(0.05);
    println!("Found {} significant genes", significant_genes.len());

    // Access statistics, p-values, and effect sizes
    if let Some(effect_sizes) = &results.effect_sizes {
        // Assumes effect sizes are stored per gene, like the statistics and p-values
        for &gene_idx in significant_genes.iter() {
            println!(
                "Gene {}: statistic = {}, p-value = {}, adjusted p-value = {}, effect size = {}",
                gene_idx,
                results.statistics[gene_idx],
                results.p_values[gene_idx],
                results.adjusted_p_values.as_ref().unwrap()[gene_idx],
                effect_sizes[gene_idx]
            );
        }
    }

    Ok(())
}

Integration with the single-rust Ecosystem

single-statistics is designed to work seamlessly with other components of the single-rust ecosystem:

  • single-algebra: Core algebraic operations for single-cell data
  • single-clustering: Algorithms for clustering cells
  • single-utilities: Common utilities for the ecosystem

Scope

This crate focuses specifically on statistics related to differential expression and marker gene identification. It implements robust, efficient algorithms optimized for sparse data, providing statistical foundations for higher-level analyses in the single-cell domain.

Features in scope:

  • Statistical tests relevant to single-cell RNA-seq analysis
  • Implementations of various hypothesis testing methods
  • Multiple testing correction
  • Effect size calculations
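To make the in-scope items concrete, here is a minimal dense sketch of two of them: Welch's t statistic with Welch-Satterthwaite degrees of freedom, and Cohen's d as an effect size. The helpers `mean_var`, `welch_t`, and `cohens_d` are hypothetical names chosen for this example and do not reflect the crate's actual API, which operates on sparse matrices.

```rust
/// Mean and unbiased sample variance of a slice with at least two values.
fn mean_var(x: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// Hypothetical sketch; not the single-statistics API.
fn welch_t(a: &[f64], b: &[f64]) -> Option<(f64, f64)> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    // Per-group squared standard errors; the groups are not pooled.
    let (sa, sb) = (va / na, vb / nb);
    let se2 = sa + sb;
    if se2 == 0.0 {
        return None; // both groups are constant, so t is undefined
    }
    let t = (ma - mb) / se2.sqrt();
    let df = se2 * se2 / (sa * sa / (na - 1.0) + sb * sb / (nb - 1.0));
    Some((t, df))
}

/// Cohen's d using the pooled standard deviation.
fn cohens_d(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let pooled = ((na - 1.0) * va + (nb - 1.0) * vb) / (na + nb - 2.0);
    if pooled == 0.0 {
        return None;
    }
    Some((ma - mb) / pooled.sqrt())
}
```

The test statistic says how surprising a difference is given the sample sizes, while the effect size says how large the difference is in units of standard deviation, which is why differential expression results usually report both.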

Features out of scope (available in other crates):

  • General matrix statistics (in single-algebra)
  • Basic QC metrics computation (in single-algebra)
  • Plotting/visualization
  • Clustering algorithms (in single-clustering)
  • Batch correction

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the BSD 3-Clause License - see the LICENSE.md file for details.
