9 releases
Uses Rust 2024 edition

| Version | Released |
|---|---|
| 0.2.4 | Feb 27, 2026 |
| 0.2.3 | Feb 27, 2026 |
| 0.1.3 | Jan 18, 2026 |

#349 in Biology
Used in 2 crates
550KB
9K SLoC
PeacoQC-RS
PeacoQC-RS is a Rust implementation of PeacoQC (Peak-based Quality Control) algorithms for flow cytometry data. This library provides efficient, trait-based quality control methods that work with any FCS data structure through a simple trait interface.
Core Features
- Peak Detection: Automatic peak detection using kernel density estimation
- Isolation Forest: Outlier detection using isolation tree method
- MAD Outlier Detection: Median Absolute Deviation-based outlier identification
- Margin Event Removal: Detection and removal of margin events
- Doublet Detection: Identification of doublet/multiplet events
- Monotonic Channel Detection: Detection of channels with monotonic trends (indicating technical issues)
- Consecutive Bins Filtering: Removal of short consecutive regions
- Trait-Based Design: Works with any data structure via the `PeacoQCData` trait
Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
peacoqc-rs = { version = "0.2.4", features = ["flow-fcs"] }
```
To build against a local checkout instead (for example, while developing the crate), use a path dependency:
```toml
[dependencies]
peacoqc-rs = { path = "../peacoqc-rs", version = "0.2.4", features = ["flow-fcs"] }
```
Feature Flags
- `flow-fcs` (default): Enable integration with the `flow-fcs` crate for FCS file support
- `gpu`: Enable GPU acceleration for multi-channel datasets (20-32x speedup for batched operations)
- `cubecl`: Enable CubeCL custom GPU kernels (optional, requires the `gpu` feature)
Quick Start
Basic Usage
```rust
use peacoqc_rs::{FcsFilter, PeacoQCConfig, PeacoQCData, QCMode, peacoqc};

// `T` is any FCS type that implements `PeacoQCData` (for QC) and `FcsFilter`
// (for applying the mask). Assumes the crate's error types implement
// `std::error::Error`, so `?` can convert them into a boxed error.
fn run_qc<T: PeacoQCData + FcsFilter>(fcs: &T) -> Result<T, Box<dyn std::error::Error>> {
    let config = PeacoQCConfig {
        channels: vec!["FL1-A".to_string(), "FL2-A".to_string()],
        determine_good_cells: QCMode::All,
        ..Default::default()
    };
    let result = peacoqc(fcs, &config)?;

    // Apply the `good_cells` boolean mask from the PeacoQCResult struct
    let clean_fcs = fcs.filter(&result.good_cells)?;
    println!("Removed {:.2}% of events", result.percentage_removed);

    // Export QC results for downstream analysis
    result.export_csv_boolean("qc_results.csv")?;
    result.export_json_metadata(&config, "qc_metadata.json")?;
    Ok(clean_fcs)
}
```
See examples/basic_usage.rs for a complete working example.
Interoperability via Traits
PeacoQC-RS uses a trait-based design for maximum interoperability. To use PeacoQC with your own FCS data structure, implement the `PeacoQCData` trait:
```rust
use peacoqc_rs::{PeacoQCData, Result};

struct MyFcs {
    // your data fields
}

impl PeacoQCData for MyFcs {
    fn n_events(&self) -> usize {
        todo!("return the number of events")
    }

    fn channel_names(&self) -> Vec<String> {
        todo!("return the channel names")
    }

    fn get_channel_range(&self, channel: &str) -> Option<(f64, f64)> {
        todo!("return the channel range, if available")
    }

    fn get_channel_f64(&self, channel: &str) -> Result<Vec<f64>> {
        todo!("return the channel data as Vec<f64>")
    }
}
```
Additionally, implement FcsFilter to enable filtering:
```rust
use peacoqc_rs::{FcsFilter, Result};

impl FcsFilter for MyFcs {
    // `mask[i] == true` keeps event `i`
    fn filter(&self, mask: &[bool]) -> Result<Self> {
        todo!("return a new instance with filtered data")
    }
}
```
Integration with flow-fcs
With the `flow-fcs` feature enabled (it is on by default), PeacoQC-RS provides trait implementations for the `Fcs` struct from the `flow-fcs` crate:
```rust
use flow_fcs::Fcs;
use peacoqc_rs::{FcsFilter, PeacoQCConfig, QCMode, peacoqc};

// Assumes the error types involved implement `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fcs = Fcs::open("data.fcs")?;
    let config = PeacoQCConfig {
        channels: fcs.get_fluorescence_channels(), // Auto-detect channels
        determine_good_cells: QCMode::All,
        ..Default::default()
    };
    let result = peacoqc(&fcs, &config)?;
    // Apply the `good_cells` boolean mask from the PeacoQCResult struct
    let clean_fcs = fcs.filter(&result.good_cells)?;
    // ... continue working with `clean_fcs`
    Ok(())
}
```
API Overview
Main Functions
`fn peacoqc<T: PeacoQCData>(fcs: &T, config: &PeacoQCConfig) -> Result<PeacoQCResult>`
- Main quality control function that runs the complete PeacoQC pipeline
- Processes channels and bins in parallel for optimal performance

`fn remove_margins<T: PeacoQCData>(fcs: &T, config: &MarginConfig) -> Result<MarginResult>`
- Remove margin events from FCS data

`fn remove_doublets<T: PeacoQCData>(fcs: &T, config: &DoubletConfig) -> Result<DoubletResult>`
- Detect and remove doublet/multiplet events
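The doublet rule itself is not spelled out here. For orientation, the R PeacoQC package's `RemoveDoublets` keeps events whose area-to-height ratio (by default `FSC-A` / `FSC-H`) stays below the median ratio plus `nmad` (default 4) median absolute deviations. The sketch below implements that rule under the assumption that `remove_doublets` works similarly; `doublet_keep_mask` is a hypothetical helper, not part of the crate's API, and details such as how zero heights are handled may differ.

```rust
/// Median of a non-empty slice (mean of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 0 {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    } else {
        sorted[n / 2]
    }
}

/// Median absolute deviation, scaled by 1.4826 like R's default `stats::mad`.
fn scaled_mad(values: &[f64]) -> f64 {
    let m = median(values);
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    1.4826 * median(&deviations)
}

/// Hypothetical doublet filter: keep events whose area/height ratio is below
/// median(ratio) + nmad * MAD(ratio). Assumes every height is positive.
fn doublet_keep_mask(area: &[f64], height: &[f64], nmad: f64) -> Vec<bool> {
    let ratio: Vec<f64> = area.iter().zip(height).map(|(a, h)| a / h).collect();
    let cutoff = median(&ratio) + nmad * scaled_mad(&ratio);
    ratio.iter().map(|&r| r < cutoff).collect()
}
```

For example, with all heights equal to 1.0 and areas `[1.0, 1.0, 1.0, 1.01, 0.99, 3.0]`, the cutoff is about 1.03, so only the last event (ratio 3.0) is dropped.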
Configuration
- `PeacoQCConfig`: Main configuration for quality control
  - `channels`: Channels to analyze
  - `determine_good_cells`: QC mode (`All`, `IsolationTree`, `MAD`, `None`)
  - `mad`: MAD threshold (default: 6.0)
  - `it_limit`: Isolation Tree limit (default: 0.6)
  - `consecutive_bins`: Consecutive bins threshold (default: 5)
- `MarginConfig`: Configuration for margin event removal
- `DoubletConfig`: Configuration for doublet detection
Results
`PeacoQCResult`: Complete QC results
- `good_cells`: Boolean mask (`true` = keep, `false` = remove)
- `percentage_removed`: Percentage of events removed
- `peaks`: Peak detection results per channel
- `n_bins`: Number of bins used
- `events_per_bin`: Events per bin
- `export_csv_boolean()`: Export as boolean CSV (0/1 values)
- `export_csv_numeric()`: Export as numeric CSV (2000/6000 values, R-compatible)
- `export_json_metadata()`: Export comprehensive QC metrics as JSON
Export Formats
PeacoQC-RS supports multiple export formats for QC results, enabling integration with various downstream analysis tools.
Boolean CSV (Recommended)
Export QC results as a CSV file with 0/1 values:
```rust
result.export_csv_boolean("qc_results.csv")?;
```
Format:
```text
PeacoQC
1
1
0
1
```
- `1` = good event (keep)
- `0` = bad event (remove)
Use cases:
- pandas: `df[df['PeacoQC'] == 1]`
- R: `df[df$PeacoQC == 1, ]`
- SQL: `WHERE PeacoQC = 1`
- General data analysis workflows
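For Rust consumers, a minimal sketch of reading the boolean CSV back into a keep mask and applying it to per-event values. `parse_boolean_csv` and `apply_mask` are hypothetical helpers; they assume the single-column layout described above (one header line, then one `0` or `1` per event, in the same order as the events).

```rust
/// Parses a single-column 0/1 QC export into a keep mask (true = keep).
/// Assumes one header line, then one value per event; blank lines are skipped.
fn parse_boolean_csv(text: &str) -> Result<Vec<bool>, String> {
    text.lines()
        .skip(1) // header, e.g. "PeacoQC"
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| match line {
            "1" => Ok(true),
            "0" => Ok(false),
            other => Err(format!("unexpected QC value: {other:?}")),
        })
        .collect()
}

/// Keeps only the values whose mask entry is true (mask and values share event order).
fn apply_mask<T: Clone>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter_map(|(value, &keep)| keep.then(|| value.clone()))
        .collect()
}
```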
Numeric CSV (R-Compatible)
Export QC results as a CSV file with numeric codes matching the R PeacoQC package:
```rust
result.export_csv_numeric("qc_results_r.csv", 2000, 6000)?;
```
Format:
```text
PeacoQC
2000
2000
6000
2000
```
- `2000` (or custom `good_value`) = good event (keep)
- `6000` (or custom `bad_value`) = bad event (remove)
Use cases:
- Compatibility with existing R PeacoQC workflows
- FlowJo CSV import
- Legacy analysis pipelines
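Turning an R-style export back into a mask is a direct lookup against the two codes. A minimal sketch, assuming the codes have already been read as integers; `numeric_codes_to_mask` is a hypothetical helper that reports the first unexpected code as an error.

```rust
/// Converts R-style numeric QC codes into a keep mask (true = keep).
/// Returns the first code that matches neither `good` nor `bad` as an error.
fn numeric_codes_to_mask(codes: &[u32], good: u32, bad: u32) -> Result<Vec<bool>, u32> {
    codes
        .iter()
        .map(|&code| {
            if code == good {
                Ok(true)
            } else if code == bad {
                Ok(false)
            } else {
                Err(code)
            }
        })
        .collect()
}
```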
JSON Metadata
Export comprehensive QC metrics and configuration as JSON:
```rust
result.export_json_metadata(&config, "qc_metadata.json")?;
```
Format:
```json
{
  "n_events_before": 713904,
  "n_events_after": 631400,
  "n_events_removed": 82504,
  "percentage_removed": 11.56,
  "it_percentage": 0.0,
  "mad_percentage": 11.56,
  "consecutive_percentage": 0.0,
  "n_bins": 1427,
  "events_per_bin": 500,
  "channels_analyzed": ["FL1-A", "FL2-A"],
  "config": {
    "qc_mode": "All",
    "mad": 6.0,
    "it_limit": 0.6,
    "consecutive_bins": 5,
    "remove_zeros": false
  }
}
```
Use cases:
- Programmatic access to QC metrics
- Reporting and documentation
- Provenance tracking
- Quality control dashboards
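The headline counts in the JSON relate to the `good_cells` mask by simple arithmetic: `n_events_removed = n_events_before - n_events_after` and `percentage_removed = 100 * n_events_removed / n_events_before`. For the example above, 100 × 82,504 / 713,904 ≈ 11.5567, reported as 11.56. The sketch below derives these fields from a mask; `QcSummary` and `summarize` are illustrative names, not the crate's types, and rounding to two decimals is an assumption based on the example values.

```rust
/// Illustrative summary mirroring the JSON field names (not the crate's type).
#[derive(Debug, PartialEq)]
struct QcSummary {
    n_events_before: usize,
    n_events_after: usize,
    n_events_removed: usize,
    percentage_removed: f64,
}

/// Derives the headline counts from a keep mask (true = keep).
/// Rounding to two decimals is an assumption based on the example output.
fn summarize(good_cells: &[bool]) -> QcSummary {
    let before = good_cells.len();
    let after = good_cells.iter().filter(|keep| **keep).count();
    let removed = before - after;
    let pct = if before == 0 {
        0.0
    } else {
        100.0 * removed as f64 / before as f64
    };
    QcSummary {
        n_events_before: before,
        n_events_after: after,
        n_events_removed: removed,
        percentage_removed: (pct * 100.0).round() / 100.0,
    }
}
```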
Custom Column Names
You can specify custom column names for CSV exports:
```rust
result.export_csv_boolean_with_name("qc_results.csv", "QC_Status")?;
result.export_csv_numeric_with_name("qc_results_r.csv", 2000, 6000, "PeacoQC_Status")?;
```
Quality Control Methods
1. Peak Detection
Uses kernel density estimation (KDE) with a Gaussian kernel to detect peaks in the binned data, with the kernel bandwidth chosen by Silverman's rule of thumb.
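A minimal sketch of these steps: a Gaussian KDE evaluated on a grid, with the bandwidth from Silverman's rule of thumb, followed by a local-maximum search. It uses the `0.9 * min(sd, IQR / 1.34) * n^(-1/5)` variant found in R's `bw.nrd0`, which is an assumption about the variant the crate uses. The function names are hypothetical, and the sketch omits any filtering of small peaks that a production implementation may apply.

```rust
use std::f64::consts::PI;

/// Type-7 (linear interpolation) quantile of an already sorted, non-empty slice.
fn quantile(sorted: &[f64], p: f64) -> f64 {
    let h = (sorted.len() - 1) as f64 * p;
    let (lo, hi) = (h.floor() as usize, h.ceil() as usize);
    sorted[lo] + (h - lo as f64) * (sorted[hi] - sorted[lo])
}

/// Silverman's rule of thumb (nrd0 variant): 0.9 * min(sd, IQR / 1.34) * n^(-1/5).
/// Assumes at least two distinct values, so the bandwidth is positive.
fn silverman_bandwidth(data: &[f64]) -> f64 {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let sd = (data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let iqr = quantile(&sorted, 0.75) - quantile(&sorted, 0.25);
    0.9 * sd.min(iqr / 1.34) * n.powf(-0.2)
}

/// Gaussian kernel density estimate evaluated at each grid point.
fn gaussian_kde(data: &[f64], grid: &[f64], bw: f64) -> Vec<f64> {
    let norm = 1.0 / (data.len() as f64 * bw * (2.0 * PI).sqrt());
    grid.iter()
        .map(|&g| {
            let sum: f64 = data.iter().map(|&x| (-0.5 * ((g - x) / bw).powi(2)).exp()).sum();
            norm * sum
        })
        .collect()
}

/// Grid positions that are local maxima of the density
/// (higher than the left neighbour, at least as high as the right one).
fn find_peaks(grid: &[f64], density: &[f64]) -> Vec<f64> {
    density
        .windows(3)
        .enumerate()
        .filter(|(_, w)| w[1] > w[0] && w[1] >= w[2])
        .map(|(i, _)| grid[i + 1])
        .collect()
}
```

With two tight clusters near 0 and 10, the rule yields a bandwidth of about 3, and the density shows exactly two peaks, one near each cluster.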
2. Isolation Tree
An isolation forest-based outlier detection method. Events in bins with low isolation scores are flagged as outliers.
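For reference, the standard isolation forest (Liu, Ting and Zhou, 2008) scores a point by how quickly random splits isolate it: the mean path length `E[h]` is normalised by `c(n)`, the average path length of an unsuccessful binary-search-tree lookup among `n` points, giving `s = 2^(-E[h] / c(n))`. Scores near 1 indicate likely outliers, and a point with an average path length scores 0.5. This sketch shows only that generic scoring step; it is not the crate's implementation, and how the crate applies `it_limit` is not covered. The function names are hypothetical.

```rust
/// c(n): average path length of an unsuccessful binary-search-tree lookup
/// among n points, used to normalise isolation-tree path lengths.
fn average_path_length(n: usize) -> f64 {
    match n {
        0 | 1 => 0.0,
        _ => {
            // H(n - 1) computed exactly as a harmonic sum
            let harmonic: f64 = (1..n).map(|i| 1.0 / i as f64).sum();
            2.0 * harmonic - 2.0 * (n - 1) as f64 / n as f64
        }
    }
}

/// Standard isolation-forest anomaly score s = 2^(-E[h] / c(n)):
/// close to 1 for points isolated after few splits (likely outliers),
/// 0.5 for a point with an average path length, lower for longer paths.
fn anomaly_score(mean_path_length: f64, n: usize) -> f64 {
    2f64.powf(-mean_path_length / average_path_length(n))
}
```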
3. MAD (Median Absolute Deviation)
Detects outliers using the median absolute deviation (MAD). Events whose deviation from the median exceeds the MAD threshold (`mad`, default: 6.0) are flagged.
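A minimal sketch of MAD-based flagging, assuming the usual convention of scaling the raw MAD by 1.4826 (as R's `stats::mad` does by default) and flagging values more than `k` scaled MADs from the median. With `k = 6.0` this mirrors the default `mad` threshold; which per-bin statistic the crate feeds into the rule is not shown here, and `mad_outliers` is a hypothetical name.

```rust
/// Median of a non-empty slice (mean of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 0 {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    } else {
        sorted[n / 2]
    }
}

/// Flags values more than `k` scaled MADs from the median (true = outlier).
/// The raw MAD is multiplied by 1.4826, the default constant in R's `stats::mad`.
fn mad_outliers(values: &[f64], k: f64) -> Vec<bool> {
    let m = median(values);
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    let threshold = k * 1.4826 * median(&deviations);
    deviations.iter().map(|&d| d > threshold).collect()
}
```

Here the median is 10.05 and the scaled MAD is about 0.30, so with `k = 6.0` only the value 30.0 in `[10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 9.9, 30.0]` is flagged.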
4. Consecutive Bins Filtering
Removes short runs of good bins (fewer than `consecutive_bins`, default: 5) that lie between removed bins, since such short stretches may represent artifacts rather than real signal.
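A minimal sketch of the run-length rule, operating on a per-bin keep mask: any run of good bins shorter than `min_run` (the role `consecutive_bins` plays) is marked bad. `remove_short_good_runs` is hypothetical, and it assumes runs at the start or end of the acquisition are treated the same way as runs between bad bins.

```rust
/// Marks every run of good bins (true) shorter than `min_run` as bad.
fn remove_short_good_runs(good: &[bool], min_run: usize) -> Vec<bool> {
    let mut out = good.to_vec();
    let mut i = 0;
    while i < good.len() {
        if !good[i] {
            i += 1;
            continue;
        }
        // Walk to the end of this run of good bins.
        let start = i;
        while i < good.len() && good[i] {
            i += 1;
        }
        if i - start < min_run {
            out[start..i].fill(false);
        }
    }
    out
}
```

With `min_run = 5`, the mask `[T, T, F, T, T, F, T, T, T, T, T, T]` keeps only the final run of six good bins.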
5. Monotonic Channel Detection
Detects channels with monotonic trends (increasing or decreasing) which may indicate technical problems:
- Increasing: Possible accumulation, clog developing
- Decreasing: Possible depletion, pressure loss
Uses kernel smoothing (matching R's `stats::ksmooth` with `bandwidth = 50`) to smooth the bin medians, then checks monotonicity with running maxima and minima (R's `cummax`/`cummin`). A channel is flagged as increasing if more than 75% of the smoothed values equal their running maximum (non-decreasing), or as decreasing if more than 75% equal their running minimum (non-increasing). This follows the algorithm of the original R implementation.
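A minimal sketch of this check. It assumes R's `ksmooth` box-kernel convention, where `bandwidth = 50` means averaging every bin median within ±25 bins (inclusive) of the target, with the smoother evaluated at each bin index; `Trend`, `box_smooth`, and `monotonic_trend` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Trend {
    Increasing,
    Decreasing,
}

/// Box-kernel smoother evaluated at every index: the mean of all values
/// within +/- bandwidth / 2 (inclusive), truncated at the series edges.
fn box_smooth(y: &[f64], bandwidth: f64) -> Vec<f64> {
    let half = bandwidth / 2.0;
    (0..y.len())
        .map(|i| {
            let window: Vec<f64> = y
                .iter()
                .enumerate()
                .filter(|(j, _)| (*j as f64 - i as f64).abs() <= half)
                .map(|(_, &v)| v)
                .collect();
            window.iter().sum::<f64>() / window.len() as f64
        })
        .collect()
}

/// Flags a series of bin medians as increasing or decreasing when more than
/// 75% of the smoothed values equal their running max (or running min).
fn monotonic_trend(bin_medians: &[f64]) -> Option<Trend> {
    let smoothed = box_smooth(bin_medians, 50.0);
    let n = smoothed.len() as f64;
    let (mut run_max, mut run_min) = (f64::NEG_INFINITY, f64::INFINITY);
    let (mut at_max, mut at_min) = (0usize, 0usize);
    for &v in &smoothed {
        run_max = run_max.max(v); // cummax
        run_min = run_min.min(v); // cummin
        if v == run_max {
            at_max += 1;
        }
        if v == run_min {
            at_min += 1;
        }
    }
    if at_max as f64 / n > 0.75 {
        Some(Trend::Increasing)
    } else if at_min as f64 / n > 0.75 {
        Some(Trend::Decreasing)
    } else {
        None
    }
}
```

Because a perfectly flat series equals both its running maximum and running minimum, this sketch reports it as increasing; the crate may handle that edge case differently.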
Performance
PeacoQC-RS is optimized for performance:
- Parallel Processing: Uses `rayon` for parallel computation:
  - Multiple channels processed in parallel (all channels simultaneously)
  - Multiple bins within each channel processed in parallel
  - Provides significant speedup on multi-core systems (typically 2-8x depending on core count)
- GPU Acceleration (optional, `--features gpu`): Provides 20-32x speedup for batched multi-channel operations
  - Automatically used when a GPU is available
  - Batched operations amortize GPU overhead across multiple channels
  - See `DEV_NOTES.md` for detailed performance results
- Efficient Data Structures: Uses Polars DataFrames (via the `flow-fcs` feature flag) for columnar storage
- Minimal Allocations: Optimized to reduce memory allocations
- SIMD Support: Leverages Polars' SIMD operations for fast numeric computations
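Channel-level parallelism can be illustrated with only the standard library. The sketch below swaps in `std::thread::scope` for `rayon` (which the crate uses) and computes one median per channel on its own thread; `per_channel_medians` is a hypothetical helper. With `rayon`, the same shape would be `channels.par_iter().map(|c| median(c)).collect()`, scheduled on a work-stealing thread pool instead of one thread per channel.

```rust
use std::thread;

/// Median of a non-empty slice (mean of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    if n % 2 == 0 {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    } else {
        sorted[n / 2]
    }
}

/// Computes one median per channel, each on its own scoped thread.
fn per_channel_medians(channels: &[Vec<f64>]) -> Vec<f64> {
    thread::scope(|s| {
        // Spawn every channel first so the work overlaps, then join in order.
        let handles: Vec<_> = channels
            .iter()
            .map(|values| s.spawn(move || median(values)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```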
Benchmarks
Run benchmarks with:
```sh
cargo bench --bench peacoqc_bench
```
Benchmarks are currently being developed and will provide performance metrics for various dataset sizes.
Test Coverage
The library includes comprehensive unit tests covering:
- Peak detection accuracy
- Isolation tree outlier detection
- MAD outlier identification
- Margin event removal
- Doublet detection
- Monotonic channel detection
- Statistical functions (median, MAD, density estimation)
Run tests with:
```sh
cargo test
```
Examples
Basic Usage Example
See examples/basic_usage.rs for a complete example demonstrating:
- Creating synthetic FCS data
- Removing margin events
- Removing doublets
- Running full PeacoQC analysis
- Applying the quality control filter
Run with:
```sh
cargo run --example basic_usage
```
Error Handling
All functions return `Result<T, PeacoQCError>`. The `PeacoQCError` enum covers:
- `InvalidChannel`: Invalid or non-numeric channel
- `ChannelNotFound`: Channel not found in data
- `InsufficientData`: Not enough events for analysis
- `StatsError`: Statistical computation failed
- `ConfigError`: Configuration error
- `NoPeaksDetected`: No peaks detected in data
- `PolarsError`: Polars DataFrame error (when using the `flow-fcs` feature)
License
MIT License - see LICENSE file for details
Attribution
This Rust implementation is based on the original PeacoQC algorithm and R package. We gratefully acknowledge the original authors:
Original Paper:
Original R Implementation:
- GitHub: https://github.com/saeyslab/PeacoQC
- Authors: Annelies Emmaneel, Katrien Quintelier, and the Saeys Lab
This Rust version provides:
- Improved performance through native compilation
- Better memory efficiency
- Type safety
- Trait-based extensibility
Contributing
Contributions are welcome! Please feel free to open issues or submit a pull request on GitHub.
Dependencies
~65–115MB
~2M SLoC