#statistics #stream

watermill

Blazingly fast, generic, and serializable online statistics

2 releases

0.1.1 Feb 6, 2023
0.1.0 Sep 12, 2022

#1069 in Algorithms


2,341 downloads per month
Used in simple_accumulator

MIT license

74KB
1.5K SLoC

Online statistics in Rust 🦀

watermill is a crate 🦀 for blazingly fast, generic, and serializable online statistics.

Quickstart


Let's compute the online median and then serialize it:

use watermill::quantile::Quantile;
use watermill::stats::Univariate;
let data: Vec<f64> = vec![9., 7., 3., 2., 6., 1., 8., 5., 4.];
let mut running_median: Quantile<f64> = Quantile::new(0.5_f64).unwrap();
for x in data.into_iter() {
    running_median.update(x); // update the current statistics
    println!("The current median value is: {}", running_median.get());
}
assert_eq!(running_median.get(), 5.0);

// Convert the statistic to a JSON string.
let serialized = serde_json::to_string(&running_median).unwrap();

// Convert the JSON string back to a statistic.
let deserialized: Quantile<f64> = serde_json::from_str(&serialized).unwrap();
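
For readers who want to see what a running median computes, here is a minimal std-only sketch. It is not watermill's implementation: `ExactMedian` is a hypothetical helper that keeps every value in a sorted buffer, so it is exact but uses memory proportional to the stream length. It only mirrors the `update`/`get` shape used in the quickstart.

```rust
/// Exact running median backed by a sorted buffer.
/// Hypothetical helper for illustration, not part of watermill.
#[derive(Debug, Default)]
pub struct ExactMedian {
    sorted: Vec<f64>,
}

impl ExactMedian {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert `x` at its sorted position. NaN values are ignored.
    pub fn update(&mut self, x: f64) {
        if x.is_nan() {
            return;
        }
        // Index of the first element that is not smaller than `x`.
        let idx = self.sorted.partition_point(|&v| v < x);
        self.sorted.insert(idx, x);
    }

    /// Median of everything seen so far, or `None` before the first update.
    pub fn get(&self) -> Option<f64> {
        let n = self.sorted.len();
        if n == 0 {
            return None;
        }
        let mid = n / 2;
        if n % 2 == 1 {
            Some(self.sorted[mid])
        } else {
            // Even count: average the two middle values.
            Some((self.sorted[mid - 1] + self.sorted[mid]) / 2.0)
        }
    }
}
```

Fed the nine values from the quickstart, `get()` returns `Some(5.0)`, matching the assertion above.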

Now let's compute the online sum using iterators:

use watermill::iter::IterStatisticsExtend;
let data: Vec<f64> = vec![1., 2., 3.];
let vec_true: Vec<f64> = vec![1., 3., 6.];
// `online_sum()` turns the iterator into one that yields the running sum.
for (d, t) in data.into_iter().online_sum().zip(vec_true.into_iter()) {
    assert_eq!(d, t);
}
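
The same running-sum behaviour can be sketched with the standard library alone. The `running_sum` function below is a hypothetical helper built on `Iterator::scan`; it illustrates the idea of an iterator adaptor that yields cumulative sums and makes no claim about how watermill implements `online_sum()`.

```rust
/// Lazily yields the cumulative sums of `values`.
/// Hypothetical helper for illustration, not part of watermill.
pub fn running_sum<I>(values: I) -> impl Iterator<Item = f64>
where
    I: IntoIterator<Item = f64>,
{
    // `scan` threads a mutable accumulator through the iteration.
    values.into_iter().scan(0.0_f64, |acc, x| {
        *acc += x;
        Some(*acc)
    })
}
```

Collecting `running_sum(vec![1., 2., 3.])` gives `[1.0, 3.0, 6.0]`, the same values as `vec_true` in the example above.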

You can also compute rolling statistics. The following example computes the rolling variance over the 2 most recent values:


use watermill::rolling::Rolling;
use watermill::stats::Univariate;
use watermill::variance::Variance;
let data: Vec<f64> = vec![9., 7., 3., 2., 6., 1., 8., 5., 4.];
let mut running_var: Variance<f64> = Variance::default();
// We wrap `running_var` inside the `Rolling` struct.
let mut rolling_var: Rolling<f64> = Rolling::new(&mut running_var, 2).unwrap();
for x in data.into_iter() {
    rolling_var.update(x);
}
assert_eq!(rolling_var.get(), 0.5);
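
To make the expected value of 0.5 concrete, here is a std-only sketch of a rolling sample variance. `RollingVariance` is a hypothetical type, not watermill's `Rolling`; it assumes the sample (n - 1) definition of variance, which is the one consistent with the assertion above, and it recomputes the result from the stored window on every call rather than updating incrementally.

```rust
use std::collections::VecDeque;

/// Sample variance over the last `size` values.
/// Hypothetical helper for illustration, not part of watermill.
pub struct RollingVariance {
    window: VecDeque<f64>,
    size: usize,
}

impl RollingVariance {
    /// Returns `None` for a window size of zero.
    pub fn new(size: usize) -> Option<Self> {
        if size == 0 {
            return None;
        }
        Some(Self {
            window: VecDeque::with_capacity(size),
            size,
        })
    }

    /// Push `x`, evicting the oldest value once the window is full.
    pub fn update(&mut self, x: f64) {
        if self.window.len() == self.size {
            self.window.pop_front();
        }
        self.window.push_back(x);
    }

    /// Sample variance of the window, or 0.0 with fewer than two values.
    pub fn get(&self) -> f64 {
        let n = self.window.len();
        if n < 2 {
            return 0.0;
        }
        let mean = self.window.iter().sum::<f64>() / n as f64;
        let squared_dev: f64 = self.window.iter().map(|v| (v - mean).powi(2)).sum();
        squared_dev / (n as f64 - 1.0)
    }
}
```

With a window of 2, the last two values of the example data are 5 and 4. Their mean is 4.5, the squared deviations sum to 0.5, and dividing by n - 1 = 1 gives 0.5.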

Installation


Add the following lines to your Cargo.toml:

[dependencies]
watermill = "0.1.0"
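
The quickstart also calls `serde_json::to_string` and `serde_json::from_str`, so a project that runs it as written needs that crate as well. A possible manifest, assuming any 1.x release of serde_json is acceptable for your project:

```toml
[dependencies]
watermill = "0.1.0"
# Only needed for the JSON round trip in the quickstart; the version is an assumption.
serde_json = "1"
```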

Statistics available

Statistics Rollable ?
Mean
Variance
Sum
Min
Max
Count
Quantile
Peak to peak
Exponentially weighted mean
Exponentially weighted variance
Interquartile range
Kurtosis
Skewness
Covariance

Inspiration


The stats module of the river library in Python greatly inspired this crate.

Dependencies

~1.4–2.3MB
~50K SLoC