1 unstable release

0.1.0 Jun 9, 2023

#670 in Science

MIT/Apache

23KB
308 lines

Hstats: Online Statistics and Histograms for Data Streams

GitHub Workflow Status GitHub tag Crates.io Docs.rs

hstats is a streamlined and high-performance library engineered for online statistical analysis and histogram generation from data streams. With a focus on multi-threaded environments, hstats facilitates parallel operations that can later be merged into a single Hstats instance.

During the histogram creation process, the number and width of bins are predetermined. The bin width is calculated using the formula (end - start)/nbins, based on the parameters provided by the user. Values that fall within the range of [start, end) are assigned to the appropriate bins, while values outside this range are counted in underflow and overflow bins, which allows for subsequent adjustments to the histogram's range.

hstats utilizes Welford's algorithm via the rolling-stats library to compute mean and standard deviation statistics. The hstats library is compatible with no_std environments that support alloc.

To simplify the output of statistics and histograms, hstats implements the Display trait for Hstats. This allows users to define the floating-point precision (default is 2) for the printed statistics and choose the character used for the histogram bars (default is ).

Getting Started

Add the following to your Cargo.toml:

[dependencies]
hstats = "0.1.0"

Examples

  1. Single thread example: See examples/single-thread.rs Run the example with:
    time cargo run --example single-thread --release
    
  2. Multi-thread example: See examples/multi-thread.rs Run the example with:
    time cargo run --example multi-thread --release
    

Here is a sample output from the multi-thread example:

Number of random samples: 50000000
Number of bins: 30
Start: -8
End: 10
Thread count: 20
Chunk size: 2500000
Number of hstats to merge: 20
Start |  End
------|-------
 -inf | -8.00 |  21553 (0.04%)
-8.00 | -7.40 |  21752 (0.04%)
-7.40 | -6.80 |  40523 (0.08%)
-6.80 | -6.20 |73078 (0.15%)
-6.20 | -5.60 |125206 (0.25%)
-5.60 | -5.00 | ░░░ 207593 (0.42%)
-5.00 | -4.40 | ░░░░ 331470 (0.66%)
-4.40 | -3.80 | ░░░░░░░ 508330 (1.02%)
-3.80 | -3.20 | ░░░░░░░░░░░ 745836 (1.49%)
-3.20 | -2.60 | ░░░░░░░░░░░░░░░ 1054228 (2.11%)
-2.60 | -2.00 | ░░░░░░░░░░░░░░░░░░░░░ 1433304 (2.87%)
-2.00 | -1.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 1868758 (3.74%)
-1.40 | -0.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2339425 (4.68%)
-0.80 | -0.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2819448 (5.64%)
-0.20 |  0.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3261165 (6.52%)
 0.40 |  1.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3623683 (7.25%)
 1.00 |  1.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3875841 (7.75%)
 1.60 |  2.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3980760 (7.96%)
 2.20 |  2.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3928130 (7.86%)
 2.80 |  3.40 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3725868 (7.45%)
 3.40 |  4.00 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 3393196 (6.79%)
 4.00 |  4.60 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2971132 (5.94%)
 4.60 |  5.20 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2497847 (5.00%)
 5.20 |  5.80 | ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2022084 (4.04%)
 5.80 |  6.40 | ░░░░░░░░░░░░░░░░░░░░░░░ 1569143 (3.14%)
 6.40 |  7.00 | ░░░░░░░░░░░░░░░░░ 1171523 (2.34%)
 7.00 |  7.60 | ░░░░░░░░░░░░ 841536 (1.68%)
 7.60 |  8.20 | ░░░░░░░░ 579675 (1.16%)
 8.20 |  8.80 | ░░░░░ 383658 (0.77%)
 8.80 |  9.40 | ░░░ 243884 (0.49%)
 9.40 | 10.00 | ░░ 148977 (0.30%)
10.00 |   inf | ░░ 191394 (0.38%)

Total Count: 50000000 Min: -14.19 Max: 18.04 Mean: 2.00 Std Dev: 3.00


real    0m1.905s
user    0m9.727s
sys     0m0.127s

License

hstats is licensed under your choice of either the Apache License, Version 2.0, or the MIT license.

Dependencies

~565KB
~11K SLoC