#histogram #label #memory #parameters #standard-histogram #compressed-histogram

c2_histograms

Tools for histogram compression and usage

6 releases

0.2.4 Jul 31, 2020
0.2.3 Jul 31, 2020
0.1.0 Jul 24, 2020

#486 in Compression

GPL-2.0-or-later

47KB
642 lines

C-Squared Histograms Aaron Gindi

This library is designed to facilitate user control over the trade-off between precision and memory usage of histograms, which are often essential in memory research.

Instead of implementing memory saving operations, this library merely simulates the operations for evaluation purposes.

Patch 0.2.0: Fix bugs relating to CompressedHistogram and made CompressedHistogram more efficient. Breaking change because Copy trait was removed from Param implementors, and C2Params now return references to embedded parameters instead of copies.


lib.rs:

This library contains tools for simulating space efficient histograms. For the purposes of this library, a histogram is a map from labels to frequencies, both of which must be numerical.

We provide is a standard histogram implementation (StandardHistogram) which stores label-frequency pairs at face value using a HashMap.

Furthermore, we provide three additional histogram implementations that use either strictly less memory than StandardHistogram or use a fixed amount of memory. The optimizations are done across two dimensions, the label storage (referred to as "compaction") and the frequency storage (referred to as "compression"). Hence the name "C-Squared Histograms". The three implementations are as follows:

  • CompressedHistogram - This implementation uses strictly less space than a StandardHistogram by approximating the frequencies
  • CompactHistogram - This implementation consumes a fixed amount of space depending on a few precision parameters. It saves space by approximating labels.
  • C2Histogram - This implementation also utilizes a fixed amount of space, usually much less than a CompactHistogram. It approximates both labels and frequencies.

Each of these implementations is parameterized such that the trade-off between precision and memory performance is directly controlled by the user.

To construct a StandardHistogram, use the function create_standard_histogram. Then, the other histograms can be derived from that standard histogram using the conversion functions to_compact, to_compressed, or to_c2.

No runtime deps