#histogram #key #data #hash #mode #count #hashable

hash_histogram

HashHistogram creates histograms with keys of any hashable data type. Features include rank ordering and mode.

11 releases

new 0.9.2 Dec 14, 2024
0.9.1 Dec 11, 2024
0.8.0 Dec 7, 2023
0.7.0 Dec 15, 2022
0.5.2 Dec 4, 2021

#722 in Data structures


265 downloads per month

MIT/Apache

12KB
145 lines

Overview

HashHistogram creates histograms with keys of any hashable data type. Features include:

  • Check histogram count for a key.
  • Check total histogram counts across all keys.
  • Provide all keys in descending ranked order.
  • Find the mode of the histogram (i.e., an item with the largest count).
  • Find the mode of any IntoIterator type, building a HashHistogram as an intermediate step.
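
The operations listed above can be illustrated with nothing but the standard library. The following is a minimal sketch, not the crate's actual implementation: SketchHistogram and its internals are hypothetical names chosen for illustration, and the real HashHistogram may store and order its data differently (for example, in how it breaks ties between equal counts).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a histogram: a thin wrapper around a
/// HashMap from key to count. Not the crate's real type.
pub struct SketchHistogram<T: Hash + Eq> {
    counts: HashMap<T, usize>,
}

impl<T: Hash + Eq + Clone> SketchHistogram<T> {
    pub fn new() -> Self {
        SketchHistogram { counts: HashMap::new() }
    }

    /// Record one more observation of `key`.
    pub fn bump(&mut self, key: &T) {
        *self.counts.entry(key.clone()).or_insert(0) += 1;
    }

    /// Count for a single key; keys never seen count as zero.
    pub fn count(&self, key: &T) -> usize {
        self.counts.get(key).copied().unwrap_or(0)
    }

    /// Sum of the counts across all keys.
    pub fn total_count(&self) -> usize {
        self.counts.values().sum()
    }

    /// Keys sorted by descending count. Keys with equal counts
    /// come out in an unspecified order in this sketch.
    pub fn ranking(&self) -> Vec<T> {
        let mut pairs: Vec<(&T, &usize)> = self.counts.iter().collect();
        pairs.sort_by(|a, b| b.1.cmp(a.1));
        pairs.into_iter().map(|(k, _)| k.clone()).collect()
    }

    /// A key with the largest count, or None if the histogram is empty.
    pub fn mode(&self) -> Option<T> {
        self.counts
            .iter()
            .max_by_key(|(_, c)| **c)
            .map(|(k, _)| k.clone())
    }
}
```

Returning zero for unseen keys, rather than an Option, keeps lookups free of special cases; that choice mirrors the behaviour the crate's own example shows for count().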

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


lib.rs:

Overview

This library provides the HashHistogram struct, which wraps HashMap to provide a straightforward histogram facility.

use hash_histogram::HashHistogram;

// Record and inspect histogram counts.

let mut h = HashHistogram::new();
for s in ["a", "b", "a", "b", "c", "b", "a", "b"].iter() {
    h.bump(s);
}

for (s, c) in [("a", 3), ("b", 4), ("c", 1), ("d", 0)].iter() {
    assert_eq!(h.count(s), *c);
}

assert_eq!(h.total_count(), 8);

// Iteration
let mut iterated: Vec<(&str,usize)> = h.iter().map(|(s,c)| (*s, *c)).collect();
iterated.sort();
assert_eq!(iterated, vec![("a", 3), ("b", 4), ("c", 1)]);

// Iterating over counts only
let mut counts: Vec<usize> = h.counts().collect();
counts.sort();
assert_eq!(counts, vec![1, 3, 4]);

// Ranked ordering
assert_eq!(h.ranking(), vec!["b", "a", "c"]);

// Ranked ordering with counts
assert_eq!(h.ranking_with_counts(), vec![("b", 4), ("a", 3), ("c", 1)]);

// Mode
assert_eq!(h.mode(), Some("b"));

// Incrementing larger counts
for (s, count) in [("a", 2), ("b", 3), ("c", 10), ("d", 5)].iter() {
    h.bump_by(s, *count);
}

for (s, count) in [("a", 5), ("b", 7), ("c", 11), ("d", 5)].iter() {
    assert_eq!(h.count(s), *count);
}

Calculating the mode is useful enough on its own that the crate also provides the standalone functions mode() and mode_values(). Use mode() with iterators that yield references to values held in a container, and mode_values() with iterators that yield owned values.

Each function builds a HashHistogram internally and accepts any type that implements the IntoIterator trait:

use hash_histogram::{mode, mode_values};
let chars = vec!["a", "b", "c", "d", "a", "b", "a"];

// Directly passing the container.
assert_eq!(mode(&chars).unwrap(), "a");

// Passing an iterator from the container.
assert_eq!(mode(chars.iter()).unwrap(), "a");

// Use mode_values() when using an iterator generating values in place.
let nums = vec![100, 200, 100, 200, 300, 200, 100, 200];
assert_eq!(mode_values(nums.iter().map(|n| n + 1)).unwrap(), 201);
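
To make the distinction between the two functions concrete, here is a rough standard-library sketch of how a reference-based and a value-based mode helper could differ. The names mode_of_values and mode_of_refs are hypothetical, and the bodies are an assumption about one reasonable approach, not the crate's source.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: mode of an iterator that yields owned values.
/// Each value is moved into the map, so no Clone bound is needed.
pub fn mode_of_values<T, I>(items: I) -> Option<T>
where
    T: Hash + Eq,
    I: IntoIterator<Item = T>,
{
    let mut counts: HashMap<T, usize> = HashMap::new();
    for item in items {
        *counts.entry(item).or_insert(0) += 1;
    }
    // Pick a key with the largest count; None for an empty input.
    counts.into_iter().max_by_key(|(_, c)| *c).map(|(k, _)| k)
}

/// Hypothetical helper: mode of an iterator that yields references.
/// The referenced values are cloned so the result can be returned by value.
pub fn mode_of_refs<'a, T, I>(items: I) -> Option<T>
where
    T: Hash + Eq + Clone + 'a,
    I: IntoIterator<Item = &'a T>,
{
    mode_of_values(items.into_iter().cloned())
}
```

In this sketch the reference-based helper is only a thin adapter over the value-based one, which is one way to avoid duplicating the counting logic.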

HashHistogram supports common Rust data structure operations. It implements the FromIterator and Extend traits, and derives serde's Serialize and Deserialize:

use hash_histogram::HashHistogram;

// Initialization from an iterator:
let mut h: HashHistogram<isize> = [100, 200, 100, 200, 300, 200, 100, 200].iter().collect();

// Extension from an iterator
h.extend([200, 400, 200, 500, 200].iter());

// Serialization
let serialized = serde_json::to_string(&h).unwrap();

// Deserialization
let deserialized: HashHistogram<isize> = serde_json::from_str(&serialized).unwrap();
assert_eq!(deserialized, h);
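
For readers curious how a counting container can plug into collect() and extend(), the following sketch implements FromIterator and Extend for a hypothetical Tally type built on HashMap. It is an illustration under stated assumptions (iterators of references, values cloned on insert), not the crate's code, and it leaves serde out so that it needs only the standard library.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::iter::FromIterator;

/// Hypothetical counting container used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Tally<T: Hash + Eq> {
    counts: HashMap<T, usize>,
}

impl<T: Hash + Eq> Tally<T> {
    /// Count for a key; keys never seen count as zero.
    pub fn count(&self, key: &T) -> usize {
        self.counts.get(key).copied().unwrap_or(0)
    }
}

/// Extending from an iterator of references bumps each key once per item.
impl<'a, T: Hash + Eq + Clone + 'a> Extend<&'a T> for Tally<T> {
    fn extend<I: IntoIterator<Item = &'a T>>(&mut self, iter: I) {
        for item in iter {
            *self.counts.entry(item.clone()).or_insert(0) += 1;
        }
    }
}

/// Collecting is an empty tally followed by an extend.
impl<'a, T: Hash + Eq + Clone + 'a> FromIterator<&'a T> for Tally<T> {
    fn from_iter<I: IntoIterator<Item = &'a T>>(iter: I) -> Self {
        let mut tally = Tally { counts: HashMap::new() };
        tally.extend(iter);
        tally
    }
}
```

Defining from_iter in terms of extend keeps the counting rule in a single place, so collecting and extending cannot drift apart.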

Dependencies

~2.1–3MB
~65K SLoC