#perfect-hash #hashing #perfect #mphf #dictionary #map #data-structures

ph

The library of data structures based on perfect hashing

Show the crate…

20 releases

0.8.5 Oct 22, 2024
0.8.3 Feb 24, 2024
0.8.2 Dec 26, 2023
0.8.1 Oct 26, 2023
0.1.0 Jul 13, 2022

#84 in Algorithms

Download history 4746/week @ 2024-09-18 4313/week @ 2024-09-25 4573/week @ 2024-10-02 4226/week @ 2024-10-09 5129/week @ 2024-10-16 4684/week @ 2024-10-23 4826/week @ 2024-10-30 5552/week @ 2024-11-06 5117/week @ 2024-11-13 4201/week @ 2024-11-20 3760/week @ 2024-11-27 4257/week @ 2024-12-04 3914/week @ 2024-12-11 2342/week @ 2024-12-18 1033/week @ 2024-12-25 2497/week @ 2025-01-01

10,604 downloads per month
Used in 9 crates (7 directly)

MIT/Apache

320KB
4.5K SLoC

ph is the Rust library (by Piotr Beling) of data structures based on perfect hashing.

The library contains an implementation of two variants of the fingerprint-based minimal perfect hash function: without (FMPH, fmph::Function) and with (FMPHGO, fmph::GOFunction) group optimization. A minimal perfect hash function (MPHF) is a bijection from a key set K to the set {0, 1, ..., |K|−1}.

FMPH and FMPHGO can be constructed for any set K (given in advance) of hashable items and represented using about 2.8 and 2.1 bits per key (regardless of key types), respectively. FMPH and FMPHGO are fast (O(1) in expectation) to evaluate. Their construction requires very little auxiliary space, takes a short (O(|K|) in expectation) time (which is especially true for FMPH) and, in addition, can be parallelized or carried out without holding keys in memory.

Bibliography

When using ph for research purposes, please cite the following paper which provides details on FMPH and FMPHGO:

Examples

The following examples illustrate the use of fmph::Function, which, however, can be replaced with fmph::GOFunction without any other changes.

A basic example:

use ph::fmph;

let keys = ['a', 'b', 'z'];
let f = fmph::Function::from(keys.as_ref());
// f assigns each key a unique number from the set {0, 1, 2}
for k in keys { println!("The key {} is assigned the value {}.", k, f.get(&k).unwrap()); }
let mut values = [f.get(&'a').unwrap(), f.get(&'b').unwrap(), f.get(&'z').unwrap()];
values.sort();
assert_eq!(values, [0, 1, 2]);

An example of using fmph::Function and bitmap to represent subsets of a given set of hashable elements:

use ph::fmph;
use bitm::{BitAccess, BitVec};  // bitm is used to manipulate bitmaps
use std::hash::Hash;

pub struct Subset { // represents a subset of the given set
    hash: fmph::Function, // bijectively maps elements of the set to bits of bitmap
    bitmap: Box<[u64]> // the bit pointed by the hash for e is 1 <=> e is in the subset
}

impl Subset {
    pub fn of<E: Hash + Sync>(set: &[E]) -> Self { // constructs empty subset of the given set
        Subset {
            hash: set.into(),
            bitmap: Box::with_zeroed_bits(set.len())
        }
    }

    pub fn contain<E: Hash>(&self, e: &E) -> bool { // checks if e is in the subset
        self.bitmap.get_bit(self.hash.get_or_panic(e) as usize) as bool
    }

    pub fn insert<E: Hash>(&mut self, e: &E) { // inserts e into the subset
        self.bitmap.set_bit(self.hash.get_or_panic(e) as usize)
    }

    pub fn remove<E: Hash>(&mut self, e: &E) { // removes e from the subset
        self.bitmap.clear_bit(self.hash.get_or_panic(e) as usize)
    }

    pub fn len(&self) -> usize { // returns the number of elements in the subset
        self.bitmap.count_bit_ones()
    }
}

let mut subset = Subset::of(["alpha", "beta", "gamma"].as_ref());
assert_eq!(subset.len(), 0);
assert!(!subset.contain(&"alpha"));
assert!(!subset.contain(&"beta"));
subset.insert(&"beta");
subset.insert(&"gamma");
assert_eq!(subset.len(), 2);
assert!(subset.contain(&"beta"));
subset.remove(&"beta");
assert_eq!(subset.len(), 1);
assert!(!subset.contain(&"beta"));
// subset.insert(&"zeta"); // may either panic or insert any item into subset

Above Subset is an example of an updatable retrieval data structure with a 1-bit payload. It can be generalized by replacing the bitmap with a vector of other payload.

Dependencies

~1.5MB
~29K SLoC