#duplicates #hash-table #detection #algorithm #data-structures

bin+lib qht

Implementation of Quotient Hash Tables variants QHTc, QQHTc, QQHTDc

1 unstable release

0.1.0 Apr 7, 2019

#47 in #duplicate

MIT license

25KB
304 lines

qht

Latest version Documentation Build Status Long time support rustc version License

A Rust implementation of Quotient Hash Tables, a reasonably efficient approximate duplicate detection algorithm. See paper here.

How to use

extern crate qht;
extern crate rand;

use qht::*;
use rand::rngs::StdRng;
use rand::{FromEntropy, Rng};

fn main() {
    // Creates a new quotient hash table with 1 bucket and a fingerpint size of 3
    let mut f = QuotientHashTable::new(1024, 8, 3);
    // Initialize PRNG
    let mut rng = StdRng::from_entropy();

    let mut measured_collisions = 0;
    let mut actual_collisions = 0;
    let mut elements : Vec<u64> = Vec::with_capacity(1000);

    for _ in 0..10000 {
        // We generate a random element
        let element = rng.gen_range(0, 1000);

        // Check if it's in the Hash Table
        if f.lookup(element) {
            measured_collisions+=1;
        } else {
            // Insert it if it's not
            f.insert(element);
        }

        // Check if the element has been generated
        if elements.contains(&element) {
            actual_collisions+=1;
        } else {
            // Record it if it's not
            elements.push(element);
        }
    }

    println!("Measured Collisions {}, Actual Collisions {} ", measured_collisions, actual_collisions);
}

Running the tests

Public functions are tested in their documentation. Other miscellaneous tests are written in lib.rs. Run the tests with

cargo test

Running the benchmarks

The Criterion dependency is used to provide precise benchmarkings. Benchmarks can be run with

cargo bench

Documentation

Generate the documentation with

cargo doc --no-deps

License

This project is licensed under the MIT License - see the LICENSE file for details

Dependencies

~0.6–0.9MB
~12K SLoC