4 releases (stable)
new 1.2.0 | Dec 4, 2024 |
---|---|
1.1.0 | Oct 2, 2024 |
1.0.0 | Oct 2, 2024 |
0.1.0 | Sep 17, 2024 |
#284 in Algorithms
315 downloads per month
Used in 3 crates
59KB
924 lines
rapidhash - rust implementation
A rust implementation of the rapidhash function, the official successor to wyhash.
- High quality, the fastest hash passing all tests in the SMHasher and SMHasher3 benchmark. Collision-based study showed a collision probability lower than wyhash and close to ideal.
- Very fast, the fastest passing hash in SMHasher3. Significant throughput improvement over wyhash. Fastest memory-safe hash. Fastest platform-independent hash. Fastest const hash.
- Platform independent, works on all platforms, no dependency on machine-specific vectorized or cryptographic hardware instructions. Optimised for both AMD64 and AArch64.
- Memory safe, when the
unsafe
feature is disabled (default). This implementation has also been fuzz-tested withcargo fuzz
. - No dependencies and no-std compatible when disabling the
std
feature. - Official successor to wyhash, with improved speed, quality, and compatibility.
- Inline variants that use
#[inline(always)]
onRapidInlineHash
andRapidInlineHashBuilder
to force compiler optimisations on specific input types (can double the hash performance depending on the hashed type). - Run-time and compile-time hashing as the hash implementation is fully
const
. - Idiomatic
std::hash::Hasher
compatible hasher forHashMap
andHashSet
usage. - Non-cryptographic hash function.
Usage
Hashing
use std::hash::Hasher;
use rapidhash::{rapidhash, RapidInlineHasher, RapidHasher};
// direct const usage
assert_eq!(rapidhash(b"hello world"), 17498481775468162579);
// a std::hash::Hasher compatible hasher
let mut hasher = RapidInlineHasher::default();
hasher.write(b"hello world");
assert_eq!(hasher.finish(), 17498481775468162579);
// a non-inline hasher for when you don't want to force inlining,
// such as when being careful with WASM binary size.
let mut hasher = RapidHasher::default();
hasher.write(b"hello world");
assert_eq!(hasher.finish(), 17498481775468162579);
// a const API similar to std::hash::Hasher
const HASH: u64 = RapidInlineHasher::default_const()
.write_const(b"hello world")
.finish_const();
assert_eq!(HASH, 17498481775468162579);
Helper Types
// also includes HashSet equivalents
use rapidhash::{RapidHashMap, RapidInlineHashMap};
// std HashMap with the RapidHashBuilder hasher.
let mut map = RapidHashMap::default();
map.insert("hello", "world");
// a hash map type using the RapidInlineHashBuilder to force the compiler to
// inline the hash function for further optimisations (can be over 30% faster).
let mut map = RapidInlineHashMap::default();
map.insert("hello", "world");
CLI
Rapidhash can also be installed as a CLI tool to hash files or stdin. This is not a cryptographic hash, but should be much faster than cryptographic hashes.
Output is the decimal string of the u64
hash value.
# install
cargo install rapidhash
# hash stdin (output: 8543579700415218186)
echo "example" | rapidhash
# hash a file (output: 8543579700415218186)
rapidhash example.txt
Features
default
:std
std
: Enables theRapidHashMap
andRapidHashSet
helper types.rand
: EnablesRapidRandomState
, aBuildHasher
that randomly initializes the seed. Includes therand
crate dependency.rng
: EnablesRapidRng
, a fast, non-cryptographic random number generator based on rapidhash. Includes therand_core
crate dependency.unsafe
: Uses unsafe pointer arithmetic to skip some unnecessary bounds checks for a small 3-4% performance improvement.
How to choose your hash function
Hash functions are not a one-size fits all. Benchmark your use case to find the best hash function for your needs, but here are some general guidelines on choosing a hash function:
default
: Use the std lib hasher when hashing is not in the critical path, or if you need strong HashDoS resistance.rapidhash
: You are hashing complex objects or byte streams, need compile-time hashing, or a performant high-quality hash. Benchmark theRapidInline
variants if you need the utmost performance.fxhash
: You are hashing integers, or structs of only integers, and the lower quality hash doesn't affect your use case.gxhash
: You are hashing long byte streams on platforms with the necessary instruction sets and only care about throughput. You don't need memory safety, HashDoS resistance, or platform independence (for example, gxhash doesn't currently compile on Github Actions workflows).
Benchmarks
Initial benchmarks on M1 Max (aarch64) for various input sizes.
Hashing Benchmarks
There are two types of benchmarks over the different algorithms to cover various forms of compiler optimisation that Rust can achieve:
str_len
: hashing bytes (a string) of the given length, where the length is not known at compile time.u64
: hashing a u64, 8 bytes of known size, where the compiler can optimise the path.
Note on wyhash: hashing throughput doesn't translate to hashmap insertion throughput, see the hashmap insertion benchmarks below.
HashMap Insertion Benchmarks
Hash speed and throughput can be a poor measure in isolation, as it doesn't take into account hash quality. More hash collisions can cause slower hashmap insertion, and so hashmap insertion benchmarks can be a better measure of hash performance. As always, benchmark your use case.
Versioning
The minimum supported Rust version (MSRV) is 1.77.0.
The rapidhash crate follows the following versioning scheme:
- Major for breaking changes, such as hash output changes, breaking API changes, MSRV version bumps. When the RNG code is stabilised, major version bumps to
rand_core
will also trigger a major version bump of rapidhash due to the re-exported trait implementations. - Minor for significant API additions/deprecations.
- Patch for bug fixes and performance improvements.
License and Acknowledgements
This project is licensed under both the MIT and Apache-2.0 licenses. You are free to choose either license.
With thanks to Nicolas De Carli for the original rapidhash C++ implementation, which is licensed under the BSD 2-Clause license.
With thanks to Justin Bradford for letting us use the rapidhash crate name 🍻
Dependencies
~80KB