#hamming #avx2 #popcount

hamming_rs

Computes Hamming distance and weight -- if available and beneficial, uses a highly optimized avx2 implementation

9 releases

new 0.2.24 Mar 18, 2025
0.2.23 Mar 17, 2025
0.2.22 Feb 3, 2024
0.2.21 Apr 20, 2022
0.1.1 Apr 16, 2022

#4 in #avx2


160 downloads per month
Used in hamming-bitwise-fast

MIT license

1.5MB
585 lines

Contains (WOFF font, 400KB) NanumBarunGothic-13b3dcba.ttf.woff2, (WOFF font, 135KB) FiraSans-Medium-e1aa3f0a.woff2, (WOFF font, 130KB) FiraSans-Regular-0fe48ade.woff2, (WOFF font, 82KB) SourceSerif4-Bold-6d4fd4c0.ttf.woff2, (WOFF font, 77KB) SourceSerif4-Regular-6b053e98.ttf.woff2, (WOFF font, 45KB) SourceCodePro-It-fc8b9304.ttf.woff2 and 3 more.

hamming_rs

Computes Hamming distance and weight, optionally using AVX/AVX2 instructions on x86 processors.
The AVX2-optimized version is used when the inputs have the same memory alignment and are large enough; otherwise, the functions fall back to slower versions that are still faster than the strsim or hamming crates.
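
As a point of reference, the two quantities can be sketched in plain Rust. This is a minimal scalar sketch, not the crate's implementation: the function names are hypothetical, and the eligibility check only illustrates what "same memory alignment and large enough" could mean, assuming 32-byte (256-bit) lanes and a caller-supplied minimum length.

```rust
/// Hamming weight: number of set bits in a byte slice.
pub fn weight_scalar(x: &[u8]) -> u64 {
    x.iter().map(|b| u64::from(b.count_ones())).sum()
}

/// Hamming distance: number of bit positions in which two
/// equal-length byte slices differ.
pub fn distance_scalar(x: &[u8], y: &[u8]) -> u64 {
    assert_eq!(x.len(), y.len(), "inputs must have the same length");
    x.iter()
        .zip(y)
        .map(|(a, b)| u64::from((a ^ b).count_ones()))
        .sum()
}

/// Hypothetical eligibility check for a wide (AVX2-style) path.
/// Assumption: both inputs must sit at the same offset within a
/// 32-byte lane and be at least `min_len` bytes long. The real
/// crate's conditions and thresholds may differ.
pub fn wide_path_eligible(x: &[u8], y: &[u8], min_len: usize) -> bool {
    const LANE: usize = 32; // bytes in one 256-bit register
    x.len() == y.len()
        && x.len() >= min_len
        && (x.as_ptr() as usize) % LANE == (y.as_ptr() as usize) % LANE
}
```

When the check fails, a caller would simply use the scalar functions, which give the same result at lower throughput.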

references

Muła, Wojciech, Nathan Kurz, and Daniel Lemire. "Faster population counts using AVX2 instructions." The Computer Journal 61.1 (2018): 111-120.

https://arxiv.org/pdf/1611.07612.pdf

Thanks to @emschwartz (GitHub), whose fast implementation is now used when AVX2 is not available. It allows auto-vectorization on non-Intel platforms.
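
One common way to write an auto-vectorization-friendly fallback is to XOR the inputs in 8-byte words and sum the population counts. The sketch below illustrates that general technique; it is not @emschwartz's code, and the function name `distance_chunked` is hypothetical.

```rust
use std::convert::TryInto;

/// Hamming distance over byte slices, processed in 8-byte words so the
/// compiler has a simple, branch-free loop it may vectorize.
pub fn distance_chunked(x: &[u8], y: &[u8]) -> u64 {
    assert_eq!(x.len(), y.len(), "inputs must have the same length");
    let xc = x.chunks_exact(8);
    let yc = y.chunks_exact(8);

    // Bytes left over after the last full 8-byte word.
    let tail: u64 = xc
        .remainder()
        .iter()
        .zip(yc.remainder())
        .map(|(a, b)| u64::from((a ^ b).count_ones()))
        .sum();

    // Full words: reinterpret each 8-byte chunk as a u64, XOR, popcount.
    let body: u64 = xc
        .zip(yc)
        .map(|(a, b)| {
            let a = u64::from_ne_bytes(a.try_into().unwrap());
            let b = u64::from_ne_bytes(b.try_into().unwrap());
            u64::from((a ^ b).count_ones())
        })
        .sum();

    body + tail
}
```

Byte order does not matter here because a population count is independent of how the bits are arranged within the word.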

performance

We ran the benchmarks with cargo bench on a laptop with an AMD Ryzen 5 PRO 5675U @ 4.4 GHz and 32 GB of RAM. Beforehand, we set export RUSTFLAGS="-C target-feature=+avx2,+fma". We compare against the hamming and strsim reference crates and observe a speedup of roughly 3x over hamming and roughly 30x over strsim.

[Figures: hamming_rs vs hamming and strsim; speedup over hamming and strsim]
