#elias-fano #compression #sequence

cacheline-ef

Per-cacheline encoding of sorted integer sequences

1 stable release

new 1.0.0 Mar 28, 2025

#483 in Science


Used in ptr_hash

MIT license

11KB
144 lines

Cacheline Elias-Fano

crates.io docs.rs

CachelineEf is an integer encoding that packs chunks of 44 sorted 40-bit values into a single cacheline, using 64/44*8 = 11.6 bits per value. Each chunk can hold increasing values in a range of length 256*84=21504.

CachelineEfVec stores a vector of CachelineEf and provides CachelineEfVec::index and CachelineEfVec::prefetch.

This encoding is efficient when consecutive values differ by roughly 100, where using Elias-Fano directly on the full list would use around 9 bits/value.

The main benefit is that CachelineEf only requires reading a single cacheline per query, where Elias-Fano encoding usually needs 3 reads.

Epserde is supported when the epserde feature flag is enabled.

Layout

The layout is described in detail in the PtrHash paper (arXiv, blog version).

In summary:

  • First store a 4 byte offset, corresponding to the 32 high bits of the smallest value.
  • Then store for each of 44 values the 8 low bits.
  • Lastly we have 16 bytes (128 bits) to encode the high parts. For the i'th value x[i], we set the bit at position i+(x[i]/256 - x[0]/256) to 1.

Dependencies

~0.9–1.4MB
~30K SLoC