#byte #integer-value #codec #tags #64-bit #length #format

streamvbyte64

Implementation of stream-vbyte codec supporting 64-bit wide values

3 unstable releases

0.2.0 Jul 26, 2023
0.1.1 Jul 4, 2023
0.1.0 Jul 4, 2023

#1104 in Algorithms

MIT/Apache

125KB
2.5K SLoC

Fast Byte-Aligned Integer Coding

This crate is a Rust port of Daniel Lemire's streamvbyte library. It contains multiple implementations of this format aimed at different integer sizes and value distributions.

Each Coder implementation produces different formats that are incompatible with one another. Names provide the length of each of the 4 possible tags for each value so Coder1234 encodes each entry as 1, 2, 3, or 4 bytes. A scalar implementation is always available at a large speed penalty but the implementation will automatically use an accelerated implementation for the target if available.

Performance

A scalar implementation is available for all Coders but this is typically pretty slow. All implementations also include acceleration when building for little-endian aarch64 with NEON instruction support and x86_64 using the SSSE3 or SSE4.1 instruction sets.

Coder1234 will typically be fastest, but other tag length distributions (including Coder1248 for 64-bit values) are available.)

Benchmark numbers were collected on M1 Macbook Air, to generate your own run

cargo bench --bench=streamvbyte

Each benchmark is run with 1024 elements of up to a certain length using a zipf-like distribution.

encode and decode benchmarks measure value coding throughput, deltas variants measure the same assuming monotonically increasing inputs and compressing deltas between values. data_len measures throughput for determining data length based on tag values, skip_deltas does the same for delta coded streams and includes the sum of all values skipped.

Coder1234

Benchmark Throughput
encode/1 4.8Gelem/s
encode_deltas/1 4.1Gelem/s
decode/1 12.7Gelem/s
decode_deltas/1 5.6Gelem/s
skip_deltas/1 19.9Gelem/s
data_len/1 54.2Gelem/s
encode/2 4.8Gelem/s
encode_deltas/2 3.8Gelem/s
decode/2 8.2Gelem/s
decode_deltas/2 4.2Gelem/s
skip_deltas/2 8.5Gelem/s
encode/4 4.8Gelem/s
encode_deltas/4 4.1Gelem/s
decode/4 8.2Gelem/s
decode_deltas/4 4.2Gelem/s
skip_deltas/4 8.4Gelem/s

Coder0124

Benchmark Throughput
encode/1 4.2Gelem/s
encode_deltas/1 3.6Gelem/s
decode/1 7.2Gelem/s
decode_deltas/1 4.2Gelem/s
skip_deltas/1 7.1Gelem/s
data_len/1 53.8Gelem/s
encode/2 4.2Gelem/s
encode_deltas/2 3.6Gelem/s
decode/2 7.2Gelem/s
decode_deltas/2 4.2Gelem/s
skip_deltas/2 7.3Gelem/s
encode/4 4.2Gelem/s
encode_deltas/4 3.6Gelem/s
decode/4 7.2Gelem/s
decode_deltas/4 4.2Gelem/s
skip_deltas/4 7.5Gelem/s

Coder1248

Benchmark Throughput
encode/1 3.1Gelem/s
encode_deltas/1 2.5Gelem/s
decode/1 4.6Gelem/s
decode_deltas/1 3.6Gelem/s
skip_deltas/1 6.0Gelem/s
data_len/1 53.8Gelem/s
encode/4 3.1Gelem/s
encode_deltas/4 2.5Gelem/s
decode/4 4.8Gelem/s
decode_deltas/4 3.6Gelem/s
skip_deltas/4 5.7Gelem/s
encode/8 3.0Gelem/s
encode_deltas/8 2.5Gelem/s
decode/8 4.8Gelem/s
decode_deltas/8 3.0Gelem/s
skip_deltas/8 5.4Gelem/s

Dependencies

~160KB