#varint #numbers #simd #stream #stream-processing

stream-vbyte

Compress and decompress numbers efficiently in the Stream VByte encoding

7 unstable releases

0.4.1 May 23, 2023
0.4.0 Nov 27, 2021
0.3.2 Oct 12, 2017
0.2.0 Oct 4, 2017
0.1.0 Sep 30, 2017

#79 in Compression

7,677 downloads per month
Used in granne

Custom license

155KB
2K SLoC

A port of Stream VByte to Rust.

Stream VByte is a variable-length unsigned int encoding designed to make SIMD processing more efficient.

See https://lemire.me/blog/2017/09/27/stream-vbyte-breaking-new-speed-records-for-integer-compression/ and https://arxiv.org/pdf/1709.08990.pdf for details on the format. The reference C implementation is https://github.com/lemire/streamvbyte.
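To make the layout concrete, here is a minimal scalar sketch of the encoding as described in the paper and the reference implementation: all control bytes come first, each holding four 2-bit codes (byte length minus one, first number in the lowest bits), followed by the little-endian data bytes. The name `encode_scalar` is hypothetical and is not part of this crate's API; see the crate documentation for the real entry points.

```rust
/// Encode `nums` in the Stream VByte layout: all control bytes, then all data bytes.
/// `encode_scalar` is a hypothetical name for this sketch, not the crate's API.
fn encode_scalar(nums: &[u32]) -> Vec<u8> {
    // One control byte covers four numbers (2 bits each).
    let control_len = (nums.len() + 3) / 4;
    let mut control = vec![0u8; control_len];
    let mut data = Vec::with_capacity(nums.len() * 4);

    for (i, &n) in nums.iter().enumerate() {
        // Bytes needed: 1 for n < 2^8, 2 for n < 2^16, 3 for n < 2^24, else 4.
        let len = match n {
            0..=0xFF => 1,
            0x100..=0xFFFF => 2,
            0x1_0000..=0xFF_FFFF => 3,
            _ => 4,
        };
        // Store (len - 1) in this number's 2-bit slot; the first number of each
        // group of four occupies the lowest two bits of the control byte.
        control[i / 4] |= ((len - 1) as u8) << ((i % 4) * 2);
        // Data bytes are little-endian, truncated to `len` bytes.
        data.extend_from_slice(&n.to_le_bytes()[..len]);
    }

    control.extend_from_slice(&data);
    control
}

fn main() {
    let nums: Vec<u32> = (1..=100).collect();
    let encoded = encode_scalar(&nums);
    // 25 control bytes plus one data byte per number.
    println!("{} numbers -> {} bytes", nums.len(), encoded.len());
}
```

Keeping the control bytes apart from the data bytes is what lets a SIMD decoder read one control byte and then process four numbers' worth of data in a single step.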

Usage

See the documentation.

Play with the CLI example

There's a cli.rs example provided that demonstrates encoding and decoding.

To encode some numbers, provide numbers (one per line) to stdin, and the encoded result will be written to stdout.

Example using jot to produce the numbers 1 to 100: jot 100 | cargo run --example cli -- enc | base64

Output, with cargo build output removed (the "Encoded ..." line is written to stderr for human convenience):

Encoded 100 numbers
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8g
ISIjJCUmJygpKissLS4vMDEyMzQ1Njc4OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZ
WltcXV5fYGFiY2Q=

There's a corresponding decode mode that reads the encoded format on stdin and emits the contents, one number per line. Here, we encode some numbers then decode them again: jot 10 | cargo run --example cli -- enc | cargo run --example cli -- dec -c 10

Encoded 10 numbers
1
2
3
4
5
6
7
8
9
10
Decoded 10 numbers
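The decoder has to be told how many numbers to expect (presumably what -c 10 supplies above), because the encoded bytes do not record the count, and the count determines where the control bytes end and the data bytes begin. The following is a minimal scalar sketch of decoding under the layout described in the paper. `decode_scalar` is a hypothetical name, not this crate's API.

```rust
/// Decode `count` numbers from a Stream VByte buffer (control bytes, then data).
/// `decode_scalar` is a hypothetical name for this sketch, not the crate's API.
/// Returns `None` if the input is too short for `count` numbers.
fn decode_scalar(input: &[u8], count: usize) -> Option<Vec<u32>> {
    // The count fixes where the control bytes end and the data bytes begin.
    let control_len = (count + 3) / 4;
    if input.len() < control_len {
        return None;
    }
    let (control, data) = input.split_at(control_len);

    let mut out = Vec::with_capacity(count);
    let mut pos = 0;
    for i in 0..count {
        // Each 2-bit code stores (length - 1); the first number of a group of
        // four uses the lowest two bits of its control byte.
        let len = (((control[i / 4] >> ((i % 4) * 2)) & 0b11) + 1) as usize;
        let bytes = data.get(pos..pos + len)?;
        // Rebuild the little-endian value, zero-padding the missing high bytes.
        let mut buf = [0u8; 4];
        buf[..len].copy_from_slice(bytes);
        out.push(u32::from_le_bytes(buf));
        pos += len;
    }
    Some(out)
}

fn main() {
    // The numbers 1 to 10: three all-zero control bytes, then one data byte each.
    let encoded = [0u8, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let decoded = decode_scalar(&encoded, 10).expect("input too short");
    for n in &decoded {
        println!("{}", n);
    }
}
```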

Maintainers

To generate the lookup tables:

cargo run --example generate_decode_table > tmp/tables.rs && mv tmp/tables.rs src/tables.rs
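As an illustration of the kind of table a decoder can precompute, the sketch below builds a 256-entry table mapping each control byte to the total number of data bytes its four numbers occupy. This is an assumption about what such tables may look like, not the output of generate_decode_table; the names `length_table` and `LENGTH_PER_QUAD` are hypothetical.

```rust
/// Build a 256-entry table mapping each control byte to the total number of
/// data bytes its four numbers occupy. `length_table` is a hypothetical name.
fn length_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for control in 0..256usize {
        let mut total = 0u8;
        for slot in 0..4 {
            // Each 2-bit code stores (length - 1).
            total += (((control >> (slot * 2)) & 0b11) + 1) as u8;
        }
        table[control] = total;
    }
    table
}

fn main() {
    let table = length_table();
    // Emit the table as Rust source, in the spirit of a generated tables file.
    // The constant name is hypothetical.
    println!("pub const LENGTH_PER_QUAD: [u8; 256] = {:?};", table);
}
```

With a table like this, a decoder can advance through the data bytes four numbers at a time without inspecting each 2-bit code individually.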

To run the tests (on a recent Intel CPU with SSSE3 and SSE4.1 support, using a nightly toolchain):

RUSTFLAGS='-C target-feature=+ssse3,+sse4.1' cargo +nightly test --all-features

To run the benchmarks:

RUSTFLAGS='-C target-feature=+ssse3,+sse4.1' cargo +nightly bench --all-features

No runtime deps