#compression #numerical #quantile #delta

q_compress

Good compression for numerical sequences and time series

23 releases (10 breaking)

0.11.1 Jul 24, 2022
0.10.2 Jul 6, 2022
0.9.1 Mar 15, 2022
0.4.0 Dec 26, 2021
0.2.2 Jul 19, 2021

#30 in Compression


236 downloads per month
Used in 3 crates (2 directly)

Apache-2.0

175KB
4.5K SLoC


Usage

use q_compress::{auto_compress, auto_decompress, DEFAULT_COMPRESSION_LEVEL};

fn main() {
  // your data: the integers 0 through 99999 as i64
  let my_ints: Vec<i64> = (0..100_000).collect();
 
  // Here we let the library choose a configuration with default compression
  // level. If you know about the data you're compressing, you can compress
  // faster by creating a `CompressorConfig`.
  let bytes: Vec<u8> = auto_compress(&my_ints, DEFAULT_COMPRESSION_LEVEL);
  println!("compressed down to {} bytes", bytes.len());
 
  // decompress
  let recovered = auto_decompress::<i64>(&bytes).expect("failed to decompress");
  println!("got back {} ints from {} to {}", recovered.len(), recovered[0], recovered.last().unwrap());
}

To run something right away, see the primary example.

For a lower-level API that allows writing/reading one chunk at a time and extracting all metadata, see the docs.rs documentation.

Library Changelog

See changelog.md

Advanced

Custom Data Types

Small data types can be compressed efficiently by widening them to a larger supported type: for example, compressing u8 data as a sequence of u16 values. The only cost of using a larger data type is a small increase in chunk metadata size.
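As a rough illustration of the widening idea, the sketch below converts u8 samples to u16 before compression and narrows them back afterwards. The helper names `widen` and `narrow` are hypothetical and not part of q_compress; the widened vector would then be passed to `auto_compress` as in the Usage section.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: widen u8 samples to u16 so they can be handed to a
/// compressor that accepts u16. Lossless, since every u8 fits in a u16.
fn widen(samples: &[u8]) -> Vec<u16> {
    samples.iter().map(|&b| u16::from(b)).collect()
}

/// Hypothetical helper: narrow decompressed u16 values back to u8.
/// Returns None if any value does not fit in a u8.
fn narrow(values: &[u16]) -> Option<Vec<u8>> {
    values.iter().map(|&v| u8::try_from(v).ok()).collect()
}
```

Because the conversion is lossless in the widening direction, `narrow(&widen(&data))` should give back the original data whenever the round trip through the compressor is exact.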

When necessary, you can implement your own data type via q_compress::types::NumberLike and (if the existing signed/unsigned implementations are insufficient) q_compress::types::SignedLike and q_compress::types::UnsignedLike.

Seeking and Quantile Statistics

Recall that each chunk has a metadata section containing

  • the total count of numbers in the chunk,
  • the ranges for the chunk and count of numbers in each range,
  • and the size in bytes of the compressed body.

Because each chunk's metadata records the size of its compressed body, a reader can skip from chunk to chunk without decoding any bodies and collect the metadata of every chunk in the file. Aggregating that metadata gives the total count of numbers in the whole file and even an approximate histogram. This is typically about 100x faster than decompressing all the numbers.
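The aggregation step can be sketched without the library. The `ChunkMeta` struct below is a hypothetical stand-in that mirrors the three metadata fields listed above; it is not q_compress's actual type or file layout, and the function names are illustrative only.

```rust
/// Hypothetical stand-in for one chunk's metadata (not the q_compress type).
#[derive(Debug, Clone)]
struct ChunkMeta {
    /// Total count of numbers in the chunk.
    count: usize,
    /// (lower bound, upper bound, count) for each range in the chunk.
    ranges: Vec<(i64, i64, usize)>,
    /// Size in bytes of the compressed body.
    body_size: usize,
}

/// Total count of numbers across all chunks.
fn total_count(metas: &[ChunkMeta]) -> usize {
    metas.iter().map(|m| m.count).sum()
}

/// Approximate histogram: every range from every chunk, sorted by lower bound.
/// Ranges from different chunks may overlap, hence "approximate".
fn approx_histogram(metas: &[ChunkMeta]) -> Vec<(i64, i64, usize)> {
    let mut bins: Vec<_> = metas.iter().flat_map(|m| m.ranges.iter().copied()).collect();
    bins.sort_by_key(|&(lower, _, _)| lower);
    bins
}

/// Number of compressed body bytes a seeking reader never has to decode.
fn skipped_body_bytes(metas: &[ChunkMeta]) -> usize {
    metas.iter().map(|m| m.body_size).sum()
}
```

The speedup comes from the last function: all of those body bytes are skipped rather than decompressed, so the work scales with the number of chunks instead of the number of values.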

See the fast seeking example.

No runtime deps