#streaming #vec-u8 #iterator #data #fallible #buffer #items

streaming-decompression

Fallible streaming iterator specialized for decompression

3 releases

0.1.2 Jul 30, 2022
0.1.1 Jul 30, 2022
0.1.0 Oct 15, 2021

#100 in Compression

Download history 13189/week @ 2023-12-06 14453/week @ 2023-12-13 9498/week @ 2023-12-20 9765/week @ 2023-12-27 12772/week @ 2024-01-03 14806/week @ 2024-01-10 14733/week @ 2024-01-17 14061/week @ 2024-01-24 14298/week @ 2024-01-31 13859/week @ 2024-02-07 15206/week @ 2024-02-14 14739/week @ 2024-02-21 17217/week @ 2024-02-28 17083/week @ 2024-03-06 16596/week @ 2024-03-13 15452/week @ 2024-03-20

69,146 downloads per month
Used in 76 crates (2 directly)

Apache-2.0

11KB
159 lines

This crate contains a FallibleStreamingIterator optimized for decompressions.

A typical problem that libraries working with compressed formats face is that they need to preserve an intermediary buffer to decompress multiple items. Specifically, implementations that use

fn decompress(compressed: Vec<u8>) -> Vec<u8> {
    unimplemented!("Decompress")
}

are ineficient because they will need to allocate a new Vec<u8> for every decompression, and this allocation scales with the average decompressed size the items.

The typical solution for this problem is to offer

fn decompress(compressed: Vec<u8>, decompressed: &mut Vec<u8>) {
    decompressed.clear();
    unimplemented!("Decompress into `decompressed`, maybe re-allocing it.")
}

Such API avoids one allocation per item, but requires the user to deal with preserving decompressed between iterations. Such pattern is not possible to achieve with Iterator API atm.

This crate offers Decompressor, a FallibleStreamingIterator that consumes an Iterator of compressed items that yields uncompressed items, maintaining an internal Vec<u8> that is automatically re-used across items.

Example

use streaming_codec::{Decompressor, Compressed, Decompressed, FallibleStreamingIterator};

// An item that is decompressable
#[derive(Debug, PartialEq)]
struct CompressedItem {
    pub metadata: String,
    pub data: Vec<u8>,
}
impl Compressed for CompressedItem {
    fn is_compressed(&self) -> bool {
        // whether it is decompressed or not depends on some metadata.
        self.metadata == "is_compressed"
    }
}

// A decompressed item
#[derive(Debug, PartialEq)]
struct DecompressedItem {
    pub metadata: String,
    pub data: Vec<u8>,
}

impl Decompressed for DecompressedItem {
    fn buffer_mut(&mut self) -> &mut Vec<u8> {
        &mut self.data
    }
}

// the decompression function. This could call LZ4, Snappy, etc.
fn decompress(
    mut i: CompressedItem,
    buffer: &mut Vec<u8>,
) -> Result<DecompressedItem, std::convert::Infallible> {
    if i.is_compressed() {
        // the actual decompression, here identity, but more complex stuff can happen.
        buffer.clear();
        buffer.extend(&mut i.data.iter().rev());
    } else {
        std::mem::swap(&mut i.data, buffer);
    };
    Ok(DecompressedItem {
        metadata: i.metadata,
        data: std::mem::take(buffer),
    })
}

fn main() -> Result<(), std::convert::Infallible> {
   // consider some compressed items
   let item1 = CompressedItem {
       metadata: "is_compressed".to_string(),
       data: vec![1, 2, 3],
   };
   let item2 = CompressedItem {
       metadata: "is_compressed".to_string(),
       data: vec![4, 5, 6],
   };
   let iter = vec![Ok(item1), Ok(item2)].into_iter();

   let buffer = vec![0; 4];  // the internal buffer: it could contain anything.
   let mut decompressor = Decompressor::new(iter, buffer, decompress);

   let item = decompressor.next()?.unwrap();
   // the item was decompressed
   assert_eq!(item.data, vec![3, 2, 1]);
   assert_eq!(item.metadata, "is_compressed".to_string());

   let item = decompressor.next()?.unwrap();
   // the item was decompressed
   assert_eq!(item.data, vec![6, 5, 4]);
   assert_eq!(item.metadata, "is_compressed".to_string());

   assert_eq!(decompressor.next()?, None);

   // we can re-use the internal buffer if we wish to
   let internal = decompressor.into_inner();
   assert_eq!(internal, vec![6, 5, 4]);
   Ok(())
}

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Dependencies

~23KB