13 releases

0.5.0 Nov 25, 2024
0.4.7 Jul 2, 2024
0.4.6 Apr 10, 2024
0.4.5 Nov 1, 2023
0.2.0 Jul 28, 2021

#75 in Compression

Download history 1030/week @ 2024-09-21 1040/week @ 2024-09-28 1108/week @ 2024-10-05 1251/week @ 2024-10-12 1072/week @ 2024-10-19 1290/week @ 2024-10-26 1223/week @ 2024-11-02 1223/week @ 2024-11-09 1184/week @ 2024-11-16 1191/week @ 2024-11-23 947/week @ 2024-11-30 1011/week @ 2024-12-07 820/week @ 2024-12-14 931/week @ 2024-12-21 1194/week @ 2024-12-28 1471/week @ 2025-01-04

4,501 downloads per month
Used in grenad

MIT license

1MB
3K SLoC

Grenad

License Crates.io Docs dependency status Build

Tools to sort, merge, write, and read immutable key-value pairs.


lib.rs:

This library provides tools to sort, merge, write, and read immutable key-value pairs. The entries in the grenad files are immutable and the only way to modify them is by creating a new file with the changes.

Features

You can define which compression schemes to support, there are currently a few available choices, these determine which types will be available inside the above modules:

If you need more performances you can enable the rayon feature that will enable a bunch of new settings like being able to make the Sorter sort in parallel.

Examples

Use the Writer and Reader structs

You can use the Writer struct to store key-value pairs into the specified std::io::Write type. The Reader type can then be used to read the entries.

The entries provided to the Writer struct must be given in lexicographic order.

use std::io::Cursor;

use grenad::{Reader, Writer};

let mut writer = Writer::memory();

// We insert our key-value pairs in lexicographic order.
writer.insert("first-counter", 119_u32.to_ne_bytes())?;
writer.insert("second-counter", 384_u32.to_ne_bytes())?;

// We create a reader from our writer.
let cursor = writer.into_inner().map(Cursor::new)?;
let mut cursor = Reader::new(cursor)?.into_cursor()?;

// We can see that the sum of u32s is valid here.
assert_eq!(cursor.move_on_next()?, Some((&b"first-counter"[..], &119_u32.to_ne_bytes()[..])));
assert_eq!(cursor.move_on_next()?, Some((&b"second-counter"[..], &384_u32.to_ne_bytes()[..])));
assert_eq!(cursor.move_on_next()?, None);

// We can also jum on any given entry.
assert_eq!(cursor.move_on_key_greater_than_or_equal_to("first")?, Some((&b"first-counter"[..], &119_u32.to_ne_bytes()[..])));
assert_eq!(cursor.move_on_key_equal_to("second-counter")?, Some((&b"second-counter"[..], &384_u32.to_ne_bytes()[..])));
assert_eq!(cursor.move_on_key_lower_than_or_equal_to("abracadabra")?, None);

Use the Merger struct

In this example we show how you can merge multiple Readers by using a merge function when a conflict is encountered.

The entries yielded by the Merger struct are returned in lexicographic order, a good way to write them back into a new Writer.

use std::array::TryFromSliceError;
use std::borrow::Cow;
use std::convert::TryInto;
use std::io::Cursor;

use grenad::{MergerBuilder, MergeFunction, Reader, Writer};

// This merge function:
//  - parses u32s from native-endian bytes,
//  - wrapping sums them and,
//  - outputs the result as native-endian bytes.
struct WrappingSumU32s;

impl MergeFunction for WrappingSumU32s {
    type Error = TryFromSliceError;

    fn merge<'a>(&self, key: &[u8], values: &[Cow<'a, [u8]>]) -> Result<Cow<'a, [u8]>, Self::Error> {
        let mut output: u32 = 0;
        for bytes in values.iter().map(AsRef::as_ref) {
            let num = bytes.try_into().map(u32::from_ne_bytes)?;
            output = output.wrapping_add(num);
        }
        Ok(Cow::Owned(output.to_ne_bytes().to_vec()))
    }
}

// We create our writers in memory to insert our key-value pairs.
let mut writera = Writer::memory();
let mut writerb = Writer::memory();
let mut writerc = Writer::memory();

// We insert our key-value pairs in lexicographic order
// and mix them between our writers.
writera.insert("first-counter", 32_u32.to_ne_bytes())?;
writera.insert("second-counter", 64_u32.to_ne_bytes())?;
writerb.insert("first-counter", 23_u32.to_ne_bytes())?;
writerb.insert("second-counter", 320_u32.to_ne_bytes())?;
writerc.insert("first-counter", 64_u32.to_ne_bytes())?;

// We create readers from our writers.
let cursora = writera.into_inner().map(Cursor::new)?;
let cursorb = writerb.into_inner().map(Cursor::new)?;
let cursorc = writerc.into_inner().map(Cursor::new)?;
let readera = Reader::new(cursora)?.into_cursor()?;
let readerb = Reader::new(cursorb)?.into_cursor()?;
let readerc = Reader::new(cursorc)?.into_cursor()?;

// We create a merger that will sum our u32s when necessary,
// and we add our readers to the list of readers to merge.
let merger_builder = MergerBuilder::new(WrappingSumU32s);
let merger = merger_builder.add(readera).add(readerb).add(readerc).build();

// We can iterate over the entries in key-order.
let mut iter = merger.into_stream_merger_iter()?;

// We can see that the sum of u32s is valid here.
assert_eq!(iter.next()?, Some((&b"first-counter"[..], &119_u32.to_ne_bytes()[..])));
assert_eq!(iter.next()?, Some((&b"second-counter"[..], &384_u32.to_ne_bytes()[..])));
assert_eq!(iter.next()?, None);

Use the Sorter struct

In this example we show how by defining a merge function, we can insert multiple entries with the same key and output them in lexicographic order.

The Sorter accepts the entries in any given order, will reorder them in-memory and merge them with the merge function when required. It is authorized to have a memory budget during its construction and will try to follow it as closely as possible.

use std::array::TryFromSliceError;
use std::borrow::Cow;
use std::convert::TryInto;

use grenad::{CursorVec, MergeFunction, SorterBuilder};

// This merge function:
//  - parses u32s from native-endian bytes,
//  - wrapping sums them and,
//  - outputs the result as native-endian bytes.
struct WrappingSumU32s;

impl MergeFunction for WrappingSumU32s {
    type Error = TryFromSliceError;

    fn merge<'a>(&self, key: &[u8], values: &[Cow<'a, [u8]>]) -> Result<Cow<'a, [u8]>, Self::Error> {
        let mut output: u32 = 0;
        for bytes in values.iter().map(AsRef::as_ref) {
            let num = bytes.try_into().map(u32::from_ne_bytes)?;
            output = output.wrapping_add(num);
        }
        Ok(Cow::Owned(output.to_ne_bytes().to_vec()))
    }
}

// We create a sorter that will sum our u32s when necessary.
let mut sorter = SorterBuilder::new(WrappingSumU32s).chunk_creator(CursorVec).build();

// We insert multiple entries with the same key but different values
// in arbitrary order, the sorter will take care of merging them for us.
sorter.insert("first-counter", 32_u32.to_ne_bytes())?;
sorter.insert("first-counter", 23_u32.to_ne_bytes())?;
sorter.insert("first-counter", 64_u32.to_ne_bytes())?;

sorter.insert("second-counter", 320_u32.to_ne_bytes())?;
sorter.insert("second-counter", 64_u32.to_ne_bytes())?;

// We can iterate over the entries in key-order.
let mut iter = sorter.into_stream_merger_iter()?;

// We can see that the sum of u32s is valid here.
assert_eq!(iter.next()?, Some((&b"first-counter"[..], &119_u32.to_ne_bytes()[..])));
assert_eq!(iter.next()?, Some((&b"second-counter"[..], &384_u32.to_ne_bytes()[..])));
assert_eq!(iter.next()?, None);

Dependencies

~2–12MB
~155K SLoC