13 releases (7 breaking)

0.8.0 Jul 11, 2024
0.7.0 Mar 25, 2024
0.6.0 Feb 19, 2024
0.5.1 Nov 19, 2023
0.1.1 Dec 22, 2018

#185 in Data structures

1,059 downloads per month
Used in 2 crates (via exocore-chain)

Apache-2.0

69KB
1.5K SLoC

extindex

An immutable, persisted (on-disk) index that can be built in a single pass from a sorted iterator. If the input is not sorted, extsort can externally sort the iterator first, and the index is then built from the sorted result.

The index supports random lookups and sorted scans. Each indexed entry consists of a key and a value. The key must implement Eq and Ord, and both the key and the value must implement the Serializable trait so they can be written to and read from disk. For most types, the serde library can be used to provide this implementation.

The index is built using a skip list-like data structure, but lookups start from the end of the index instead of the beginning. This allows the index to be built in a single pass over a sorted iterator: every checkpoint/node only refers to data already written, whereas a structure read from the beginning would need to know about checkpoints/nodes further ahead in the file.
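The idea of looking up from the end can be illustrated with a minimal in-memory sketch. It is not extindex's actual on-disk format: the `Record` enum, the `build` and `find` functions, and the `CHECKPOINT_EVERY` constant are hypothetical names, and only a single level of checkpoints is used where a real skip list would have several. It shows that when each checkpoint points only backwards, the whole structure can be written while the sorted input streams by.

```rust
/// One record in a simulated append-only index file (hypothetical layout).
#[derive(Debug)]
enum Record {
    /// A key/value entry, written in key order.
    Entry { key: u32, value: String },
    /// Written just before every `CHECKPOINT_EVERY`-th entry.
    /// It only refers to data that is known at write time.
    Checkpoint {
        /// Key of the entry that immediately follows this checkpoint.
        next_key: u32,
        /// Position of the previous checkpoint, if any.
        prev: Option<usize>,
    },
}

const CHECKPOINT_EVERY: usize = 4;

/// Builds the simulated file in one pass over an already sorted iterator.
fn build<I: IntoIterator<Item = (u32, String)>>(sorted: I) -> Vec<Record> {
    let mut file = Vec::new();
    let mut last_checkpoint: Option<usize> = None;
    for (count, (key, value)) in sorted.into_iter().enumerate() {
        if count % CHECKPOINT_EVERY == 0 {
            let pos = file.len();
            file.push(Record::Checkpoint { next_key: key, prev: last_checkpoint });
            last_checkpoint = Some(pos);
        }
        file.push(Record::Entry { key, value });
    }
    file
}

/// Looks up `target`, starting from the end of the file.
fn find(file: &[Record], target: u32) -> Option<&str> {
    // Locate the last checkpoint by scanning back from the end.
    let mut pos = file
        .iter()
        .rposition(|r| matches!(r, Record::Checkpoint { .. }))?;

    // Hop backwards until a checkpoint whose segment may hold the target.
    loop {
        match &file[pos] {
            Record::Checkpoint { next_key, prev } => {
                if *next_key <= target {
                    break;
                }
                // No earlier checkpoint: the target is below every key.
                pos = (*prev)?;
            }
            Record::Entry { .. } => unreachable!("pointers only target checkpoints"),
        }
    }

    // Scan forward through the entries of that segment.
    for record in &file[pos + 1..] {
        match record {
            Record::Entry { key, value } if *key == target => return Some(value.as_str()),
            Record::Entry { key, .. } if *key > target => return None,
            Record::Entry { .. } => {}
            Record::Checkpoint { .. } => return None,
        }
    }
    None
}
```

Calling `build` on sorted `(key, value)` pairs and then `find(&file, key)` returns the stored value or `None`. A start-to-end layout would instead need each checkpoint to hold the position of a later one, which is not known until more of the input has been consumed.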

Example

extern crate extindex;
extern crate serde;

use extindex::{Builder, Entry, Reader, SerdeWrapper};

#[derive(Ord, PartialOrd, Eq, PartialEq, Debug, serde::Serialize, serde::Deserialize)]
struct SomeStruct {
    a: u32,
    b: String,
}

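fn main() {
    // The tempfile crate is assumed to be available as a dependency.
    let index_file = tempfile::NamedTempFile::new().unwrap();

    // Build the index in one pass; entries must be yielded in sorted key order.
    let builder = Builder::new(index_file.path());
    let entries = vec![Entry::new(
        "my_key".to_string(),
        // SerdeWrapper makes a serde-serializable value usable in the index.
        SerdeWrapper(SomeStruct {
            a: 123,
            b: "my value".to_string(),
        }),
    )];
    builder.build(entries.into_iter()).unwrap();

    // Open the finished index and perform random lookups.
    let reader = Reader::<String, SerdeWrapper<SomeStruct>>::open(index_file).unwrap();
    assert!(reader.find(&"my_key".to_string()).unwrap().is_some());
    assert!(reader.find(&"notfound".to_string()).unwrap().is_none());
}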

Roadmap

  • Possibility to use a Bloom filter to avoid disk access when the index does not contain a key.
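To illustrate the roadmap item, here is a minimal, standalone Bloom filter sketch built only on the standard library. It is not part of extindex: `BloomFilter`, `insert`, and `may_contain` are hypothetical names, and the sizing parameters are arbitrary. A filter like this can answer "definitely absent" from memory, so the on-disk index would only need to be consulted when the filter says a key may be present.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory Bloom filter (not an extindex type).
struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives `num_hashes` bit positions by hashing a seed together with the key.
    fn positions<K: Hash + ?Sized>(&self, key: &K) -> Vec<usize> {
        (0..self.num_hashes)
            .map(|seed| {
                let mut hasher = DefaultHasher::new();
                seed.hash(&mut hasher);
                key.hash(&mut hasher);
                (hasher.finish() % self.bits.len() as u64) as usize
            })
            .collect()
    }

    /// Records a key; would be called for each key while the index is built.
    fn insert<K: Hash + ?Sized>(&mut self, key: &K) {
        for pos in self.positions(key) {
            self.bits[pos] = true;
        }
    }

    /// Returns false only if the key was never inserted.
    /// A true result may be a false positive and still requires a real lookup.
    fn may_contain<K: Hash + ?Sized>(&self, key: &K) -> bool {
        self.positions(key).into_iter().all(|pos| self.bits[pos])
    }
}
```

In use, a caller would check `filter.may_contain("my_key")` first and skip the disk lookup when it returns false. The trade-off is extra memory for the bit array and a false-positive rate that depends on the number of bits, hashes, and inserted keys.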
