#bloom-filter #key-value-store #checksum #sorting #table #immutability #string

sstable

Sorted String Tables, an on-disk format for storing immutable maps consisting of string,string pairs, and retrieving values by key efficiently. This crate also features bloom filters, checksums and skipping bad blocks. It is based on the code implemented for the rusty_leveldb crate.

29 releases

0.11.1 Apr 22, 2023
0.10.0 Dec 29, 2021
0.9.0 Nov 16, 2021
0.8.2 Apr 26, 2020
0.3.1 Nov 23, 2016

#522 in Data structures

Download history 83/week @ 2024-07-20 127/week @ 2024-07-27 53/week @ 2024-08-03 46/week @ 2024-08-10 69/week @ 2024-08-17 71/week @ 2024-08-24 137/week @ 2024-08-31 48/week @ 2024-09-07 56/week @ 2024-09-14 93/week @ 2024-09-21 69/week @ 2024-09-28 40/week @ 2024-10-05 52/week @ 2024-10-12 44/week @ 2024-10-19 30/week @ 2024-10-26 38/week @ 2024-11-02

167 downloads per month
Used in 7 crates (via graphannis-core)

MIT license

115KB
2.5K SLoC

sstable

crates.io Travis CI

Documentation

What

This crate provides an API to work with immutable (string -> string) maps stored on disk. The main access method are iterators, but there's a simpler API, too.

The general process is

  • Writing a table, using TableBuilder. The entries have to be added in sorted order. The data doesn't have to be written to disk; any type implementing Write works.
  • Reading a table, using Table. Again, the source is generic; any type implementing Read + Seek can be used.

Note that the tables and some other structures are generic over the ordering of keys; usually you can just use StandardComparator, though.

With Options, you can influence some details of how tables are laid out on disk. Usually, you don't need to; just use the Options::default() value.

If there's data corruption in the files on disk, defective blocks will be skipped. How many entries a single block contains depends on the block size, which can be set in the Options struct.

Why

This crate reuses code originally written for the persistence part of rusty-leveldb, a reimplementation of Google's LevelDB in Rust. That's the reason for the code being a bit more complicated than needed at some points.

Performance

With no compression on a tmpfs volume running on an idle Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz processor, the benchmark shows that in tables of 10'000 entries of each 16 key bytes and 16 value bytes, this crate will

  • read 5.3 million entries per second
  • write 1.2 million entries per second

The performance for tables of different sizes may differ.

Corruption and errors

Checksum verification failures often stem from either corruption (obviously) or incompletely written or half-overwritten SSTable files.

Contribute

Contributions are very welcome! Feel free to send pull requests.

Dependencies

~245KB