#key-value #lsm #lsm-tree #blocking #in-memory #byte #super

bin+lib tiny-lsm

a dead-simple in-memory blocking LSM tree for constant-sized keys and values

16 unstable releases (3 breaking)

0.4.6 Apr 21, 2022
0.4.5 Dec 21, 2021
0.3.1 Dec 20, 2021
0.2.4 Dec 6, 2021
0.1.4 Dec 1, 2021

#365 in Compression

Download history 1/week @ 2024-02-16 1/week @ 2024-02-23 1/week @ 2024-03-01 1/week @ 2024-03-08 164/week @ 2024-03-29

165 downloads per month

GPL-3.0 license

42KB
891 lines

tiny-lsm

Super simple in-memory blocking LSM for constant-size keys and values.

Despite being single-threaded and blocking, this is still capable of outperforming a wide range of other storage systems.

This is a great choice when you:

  • want to fit the whole data set in-memory
  • can model your keys and values in a bounded number of bytes

Tested with fuzzcheck, and the API and internals are intentionally being kept minimal to reduce bugs and improve performance for the use cases that this works well for.

Pairs extremely well with the zerocopy crate for viewing the fixed size byte arrays as typed data without paying expensive deserialization costs.


lib.rs:

tiny-lsm is a dead-simple in-memory LSM for managing fixed-size metadata in more complex systems.

Uses crc32fast to checksum all key-value pairs in the log and sstables. Uses zstd to compress all sstables. Performs sstable compaction in the background.

Because the data is in-memory, there is no need to put bloom filters on the sstables, and read operations cannot fail due to IO issues.

Lsm implements Deref<Target=BTreeMap<[u8; K], [u8; V]>> to immutably access the data directly without any IO or blocking.

Lsm::insert writes all data into a 32-kb BufWriter in front of a log file, so it will block for very short periods of time here and there. SST compaction is handled completely in the background.

This is a bad choice for large data sets if you require quick recovery time because it needs to read all of the sstables and the write ahead log when starting up.

The benefit to using tiered sstables at all, despite being in-memory, is that they act as an effective log-deduplication mechanism, keeping space amplification very low.

Maximum throughput is not the goal of this project. Low space amplification and very simple code is the goal, because this is intended to maintain metadata in more complex systems.

There is currently no compaction throttling. You can play with the Config options around compaction to change compaction characteristics.

Never change the constant size of keys or values for an existing database.

Examples

// open up the LSM
let mut lsm = tiny_lsm::Lsm::recover("path/to/base/dir").expect("recover lsm");

// store some things
let key: [u8; 8] = 8_u64.to_le_bytes();
let value: [u8; 1] = 255_u8.to_le_bytes();
lsm.insert(key, value);

assert_eq!(lsm.get(&key), Some(&value));

Dependencies

~3MB
~55K SLoC