Meshanina --- specialized, WORM, content-addressed database

Meshanina is a rather strange key-value database, built on two assumptions:

  • Once written, a key-value mapping will never be deleted
  • Some function H maps each value to its key: H(v) = k. That is, the same key will never be rebound to a different value.

Meshanina is designed for use as a content-addressed datastore, where keys are typically hashes of values and deletion is ill-defined. It is a purely log-structured, content-addressed database file in which data blocks are interleaved with 64-ary HAMT nodes. Inserting keys simply appends gobs of data; flushing then makes sure the HAMT metadata is pushed out too.
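
For illustration, here is a minimal sketch of the content-addressing convention, with BLAKE3 standing in for H (the choice of hash function is the caller's; BLAKE3 is only an assumption for this example):

```rust
// Hypothetical choice of H; Meshanina only assumes H(v) = k.
fn key_for(value: &[u8]) -> [u8; 32] {
    *blake3::hash(value).as_bytes()
}

fn main() {
    let value = b"hello world";
    let key = key_for(value);
    // The same value always maps to the same key, so a key can never
    // be rebound to a different value.
    assert_eq!(key, key_for(value));
}
```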

In memory, we keep track of an Arc-linked set of new nodes before they get flushed out. Everything is managed in a "purely functional" way: updates build new nodes that share unchanged subtrees rather than mutating old ones.
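
A hypothetical sketch of what such a structure could look like (the type names are illustrative, not the crate's actual API):

```rust
use std::sync::Arc;

/// A child pointer is either an already-flushed record, referenced by
/// its absolute file offset, or a fresh in-memory node shared via Arc.
enum Ptr {
    OnDisk(u64),
    InMemory(Arc<Node>),
}

enum Node {
    /// 64-ary HAMT interior node: occupancy bitmap plus child pointers.
    Hamt { bitmap: u64, children: Vec<Ptr> },
    /// Leaf data record: a 32-byte key and its value.
    Data { key: [u8; 32], value: Vec<u8> },
}
```

Insertion builds a fresh path of InMemory nodes that shares all unchanged subtrees with the previous root; a flush serializes the new nodes and replaces their pointers with OnDisk offsets.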

On-disk format

  • 4 KiB: reserved region
    • starting with 10 bytes: the magic string meshanina2
    • followed by 16 bytes: a random, unique, database-specific 128-bit magic divider
  • indefinite number of records:
    • (possibly padding to some convenient boundary)
    • 16 bytes: the magic divider stored in the reserved region
    • 8 bytes: SipHash 1-3 checksum of the record contents
    • 4 bytes: the kind of record, little-endian
      • 0x00000000: data
      • 0x00000001: HAMT interior node
      • 0x00000002: HAMT root node
    • 4 bytes: length of the record
    • n bytes: the content of the record
      • for HAMT nodes, this is:
        • 8 bytes: 64-bit little-endian bitmap
        • n*8 bytes: 64-bit pointers (absolute file offsets), one per set bit in the bitmap
      • for data nodes, this is:
        • 32 bytes: key
        • n bytes: value
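
Putting the layout together, here is a sketch of encoding a data record, using the siphasher crate for SipHash-1-3; the checksum key (zero here) and the endianness of the checksum field are assumptions, so consult the source for the exact parameters:

```rust
use std::hash::Hasher;
use siphasher::sip::SipHasher13;

// The per-database divider; in reality this is read from the reserved region.
const MAGIC: [u8; 16] = [0; 16];

fn encode_data_record(key: &[u8; 32], value: &[u8]) -> Vec<u8> {
    // Record contents: 32-byte key followed by the value.
    let mut content = Vec::with_capacity(32 + value.len());
    content.extend_from_slice(key);
    content.extend_from_slice(value);

    // SipHash-1-3 checksum over the record contents (zero key assumed).
    let mut hasher = SipHasher13::new_with_keys(0, 0);
    hasher.write(&content);
    let checksum = hasher.finish();

    let mut record = Vec::new();
    record.extend_from_slice(&MAGIC);                                // 16 bytes: magic divider
    record.extend_from_slice(&checksum.to_le_bytes());               // 8 bytes: checksum
    record.extend_from_slice(&0u32.to_le_bytes());                   // 4 bytes: kind (0x00000000 = data)
    record.extend_from_slice(&(content.len() as u32).to_le_bytes()); // 4 bytes: content length
    record.extend_from_slice(&content);                              // n bytes: content
    record
}
```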

Recovery

On DB open, there is a recovery mechanism. We search backwards, from the end of the file, for instances of the magic divider, then try to decode a record at each instance. When we find the first validly encoded HAMT root, we stop. We then scan forwards, reinserting every valid data block after this root into the database.
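
As a sketch, the backward search for candidate records could look like this; decoding and checksum validation at each candidate offset are left to a hypothetical record parser:

```rust
/// Yield candidate record offsets by scanning backwards from the end
/// of the file for occurrences of the 16-byte magic divider.
fn candidate_offsets<'a>(
    file: &'a [u8],
    magic: &'a [u8; 16],
) -> impl Iterator<Item = usize> + 'a {
    // Records can only begin after the 4 KiB reserved region.
    (4096..file.len().saturating_sub(15))
        .rev()
        .filter(move |&pos| &file[pos..pos + 16] == &magic[..])
}
```

The recovery pass would try to decode a record at each candidate, stop at the first one that is a validly checksummed HAMT root (kind 0x00000002), and then replay the valid data records that follow it.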

Assuming that there are no "gaps" in correctly written blocks --- that is, if a record is correctly written, every record before it must be correctly written too --- this defends against arbitrary crashes and power interruptions. Essentially all Unix filesystems guarantee that interrupted file appends cannot disturb existing data in the file.

Lookup and insertion

Starting from the latest HAMT root node, we do the usual HAMT lookup or insertion, consuming the 256-bit key 6 bits at a time to match the 64-ary branching factor.
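
A sketch of the per-level index arithmetic; which end of the key supplies the first 6 bits, and the bit order within bytes, are assumptions here:

```rust
/// Extract the 6-bit child index for `level` from a 256-bit key,
/// reading from the low-order end (an assumed convention).
fn child_index(key: &[u8; 32], level: usize) -> u8 {
    let bit = level * 6;
    let byte = bit / 8;
    let shift = bit % 8;
    // Read 16 bits so a 6-bit window straddling a byte boundary fits.
    let lo = key[byte] as u16;
    let hi = *key.get(byte + 1).unwrap_or(&0) as u16;
    (((lo | (hi << 8)) >> shift) & 0x3f) as u8
}

/// Find the position of child `idx` in a node's densely packed pointer
/// array, given its 64-bit occupancy bitmap.
fn slot(bitmap: u64, idx: u8) -> Option<usize> {
    if bitmap & (1u64 << idx) == 0 {
        return None; // no child under this 6-bit index
    }
    // Children are stored densely: count the set bits below `idx`.
    Some((bitmap & ((1u64 << idx) - 1)).count_ones() as usize)
}
```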
