2 unstable releases
0.2.0 | Sep 1, 2023 |
---|---|
0.1.0 | Jun 9, 2023 |
#38 in Database implementations
40,398 downloads per month
Used in 18 crates
(3 directly)
175KB
4K
SLoC
SSTable
The tantivy-sstable
crate is yet another sstable crate.
It has been designed to be used in quickwit
:
- as an alternative to the default tantivy fst dictionary.
- as a way to store the column index for dynamic fast fields.
The benefit compared to the fst crate is locality. Searching a key in the fst crate requires downloading the entire dictionary.
Once the sstable index is downloaded, running a get
in the sstable
crate only requires a single fetch.
Right now, the block index and the default block size have been thought for quickwit, and the performance of a get is very bad.
Sorted strings?
SSTable stands for Sorted String Table. Strings have to be insert in sorted order.
That sorted order is used in different ways:
- it makes gets and streaming ranges of keys possible.
- it allows incremental encoding of the keys
- the front compression is leveraged to optimize the intersection with an automaton
On disk format
Overview of the SSTable format. Unless noted otherwise, numbers are little-endian.
SSTable
+-------+-------+-----+--------+
| Block | Block | ... | Footer |
+-------+-------+-----+--------+
|----( # of blocks)---|
- Block(
SSTBlock
): list of independent block, terminated by a single empty block. - Footer(
SSTFooter
)
SSTBlock
+----------+----------+--------+-------+-------+-----+
| BlockLen | Compress | Values | Delta | Delta | ... |
+----------+----------+--------+-------+-------+-----+
| |----( # of deltas)---|
|------(maybe compressed)------|
- BlockLen(u32): length of the block, including the compress byte.
- Compress(u8): indicate whether block is compressed. 0 if not compressed, 1 if compressed.
- Values: an application defined format storing a sequence of value, capable of determining it own length
- Delta
Delta
+---------+--------+
| KeepAdd | Suffix |
+---------+--------+
- KeepAdd
- Suffix: KeepAdd.add bytes of key suffix
KeepAdd
KeepAdd can be represented in two different representation, a very compact 1byte one which is enough for most usage, and a longer variable-len one when required
When keep < 16 and add < 16
+-----+------+
| Add | Keep |
+-----+------+
- Add(u4): number of bytes to push
- Keep(u4): number of bytes to pop
Otherwise:
+------+------+-----+
| 0x01 | Keep | Add |
+------+------+-----+
- Add(VInt): number of bytes to push
- Keep(VInt): number of bytes to pop
Note: as the SSTable does not support redundant keys, there is no ambiguity between both representation. Add is always guaranteed to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.
SSTFooter
+-------+-------+-----+-------------+---------+---------+
| Block | Block | ... | IndexOffset | NumTerm | Version |
+-------+-------+-----+-------------+---------+---------+
|----( # of blocks)---|
- Block(SSTBlock): uses IndexValue for its Values format
- IndexOffset(u64): Offset to the start of the SSTFooter
- NumTerm(u64): number of terms in the sstable
- Version(u32): Currently equal to 2
IndexValue
+------------+----------+-------+-------+-----+
| EntryCount | StartPos | Entry | Entry | ... |
+------------+----------+-------+-------+-----+
|---( # of entries)---|
- EntryCount(VInt): number of entries
- StartPos(VInt): the start pos of the first (data) block referenced by this (index) block
- Entry (IndexEntry)
Entry
+----------+--------------+
| BlockLen | FirstOrdinal |
+----------+--------------+
- BlockLen(VInt): length of the block
- FirstOrdinal(VInt): ordinal of the first element in the given block
Dependencies
~8MB
~145K SLoC