kvds

An efficient key-value store for encoding documents

4 releases

0.2.0 Aug 29, 2019
0.1.2 Aug 26, 2019
0.1.1 Aug 21, 2019
0.1.0 Aug 17, 2019

#1732 in Encoding

MIT/Apache

15KB
203 lines

Key Value Document Store (KVDS)

An efficient key-value store for encoding documents.

Format

KVDS is a format for encoding lists of key-value pairs in which each key is a single u8 and each value is a sequence of bytes (a Vec<u8> in Rust).

How it works

The following is in hex.

The length of the value, in bytes
         │
       ┌─┴─┐
    14 00 05 48 65 6c 6c 6f
    └┤       └──────┬─────┘
     │              │
The key,    The value, usually a
a number      string in UTF-8
between
0 and 255

The above is a key-value pair with a key of 20 (0x14), a value length of 5 bytes (00 05), and a value of Hello.
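The layout in the diagram can be sketched in a few lines of Rust. This is an illustration only, not the crate's API: the function name encode_pair is hypothetical, and the two-byte, big-endian length field is an assumption read off the diagram.

```rust
/// Encode one pair as: key byte, two-byte big-endian length, value bytes.
/// The layout is an assumption taken from the diagram, not the crate's code.
/// Returns `None` if the value does not fit in a two-byte length field.
fn encode_pair(key: u8, value: &[u8]) -> Option<Vec<u8>> {
    if value.len() > u16::MAX as usize {
        return None;
    }
    let len = value.len() as u16;

    let mut out = Vec::with_capacity(3 + value.len());
    out.push(key); // 1 byte: the key
    out.extend_from_slice(&len.to_be_bytes()); // 2 bytes: the value length
    out.extend_from_slice(value); // the value itself
    Some(out)
}
```

Under these assumptions, calling encode_pair(20, b"Hello") yields the eight bytes shown in the diagram.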

Features

  • There can be more than one of the same key
  • The key-value pairs are kept in order
  • Uses an efficient encoding mechanism
  • Meant for encoding documents, much like XML

KVDS does not have a built-in way of dealing with nesting. However, because an encoded KVDS document is a Vec<u8> and every value is also a Vec<u8>, nesting is not hard to implement on a case-by-case basis: an encoded document can itself be stored as a value.
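One possible way to nest, sketched under assumptions: encode the inner list first, then store its bytes as the value of an outer pair. The helper push_pair, the function nested_example, and the choice of key 10 as a "container" key are all hypothetical, and the byte layout (key, two-byte big-endian length, value) is assumed from the diagram above rather than taken from the crate.

```rust
/// Append one pair (key, two-byte big-endian length, value) to `out`.
/// Hypothetical helper; the layout is assumed from the README diagram.
fn push_pair(out: &mut Vec<u8>, key: u8, value: &[u8]) {
    assert!(
        value.len() <= u16::MAX as usize,
        "value too long for a two-byte length field"
    );
    out.push(key);
    out.extend_from_slice(&(value.len() as u16).to_be_bytes());
    out.extend_from_slice(value);
}

/// Build an outer document whose single value is an encoded inner document.
fn nested_example() -> Vec<u8> {
    // Inner document: two ordinary pairs.
    let mut inner = Vec::new();
    push_pair(&mut inner, 1, b"Hi");
    push_pair(&mut inner, 2, b"!");

    // Outer document: key 10 is treated, by convention only, as a container.
    let mut outer = Vec::new();
    push_pair(&mut outer, 10, &inner);
    outer
}
```

A reader of such a document has to know which keys hold nested documents, which is why this is handled case by case.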

KVDS vs. XML

XML is a popular format for encoding documents. Here is a comparison.

XML:

<1>Hello</1>
<2>, </2>
<4>world</4>
<1>!</1>

Each ASCII character in XML occupies one byte, so the list above takes 41 bytes (not counting line breaks).

KVDS:

01 00 05 48 65 6c 6c 6f
02 00 02 2c 20
04 00 05 77 6f 72 6c 64
01 00 01 21

Here, KVDS is represented in hex notation, which means that every pair of characters is a byte. The list is represented in 25 bytes.
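To make the comparison concrete, here is a minimal decoding sketch for the 25 bytes above. It is not the crate's decoder: the name decode and the constant EXAMPLE are hypothetical, and the layout (key, two-byte big-endian length, value) is assumed from the hex listings.

```rust
/// The 25-byte KVDS example from the comparison above.
const EXAMPLE: [u8; 25] = [
    0x01, 0x00, 0x05, 0x48, 0x65, 0x6c, 0x6c, 0x6f, // key 1, "Hello"
    0x02, 0x00, 0x02, 0x2c, 0x20, // key 2, ", "
    0x04, 0x00, 0x05, 0x77, 0x6f, 0x72, 0x6c, 0x64, // key 4, "world"
    0x01, 0x00, 0x01, 0x21, // key 1 again, "!"
];

/// Decode a buffer into (key, value) pairs, keeping their order.
/// Returns `None` if the buffer ends in the middle of a pair.
fn decode(mut bytes: &[u8]) -> Option<Vec<(u8, Vec<u8>)>> {
    let mut pairs = Vec::new();
    while !bytes.is_empty() {
        if bytes.len() < 3 {
            return None; // not enough bytes for key + length
        }
        let key = bytes[0];
        let len = u16::from_be_bytes([bytes[1], bytes[2]]) as usize;
        let rest = &bytes[3..];
        if rest.len() < len {
            return None; // value is truncated
        }
        pairs.push((key, rest[..len].to_vec()));
        bytes = &rest[len..];
    }
    Some(pairs)
}
```

Decoding EXAMPLE gives four pairs in their original order, including the repeated key 1.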

KVDS has three disadvantages compared to XML: it is not easily human-readable (which does not really matter for encoding documents), it does not support nesting (although nesting can be easily implemented), and its keys can only be one byte long (a tradeoff that decreases the amount of space needed).

Usage

This crate, KVDS, is meant to be used as a Rust library. See the basic example for more information.

A note about storing Strings

One of the most common types of data held in the value field is a string. In Rust, a Vec<u8> can be converted to a String as follows:

String::from(std::str::from_utf8(&VEC_U8[..])?);

The String can be converted back as follows:

STRING.as_bytes().to_vec();
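When the Vec<u8> or String is no longer needed afterwards, the standard library also offers conversions that take ownership and avoid a copy. The wrapper names bytes_to_string and string_to_bytes below are hypothetical; String::from_utf8 and String::into_bytes are standard library functions.

```rust
use std::string::FromUtf8Error;

/// Convert owned bytes into a String, failing if they are not valid UTF-8.
fn bytes_to_string(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Convert an owned String back into its UTF-8 bytes without copying.
fn string_to_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

The borrowing forms shown above remain the better choice when the original value must stay usable.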

Saving to a file

The process of saving to a file is detailed in the files example. If you want to manually read/write to a file, see the old files example.
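Because an encoded document is just a Vec<u8>, a manual round trip through a file needs only the standard library. The sketch below is not taken from the crate's examples; the function names save and load are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write already-encoded KVDS bytes to a file, replacing any existing file.
fn save(path: &Path, encoded: &[u8]) -> io::Result<()> {
    fs::write(path, encoded)
}

/// Read a whole file back into memory as raw bytes, ready for decoding.
fn load(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}
```

Both helpers load or store the whole document at once, which suits small documents; very large ones may call for buffered or streaming I/O instead.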

Size limits

Keys can only be one byte. There are therefore 256 distinct keys.

On 64-bit machines, values can be up to 18.4 exabytes long (on 32-bit machines, a little over 4 gigabytes; on 128-bit machines, the possibilities are pretty much endless). Needless to say, there are many, many unique combinations.

(AFAIK, 18.4 exabytes is also the theoretical limit for addressable RAM on 64-bit CPUs.)
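The figures above follow from the largest value a pointer-sized integer can hold. The sketch below only reproduces that arithmetic, assuming the limit is usize::MAX bytes as this section implies; the function name max_value_len is hypothetical.

```rust
/// Largest length, in bytes, representable by an unsigned integer that is
/// `pointer_bits` wide, i.e. 2^pointer_bits - 1 (saturating at 128 bits).
fn max_value_len(pointer_bits: u32) -> u128 {
    if pointer_bits >= 128 {
        u128::MAX
    } else {
        (1u128 << pointer_bits) - 1
    }
}
```

For 64 bits this is 18,446,744,073,709,551,615 bytes, or about 18.4 exabytes; for 32 bits it is 4,294,967,295 bytes, a little over 4 gigabytes.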

FYI

The encoded format of KVDS 0.2.x is incompatible with the encoded format of KVDS 0.1.x.

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

No runtime deps