#serde #json #sqlite #binary-format #jsonb #serializer-deserializer #byte

fifthtry-serde_sqlite_jsonb

A Serializer and Deserializer for SQLite jsonb binary format (FifthTry fork, till PR is merged)

1 unstable release

0.1.0 Apr 27, 2024

#734 in Encoding

MIT/Apache

62KB
1.5K SLoC

serde-sqlite-jsonb

This crate provides a custom Serde deserializer for SQLite JSONB columns.

It was initially developed for inclusion in the SQLPage website builder.

Why

Since version 3.45.0, SQLite supports JSONB columns, which can store JSON data in a binary format that is more efficient to manipulate than JSON.

The problem is that applications that use SQLite currently need to convert the data from JSONB to JSON, and then from JSON to their own data structures to use it. This prevents reading the blob data directly from the database in a streaming fashion using SQLite's blob streaming API, and requires making a SQL query to extract and convert the data to JSON.

This crate provides a custom Serde serializer and deserializer for JSONB directly, which allows skipping the JSON conversion step.

This can lead to significant performance improvements in certain scenarios, as demonstrated in this crate's benchmarks.

Benchmarks

These graphs show the time taken:

  • deserializing a JSONB column directly to a struct using this crate
  • making a SQL query to extract the JSONB column as JSON and then deserializing it to a struct using serde_json.

The data being deserialized contains a string, the length of which varies from 50 to 1000 characters, to demonstrate the evolution of the performance with the size of the data.

Benchmark results Benchmark results

Disclaimer: These benchmarks should always be taken with a grain of salt. When performance matters, you should measure the performance inside your own application with your own data. serde_json is very well optimized and can be faster than this crate in some scenarios, especially when the JSON data is small.

Crate features

The binary format can contain raw json data, so this crate depends on the serde_json crate to parse the JSON data. Since SQLite also supports json5, the serde-json5 feature can be used if json5 support is needed.

By default, the (faster) serde_json feature is enabled and this crate returns an error when trying to parse json5 data. To enable json5 support, enable the serde-json5 feature (and optionally disable the default features to use the json5 parser even for json data):

[dependencies]
serde-sqlite-jsonb = { version = "0.1", features = ["serde-json5"], default-features = false }

Usage

This library does not handle the SQLite connection, so you need to use a crate like rusqlite or sqlx to interact with the database.

Once you have extracted the JSONB data from the database, either as a Vec<u8> or as a std::io::Read object that streams the BLOB data from the database, you can use the serde_sqlite_jsonb crate to deserialize the JSON data, either to your own data structures or to a serde_json::Value.

Deserialize JSONB from a query result

let conn = rusqlite::Connection::open_in_memory()?;
let blob: Vec<u8> = conn.query_row(
    r#"select jsonb('{"id": 1, "name": "John Doe"}')"#, [], |row| row.get(0),
)?;
let person: Person = serde_sqlite_jsonb::from_bytes(&blob).unwrap();

Streaming deserialization from a SQLite BLOB

let my_blob = conn.blob_open( // returns an object that implements std::io::Read
    DatabaseName::Main,
    "my_table", // table name
    "my_jsonb_column", // column name
    42, // primary key (rowid)
    true // read-only
)?;
let parsed: serde_json::Value = // or any other type that implements Deserialize
    serde_sqlite_jsonb::from_reader(my_blob).unwrap();

Format

The format of the JSONB column is described in the SQLite documentation: https://sqlite.org/draft/jsonb.html

The data format is a binary format with a header and payload. The header contains information about the type of the element and the size of the payload. The payload contains the actual data.

Here's a rough ASCII representation:

bits:  0  1  2  3  4  5  6  7  8
    +-------------+-------------+
    |  size(4)    | type(4)     | first header byte
    +-------------+-------------+
    |   payload size (0 - 64)   | header bytes number 2 to 9
    +---------------------------+
    |   payload data            | payload bytes (JSON strings or numbers in text format)
    +---------------------------+

Header

Payload size

If the payload data is between 0 and 11 bytes (inclusive), the size is encoded in the first 4 bits of the header. Otherwise, the size of the payload is encoded in the next bytes, and the first 4 bits indicate the number of bytes used to encode the payload size, using the following table:

Payload Data Size Range Size Encoding First 4 bits of Header
0 to 11 bytes u4 (embedded in first 4 bits) 0 to 11 (0x0 to 0xB)
12 to 2^8 - 1 bytes u8 12 (0xC)
2^8 to 2^16 - 1 bytes u16 13 (0xD)
2^16 to 2^32 - 1 bytes u32 14 (0xE)
2^32 to 2^64 - 1 bytes u64 15 (0xF)

Type

Type Hex Code Description
Null 0x0 The element is a JSON "null".
True 0x1 The element is a JSON "true".
False 0x2 The element is a JSON "false".
Int 0x3 The element is a JSON integer value in the canonical RFC 8259 format.
Int5 0x4 The element is a JSON5 integer, such as 0xABC.
Float 0x5 The element is a JSON floating-point value in the canonical RFC 8259 format.
Float5 0x6 The element is a JSON5 floating-point value that is not in the canonical JSON format.
Text 0x7 The element is a JSON string value that does not contain any escapes.
TextJ 0x8 The element is a JSON string value that contains RFC 8259 character escapes.
Text5 0x9 The element is a JSON5 string value that contains character escapes, including some from JSON5.
TextRaw 0xA The element is a JSON string value that contains UTF8 characters that need escaping in JSON.
Array 0xB The element is a JSON array. The header of the first array element starts immediately after the array header.
Object 0xC The element is a JSON object. Object keys (strings) and values are alternated in the payload.
Reserved13 0xD Reserved for future expansion.
Reserved14 0xE Reserved for future expansion.
Reserved15 0xF Reserved for future expansion.

Example

The following JSON object:

{"a": false, "b":true}

is encoded as the following 7 bytes of binary data:

6c 17 61 02 17 62 01
byte value description
0 0x6c header: payload size = 6, type = Object (0xC)
1 0x17 header: payload size = 1, type = Text (0x7)
2 0x61 payload: 'a'
3 0x02 header: payload size = 0, type = False (0x2)
4 0x17 header: payload size = 1, type = Text (0x7)
5 0x62 payload: 'b'
6 0x01 header: payload size = 0, type = True (0x1)

MSRV

Requires rust >= 1.63 (debian stable)

Dependencies

~0.4–1.1MB
~24K SLoC