#binary-format #binary-encoding #binary-data #kiwi #schema #fields #optional

kiwi-schema

This is a Rust library with some helper routines for parsing files in the Kiwi serialization format

13 releases

Uses old Rust 2015

0.2.1 Sep 3, 2023
0.2.0 Aug 11, 2023
0.1.2 Jun 4, 2020
0.1.1 Sep 26, 2018
0.0.4 Jan 31, 2018

#733 in Parser implementations

MIT license

75KB
1.5K SLoC

Kiwi Message Format

Kiwi is a schema-based binary format for efficiently encoding trees of data. It's inspired by Google's Protocol Buffer format but is simpler, has a more compact encoding, and has better support for optional fields.

Goals:

  • Efficient encoding of common values: Variable-length encoding is used for numeric values where small values take up less space.
  • Efficient encoding of compound objects: The struct feature supports nested objects with zero encoding overhead.
  • Presence of optional fields is detectable: This is not possible with Protocol Buffers, especially for repeated fields.
  • Linearly serializable: Reading and writing are both single-scan operations so they are cache-efficient and have guaranteed time complexity.
  • Backwards compatibility: New versions of the schema can still read old data.
  • Forwards compatibility: Old versions of the schema can optionally read new data if a copy of the new schema is bundled with the data (the new schema lets the decoder skip over unknown fields).
  • Simple implementation: The API is very minimal and the generated C++ code only depends on a single file.

Non-goals:

  • Optimal bit-packing: Compression can be used after encoding for more space savings if needed.

Native Types

  • bool: A value that stores either true or false. Will use 1 byte.
  • byte: An unsigned 8-bit integer value. Uses 1 byte, obviously.
  • int: A 32-bit integer value stored using a variable-length encoding optimized for storing numbers with a small magnitude. Will use at most 5 bytes.
  • uint: A 32-bit integer value stored using a variable-length encoding optimized for storing small non-negative numbers. Will use at most 5 bytes.
  • float: A 32-bit floating-point number. Normally uses 4 bytes but a value of zero uses 1 byte (denormal numbers become zero when encoded).
  • string: A UTF-8 null-terminated string. Will use at least 1 byte.
  • int64: A 64-bit integer value stored using a variable-length encoding optimized for storing numbers with a small magnitude. Will use at most 9 bytes.
  • uint64: A 64-bit integer value stored using a variable-length encoding optimized for storing small non-negative numbers. Will use at most 9 bytes.
  • T[]: Any type can be made into an array using the [] suffix.

User Types

  • enum: A uint with a restricted set of values that are identified by name. New fields can be added to any message while maintaining backwards compatibility.
  • struct: A compound value with a fixed set of fields that are always required and written out in order. New fields cannot be added to a struct once that struct is in use.
  • message: A compound value with optional fields. New fields can be added to any message while maintaining backwards compatibility.

Example Schema

enum Type {
  FLAT = 0;
  ROUND = 1;
  POINTED = 2;
}

struct Color {
  byte red;
  byte green;
  byte blue;
  byte alpha;
}

message Example {
  uint clientID = 1;
  Type type = 2;
  Color[] colors = 3;
}

Live Demo

See http://evanw.github.io/kiwi/ for a live demo of the schema compiler.

Usage Examples

Pick a language:

No runtime deps