#serialization #serialization-deserialization #bit #deserialize #codec #encode #decode

zlo

A binary serialization/deserialization strategy that uses Serde for transforming structs into very compact bit representations

1 unstable release

Uses old Rust 2015

0.1.0 Sep 25, 2017

#1775 in Encoding

MIT license

75KB
1.5K SLoC


An encoder/decoder pair that uses a bit-compact binary encoding scheme. The size of an encoded object will be about the same as, or smaller than, the size the object takes up in memory in a running Rust program. zlo is a fork of bincode, so its API resembles bincode's and includes the familiar SizeLimit objects.

It was made for networking in a multiplayer game, where diffs of data are sent over the network very frequently but in small pieces. In that setting bincode produces comparatively large chunks of data, and compressing bincode output with common algorithms such as LZO tends to yield only very small improvements over zlo-encoded data, at the expense of long added compression times.

Stability of the binary format is NOT guaranteed across major versions.

Example

#[macro_use]
extern crate serde_derive;
extern crate zlo;

use zlo::{serialize, deserialize, Infinite};

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Entity {
    x: f32,
    y: f32,
}

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct World(Vec<Entity>);

fn main() {
    let world = World(vec![Entity { x: 0.0, y: 4.0 }, Entity { x: 10.0, y: 20.5 }]);

    let encoded: Vec<u8> = serialize(&world, Infinite).unwrap();

    assert!(encoded.len() < 8 + 4 * 4);

    let decoded: World = deserialize(&encoded[..]).unwrap();

    assert_eq!(world, decoded);
}

Performance, output size

The zlo deserializer is NOT zero-copy, and in the majority of cases cannot be, because data that is not bit-shifted (that is, still byte-aligned) rarely occurs in this encoding.

The fastest primitives to process are unsigned ints and bools. Performance also depends on the amount of data written. For example, encoding a small value held in a 64-bit int is only barely slower than encoding the same value in the smallest type that fits it.

zlo is primarily intended for serializing diffs of data, that is, structs with lots of Options, small ints and diffed, zigzagged floats in them. Under these conditions zlo can yield very compact data.
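As an illustration of what such a diff struct might look like, here is a minimal sketch in plain Rust. The `Entity`, `EntityDiff`, `diff` and `apply` names are hypothetical and not part of zlo. The sketch assumes integer coordinates for simplicity, and Serde derives are left out so that it runs without dependencies. With `Serialize`/`Deserialize` derived, a struct shaped like `EntityDiff` is the kind of value the paragraph above describes: mostly `None`, with small numbers elsewhere.

```rust
// Hypothetical game state; names are illustrative, not part of zlo.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entity {
    x: i32,
    y: i32,
    hp: u8,
}

// A diff holds `Some` only for the fields that changed since the last update.
#[derive(Debug, PartialEq)]
struct EntityDiff {
    dx: Option<i32>, // signed delta, usually small
    dy: Option<i32>, // signed delta, usually small
    hp: Option<u8>,  // new value, present only when it changed
}

// Build a diff that turns `old` into `new`.
fn diff(old: &Entity, new: &Entity) -> EntityDiff {
    let delta = |a: i32, b: i32| if a == b { None } else { Some(b.wrapping_sub(a)) };
    EntityDiff {
        dx: delta(old.x, new.x),
        dy: delta(old.y, new.y),
        hp: if old.hp == new.hp { None } else { Some(new.hp) },
    }
}

// Reconstruct the new state on the receiving side.
fn apply(old: &Entity, d: &EntityDiff) -> Entity {
    Entity {
        x: old.x.wrapping_add(d.dx.unwrap_or(0)),
        y: old.y.wrapping_add(d.dy.unwrap_or(0)),
        hp: d.hp.unwrap_or(old.hp),
    }
}

fn main() {
    let old = Entity { x: 100, y: 200, hp: 50 };
    let new = Entity { x: 103, y: 200, hp: 50 };

    let d = diff(&old, &new);
    // Only the x coordinate moved, so two of three fields are `None`.
    assert_eq!(d, EntityDiff { dx: Some(3), dy: None, hp: None });
    assert_eq!(apply(&old, &d), new);
}
```

Sending deltas rather than absolute values keeps the numbers small, which is where a variable-length integer encoding pays off.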

Vs Bincode

When encoding large 64-bit ints, zlo can be up to 7 times slower than bincode. On average, zlo can be expected to be 3 to 5 times slower than bincode when encoding lots of integers or floats. When encoding plain bytes, zlo has comparable performance; however, zlo is not zero-copy, so it may incur extra allocation overhead compared to bincode during deserialization.

On average, output can be expected to be up to 1.5 times smaller than bincode's when encoding numbers, and up to 8 times smaller when encoding lots of bools or Option::Nones.

Need to squeeze data into even fewer bits?

  • Consider diffing floats by their parts (exponent and fraction) rather than as whole values.
  • Does your value stay within a specific range known ahead of time? Then consider serializing it as an integer, signed or unsigned.
  • Is it multidimensional? Maybe you could infer dependent elements from just a single record.
  • Serializing a multidimensional unit value such as a unit vector or quaternion? Serialize it in a more compact form; for example, a 2D unit vector can be reduced to just an angle.

See examples.
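Two of the ideas above can be sketched in a few lines: mapping a value with a known range onto an integer, and reducing a 2D unit vector to an angle. This is only an illustration; the function names and the choice of a 16-bit step index are assumptions, not part of zlo, and the quantization is lossy.

```rust
use std::f32::consts::PI;

/// Map `v`, known to lie in [min, max], to a 16-bit step index (lossy).
/// Values outside the range are clamped to its ends.
fn quantize(v: f32, min: f32, max: f32) -> u16 {
    let t = ((v - min) / (max - min)).clamp(0.0, 1.0);
    (t * u16::MAX as f32).round() as u16
}

/// Inverse of `quantize`, up to the quantization step.
fn dequantize(q: u16, min: f32, max: f32) -> f32 {
    min + (q as f32 / u16::MAX as f32) * (max - min)
}

/// A 2D unit vector has one degree of freedom, so an angle is enough.
fn unit_to_angle(x: f32, y: f32) -> f32 {
    y.atan2(x) // result lies in [-PI, PI]
}

/// Rebuild the unit vector from its angle.
fn angle_to_unit(angle: f32) -> (f32, f32) {
    (angle.cos(), angle.sin())
}

fn main() {
    // Two f32 components (64 bits) become a single 16-bit integer.
    let (x, y) = (0.6_f32, 0.8_f32);
    let q = quantize(unit_to_angle(x, y), -PI, PI);

    let (rx, ry) = angle_to_unit(dequantize(q, -PI, PI));
    assert!((rx - x).abs() < 1e-3);
    assert!((ry - y).abs() < 1e-3);
}
```

The resulting integer can then be serialized like any other small int; whether 16 bits of resolution is enough depends on the application.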

Lossy fraction coding

There are no plans for it. Even if lossy fraction coding were considered, it could only be expected after the feature tracked in Rust issue #44580 is stabilized.

Details

Booleans are encoded as single bits; integers are encoded in a form similar, but not identical, to LEB128; floats are encoded in a somewhat complicated way described below; tuples and structs are encoded by encoding their fields one by one; and enums are encoded by first writing out the tag representing the variant and then the contents.

Unlike bincode, zlo has no configurable byte order, because the number of bytes written is variable.

Implementation details to be aware of:

  • Separate bits are written from LSB to MSB.
  • Unsigned integers are encoded using the following principle:
      PC
      0   if integer > 0, then write bit 1, else write bit 0 and return
      1   write next 8 bits of the integer
      2   if this is the most significant byte of the whole type, then return
      3   if there are no more bits to be written, then write bit 0 and return
      4   write bit 1
      5   goto 1
  • Signed integers are zigzag-encoded (the same way as in protobuf) and then encoded the same way as unsigned integers.
  • Floats are encoded as follows: the sign bit is always written; the exponent is written either as 2 bits or in full; the fraction is written either as its higher bits only (16 bits for f32, 32 bits for f64) or in full if it doesn't fit.
  • isize/usize are encoded as variable i64/u64, for portability.
  • Enum variants are encoded as a variable u32 instead of a usize. u32 is enough for all practical uses.
  • str is encoded as (variable u64, &[u8]), where the u64 is the number of bytes contained in the encoded string.
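The integer rules in the list above can be sketched as follows. This is a minimal illustration, not zlo's actual implementation: `BitWriter`, `write_u64`, `write_i64` and `zigzag` are hypothetical names, and the sketch assumes that "next 8 bits" means the integer is consumed from its least significant byte upward. Bits are packed into each output byte from LSB to MSB, as stated in the list.

```rust
/// Minimal bit sink (hypothetical): bits fill each byte from LSB to MSB.
struct BitWriter {
    bytes: Vec<u8>,
    bit_len: usize,
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { bytes: Vec::new(), bit_len: 0 }
    }

    fn write_bit(&mut self, bit: bool) {
        if self.bit_len % 8 == 0 {
            self.bytes.push(0); // start a fresh output byte
        }
        if bit {
            let last = self.bytes.len() - 1;
            self.bytes[last] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
    }

    fn write_byte(&mut self, byte: u8) {
        for i in 0..8 {
            self.write_bit((byte >> i) & 1 == 1);
        }
    }
}

/// Follows the PC 0..5 steps listed above for a u64.
fn write_u64(w: &mut BitWriter, mut v: u64) {
    // PC 0: a single 0 bit encodes the value zero.
    if v == 0 {
        w.write_bit(false);
        return;
    }
    w.write_bit(true);
    for byte_index in 0..8 {
        // PC 1: write the next 8 bits (assumed: low byte first).
        w.write_byte(v as u8);
        v >>= 8;
        // PC 2: the most significant byte of the type needs no trailing flag.
        if byte_index == 7 {
            return;
        }
        // PC 3: nothing left, so terminate with a 0 bit.
        if v == 0 {
            w.write_bit(false);
            return;
        }
        // PC 4 and 5: a 1 bit means another byte follows.
        w.write_bit(true);
    }
}

/// Zigzag mapping as used by protobuf: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

fn write_i64(w: &mut BitWriter, v: i64) {
    write_u64(w, zigzag(v));
}

fn main() {
    let mut w = BitWriter::new();
    write_u64(&mut w, 1);
    // 1 flag bit + 8 data bits + 1 terminating bit.
    assert_eq!(w.bit_len, 10);

    let mut w = BitWriter::new();
    write_i64(&mut w, -1); // zigzag(-1) == 1, so the cost is the same
    assert_eq!(w.bit_len, 10);
}
```

Under these assumptions a zero costs a single bit, a value that fits in one byte costs 10 bits, and a full-width u64 costs 72 bits (1 flag bit, 64 data bits and 7 continuation bits).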

License

zlo is licensed under the MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
