
serde-encoded-bytes (no-std)

Efficient bytestring serialization for Serde-supporting types

1 unstable release

0.1.0 Oct 16, 2024


911 downloads per month
Used in 6 crates (via synedrion)

MIT license

41KB
805 lines

What it does

Byte arrays ([u8; N], Vec<u8>, Box<[u8]>, and so on) are treated by serde as sequences of individual integers, which leads to inefficient representations in many formats. For example, this is how a 16-byte array is serialized by default in a binary format (MessagePack) and in a human-readable one (JSON):

use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Array([u8; 16]);

let array = Array([
    0, 1, 0xf2, 3, 0xf4, 5, 0xf6, 7,
    0xf8, 9, 0xfa, 11, 0xfc, 13, 14, 0xff
]);

// Serializing as MessagePack
assert_eq!(
    rmp_serde::encode::to_vec(&array).unwrap(),
    [
        220, 0, 16, 0, 1, 204, 242, 3, 204, 244, 5, 204, 246, 7,
        204, 248, 9, 204, 250, 11, 204, 252, 13, 14, 204, 255
    ]
);

// Serializing as JSON
assert_eq!(
    serde_json::to_string(&array).unwrap(),
    "[0,1,242,3,244,5,246,7,248,9,250,11,252,13,14,255]"
);

Note that in MessagePack, bytes with values above 0x7f (that is, those with the most significant bit set) are each prefixed by 0xcc (204), and the array itself carries a 3-byte header (220, 0, 16), so 16 bytes of data take 26 bytes to encode. In JSON, the bytestring is written as an array of decimal integers, which is also far from compact.
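To make the MessagePack overhead concrete, here is a minimal, dependency-free sketch that hand-encodes a byte slice the way MessagePack encodes a sequence of unsigned integers: an array header followed by one integer per element. It covers only the MessagePack rules needed for this case, and the function name `msgpack_array_of_u8` is hypothetical; it is not part of this crate or of `rmp_serde`.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: encodes `bytes` as a MessagePack array of unsigned
/// integers, i.e. the shape serde's default handling of `[u8; N]` produces.
fn msgpack_array_of_u8(bytes: &[u8]) -> Vec<u8> {
    let len = bytes.len();
    let mut out = Vec::with_capacity(5 + 2 * len);
    // Array header: fixarray (up to 15 items), array 16, or array 32.
    if len <= 15 {
        out.push(0x90 | len as u8); // fixarray: the length is packed into the marker
    } else if let Ok(n) = u16::try_from(len) {
        out.push(0xdc); // array 16: 2-byte big-endian length follows
        out.extend_from_slice(&n.to_be_bytes());
    } else {
        let n = u32::try_from(len).expect("MessagePack arrays hold at most 2^32 - 1 items");
        out.push(0xdd); // array 32: 4-byte big-endian length follows
        out.extend_from_slice(&n.to_be_bytes());
    }
    for &b in bytes {
        if b <= 0x7f {
            // Positive fixint: the value is stored as a single byte.
            out.push(b);
        } else {
            // uint 8: a 0xcc marker byte followed by the value.
            out.push(0xcc);
            out.push(b);
        }
    }
    out
}

fn main() {
    let bytes: [u8; 16] = [
        0, 1, 0xf2, 3, 0xf4, 5, 0xf6, 7,
        0xf8, 9, 0xfa, 11, 0xfc, 13, 14, 0xff,
    ];
    let encoded = msgpack_array_of_u8(&bytes);
    // 3 header bytes + 16 values + 7 extra 0xcc markers = 26 bytes.
    println!("{} bytes: {:?}", encoded.len(), encoded);
}
```

The output matches the default `rmp_serde` bytes listed above, which shows that the extra cost comes purely from the per-element integer markers and the array header.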

This crate provides types whose associated serialize and deserialize functions can be used in the serde(with) field attribute to change this behavior, so that bytestrings are serialized efficiently: verbatim in binary formats, or as a string in the selected encoding in human-readable formats.
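Judging from the JSON output shown in the Usage section (`"0x0001f203..."`), the `Hex` encoding writes a `0x`-prefixed lowercase hex string in human-readable formats. The stdlib-only sketch below reproduces that string form and parses it back. The helper names are hypothetical, and the parsing rules (prefix required, either letter case accepted) are assumptions of this sketch that may differ from the crate's actual behavior.

```rust
use std::fmt::Write;

/// Hypothetical helper: formats bytes as a `0x`-prefixed lowercase hex string.
fn to_prefixed_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(2 + 2 * bytes.len());
    s.push_str("0x");
    for b in bytes {
        // `{:02x}` pads every byte to exactly two lowercase hex digits.
        write!(s, "{:02x}", b).expect("writing to a String cannot fail");
    }
    s
}

/// Converts one ASCII hex digit to its value (assumption: both cases accepted).
fn nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None,
    }
}

/// Hypothetical helper: parses a `0x`-prefixed hex string back into bytes.
fn from_prefixed_hex(s: &str) -> Result<Vec<u8>, String> {
    let digits = s.strip_prefix("0x").ok_or("missing 0x prefix")?.as_bytes();
    if digits.len() % 2 != 0 {
        return Err("odd number of hex digits".to_string());
    }
    digits
        .chunks_exact(2)
        .map(|pair| match (nibble(pair[0]), nibble(pair[1])) {
            // High nibble first, as produced by `to_prefixed_hex`.
            (Some(hi), Some(lo)) => Ok(hi << 4 | lo),
            _ => Err("invalid hex digit".to_string()),
        })
        .collect()
}

fn main() {
    let bytes = [0u8, 1, 0xf2, 3, 0xf4, 5, 0xf6, 7, 0xf8, 9, 0xfa, 11, 0xfc, 13, 14, 0xff];
    let hex = to_prefixed_hex(&bytes);
    println!("{}", hex);
    // Decoding the string restores the original bytes.
    assert_eq!(from_prefixed_hex(&hex).unwrap(), bytes);
}
```

Working on raw ASCII bytes rather than slicing the `&str` keeps the parser from panicking on non-ASCII input.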

Usage

To use it, add a serde(with) attribute whose argument combines a container type (array-like, slice-like, and so on) with the desired encoding as its type parameter, for example ArrayLike::<Hex>:
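The container type matters mostly when deserializing: an array-like target has its length fixed at compile time, so decoded bytes of the wrong length have to be rejected, while slice-like targets such as `Vec<u8>` or `Box<[u8]>` accept any length. The stdlib-only sketch below illustrates that distinction. The trait `FromDecodedBytes` and its impls are hypothetical and do not describe how `ArrayLike` or any other type in this crate is implemented.

```rust
use std::convert::TryFrom;

/// Hypothetical trait: turns already-decoded bytes into a concrete container.
trait FromDecodedBytes: Sized {
    fn from_decoded(bytes: Vec<u8>) -> Result<Self, String>;
}

// Array-like target: the length is part of the type, so it must match exactly.
impl<const N: usize> FromDecodedBytes for [u8; N] {
    fn from_decoded(bytes: Vec<u8>) -> Result<Self, String> {
        let len = bytes.len();
        <[u8; N]>::try_from(bytes).map_err(|_| format!("expected {} bytes, got {}", N, len))
    }
}

// Slice-like targets: any length is accepted as-is.
impl FromDecodedBytes for Vec<u8> {
    fn from_decoded(bytes: Vec<u8>) -> Result<Self, String> {
        Ok(bytes)
    }
}

impl FromDecodedBytes for Box<[u8]> {
    fn from_decoded(bytes: Vec<u8>) -> Result<Self, String> {
        Ok(bytes.into_boxed_slice())
    }
}

fn main() {
    let decoded = vec![0xdeu8, 0xad, 0xbe, 0xef];
    let exact = <[u8; 4]>::from_decoded(decoded.clone());
    let too_short = <[u8; 16]>::from_decoded(decoded.clone());
    let boxed = <Box<[u8]>>::from_decoded(decoded);
    println!("{:?}\n{:?}\n{:?}", exact, too_short, boxed);
}
```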

use serde::{Deserialize, Serialize};
use serde_encoded_bytes::{ArrayLike, Hex};

#[derive(Debug, PartialEq, Eq, Serialize, Deserialize)]
struct Array(#[serde(with = "ArrayLike::<Hex>")] [u8; 16]);

let array = Array([
    0, 1, 0xf2, 3, 0xf4, 5, 0xf6, 7,
    0xf8, 9, 0xfa, 11, 0xfc, 13, 14, 0xff,
]);

// Serializing as MessagePack
assert_eq!(
    rmp_serde::encode::to_vec(&array).unwrap(),
    [196, 16, 0, 1, 242, 3, 244, 5, 246, 7, 248, 9, 250, 11, 252, 13, 14, 255]
);

// Serializing as JSON
assert_eq!(
    serde_json::to_string(&array).unwrap(),
    "\"0x0001f203f405f607f809fa0bfc0d0eff\""
);

As you can see, the serialization of this example is now more compact in both formats: the MessagePack encoding shrinks from 26 to 18 bytes, and the JSON output from 50 to 36 characters.

Note that due to serde limitations (see https://github.com/serde-rs/serde/issues/2120), fixed-size arrays are still serialized with their length included in binary formats, like the 16 that follows the 196 marker in the MessagePack output above.
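In MessagePack, the efficient form is a single length-prefixed `bin` value: in the Usage output, `196` (`0xc4`) is the `bin 8` marker, `16` is the length, and the raw bytes follow. The stdlib-only sketch below reproduces that layout, including the length prefix that the note above says cannot be dropped for fixed-size arrays. `msgpack_bin` is a hypothetical name, and this is not how the crate or `rmp_serde` is implemented internally.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: encodes bytes as a MessagePack `bin` value.
fn msgpack_bin(bytes: &[u8]) -> Vec<u8> {
    let len = bytes.len();
    let mut out = Vec::with_capacity(5 + len);
    // Pick the smallest bin header that can hold the length.
    if let Ok(n) = u8::try_from(len) {
        out.push(0xc4); // bin 8: 1-byte length
        out.push(n);
    } else if let Ok(n) = u16::try_from(len) {
        out.push(0xc5); // bin 16: 2-byte big-endian length
        out.extend_from_slice(&n.to_be_bytes());
    } else {
        let n = u32::try_from(len).expect("MessagePack bin values hold at most 2^32 - 1 bytes");
        out.push(0xc6); // bin 32: 4-byte big-endian length
        out.extend_from_slice(&n.to_be_bytes());
    }
    // The payload is copied verbatim, with no per-byte markers.
    out.extend_from_slice(bytes);
    out
}

fn main() {
    let bytes: [u8; 16] = [
        0, 1, 0xf2, 3, 0xf4, 5, 0xf6, 7,
        0xf8, 9, 0xfa, 11, 0xfc, 13, 14, 0xff,
    ];
    let encoded = msgpack_bin(&bytes);
    // 2 header bytes (marker + length) + 16 payload bytes = 18 bytes.
    println!("{} bytes: {:?}", encoded.len(), encoded);
}
```

For payloads of up to 255 bytes, the whole overhead is therefore two bytes, regardless of the byte values.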

Tested formats

While this crate is intended to work with any format that supports serde, it is specifically tested on:

Prior art

More established crates with intersecting functionality include:
