21 releases (3 stable)

Version | Release date |
---|---|
1.1.0 | Jul 16, 2023 |
1.0.0 | Jun 21, 2023 |
0.8.1 | Jun 18, 2023 |
0.5.0 | Mar 18, 2023 |
0.4.1 | Jul 29, 2021 |
msgpack-schema
msgpack-schema is a schema language for describing data formats encoded in MessagePack.
It provides two derive macros, `Serialize` and `Deserialize`, that allow you to transcode MessagePack binary data to and from Rust data structures in a type-directed way.
use msgpack_schema::{Deserialize, Serialize};

#[derive(Deserialize, Serialize)]
struct Human {
    #[tag = 0]
    name: String,
    #[tag = 2]
    #[optional]
    age: Option<u32>,
}
Compared with other schema languages such as `rmp-serde`, `msgpack-schema` lets you specify a more compact data representation, e.g., fixints as field keys and fixints as variant keys.
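To make the size difference concrete, here is a minimal hand-rolled sketch that uses only the standard library and calls neither msgpack-schema nor rmp-serde. The helper names `fixint_keyed` and `string_keyed` and the field names `x` and `y` are hypothetical. The string-keyed layout is simply one common alternative, not a claim about what any particular library emits by default.

```rust
/// Encodes `{ 0: 42, 1: "hello" }`, using positive fixints as map keys.
fn fixint_keyed() -> Vec<u8> {
    let mut out = vec![0x82]; // fixmap header: 2 entries
    out.extend_from_slice(&[0x00, 0x2A]); // key 0, value 42 (both fixints)
    out.push(0x01); // key 1
    out.push(0xA0 | 5); // fixstr header: 5 bytes follow
    out.extend_from_slice(b"hello");
    out
}

/// Encodes `{ "x": 42, "y": "hello" }`, using one-character strings as map keys.
fn string_keyed() -> Vec<u8> {
    let mut out = vec![0x82]; // fixmap header: 2 entries
    out.extend_from_slice(&[0xA1, b'x', 0x2A]); // key "x", value 42
    out.extend_from_slice(&[0xA1, b'y', 0xA5]); // key "y", fixstr header
    out.extend_from_slice(b"hello");
    out
}
```

Each string key costs at least two bytes (header plus one character), whereas a fixint key always costs one, so the gap grows with the length of the field names.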
Feature flags
- `proptest`: Enable `proptest::arbitrary::Arbitrary` impls for `msgpack_value::Value`.
Behaviours of serializers and deserializers
Structs with named fields
Structs with named fields are serialized into a `Map` object whose keys are the fixints specified by `#[tag]` attributes.
The current implementation serializes fields in order, but you must not rely on this behavior.

The deserializer interprets `Map` objects to create such structs.
Field order is irrelevant to the result.
If a `Map` object contains extra key-value pairs that are not part of the struct definition, the deserializer simply ignores them.
If a `Map` object contains two or more values with the same key, the last value overwrites the preceding ones.
use msgpack_schema::{deserialize, serialize, Deserialize, Serialize};

// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S {
    #[tag = 0]
    x: u32,
    #[tag = 1]
    y: String,
}

let s = S {
    x: 42,
    y: "hello".to_owned(),
};

let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());

// ignores irrelevant key-value pairs
let b = b"\x83\x00\x2A\x02\xC3\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: 42, 2: true, 1: "hello" }`
assert_eq!(s, deserialize(b).unwrap());

// last value wins
let b = b"\x83\x00\xC3\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 12 bytes; `{ 0: true, 0: 42, 1: "hello" }`
assert_eq!(s, deserialize(b).unwrap());
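The two deserialization rules above (unknown keys are ignored, the last duplicate wins) can be modelled without the crate. The following is only a sketch of the observable behavior over already-decoded key-value pairs, not the code the derive macro generates. The `Value` enum and the `from_pairs` function are hypothetical names introduced for illustration.

```rust
/// Hypothetical stand-in for a decoded MessagePack value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(u32),
    Bool(bool),
    Str(String),
}

#[derive(Debug, PartialEq)]
struct S {
    x: u32,
    y: String,
}

/// Builds an `S` from map entries given in wire order.
fn from_pairs(pairs: &[(u8, Value)]) -> Option<S> {
    let mut x = None;
    let mut y = None;
    for (key, value) in pairs {
        match *key {
            0 => x = Some(value), // a later duplicate replaces an earlier one
            1 => y = Some(value),
            _ => {} // keys that are not in the schema are skipped
        }
    }
    // Only the surviving values are type-checked.
    match (x?, y?) {
        (Value::Int(x), Value::Str(y)) => Some(S { x: *x, y: y.clone() }),
        _ => None,
    }
}
```

Because validation happens after the loop, an earlier ill-typed value under a duplicated key does not cause a failure, which matches the "last value wins" example above.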
Fields in named structs may be tagged with `#[optional]`.
- The tagged field must be of type `Option<T>`.
- On serialization, the key-value pair is omitted from the resulting map object when the field contains `None`.
- On deserialization, the field of the resulting struct is filled with `None` when the given MessagePack map object contains no corresponding key-value pair.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S {
    #[tag = 0]
    x: u32,
    #[optional]
    #[tag = 1]
    y: Option<String>,
}

let s = S {
    x: 42,
    y: Some("hello".to_owned()),
};
let b = b"\x82\x00\x2A\x01\xA5\x68\x65\x6c\x6c\x6f"; // 10 bytes; `{ 0: 42, 1: "hello" }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());

let s = S {
    x: 42,
    y: None,
};
let b = b"\x81\x00\x2A"; // 3 bytes; `{ 0: 42 }`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
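The effect of `#[optional]` on the wire can be reproduced by hand. The sketch below uses only the standard library; `encode_optional` is a hypothetical helper, and it assumes values small enough for the one-byte fixint and fixstr forms.

```rust
/// Encodes `{ 0: x }` or `{ 0: x, 1: y }`, dropping key 1 when `y` is `None`.
fn encode_optional(x: u8, y: Option<&str>) -> Vec<u8> {
    assert!(x < 0x80, "sketch only handles positive fixints");
    // The entry count in the fixmap header shrinks when the field is absent.
    let entries = 1 + y.is_some() as u8;
    let mut out = vec![0x80 | entries];
    out.extend_from_slice(&[0x00, x]); // key 0, value x
    if let Some(s) = y {
        assert!(s.len() < 32, "sketch only handles fixstr");
        out.push(0x01); // key 1
        out.push(0xA0 | s.len() as u8); // fixstr header
        out.extend_from_slice(s.as_bytes());
    }
    out
}
```

Note that an absent field changes the map header as well as the body, so `None` costs zero bytes rather than a key plus a `nil`.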
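The `#[flatten]` attribute lets you split a single named-struct definition into several structs. The fields of the flattened struct are serialized as if they were declared directly in the enclosing struct.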
#[derive(Serialize)]
struct S1 {
    #[tag = 1]
    x: u32,
}

#[derive(Serialize)]
struct S2 {
    #[flatten]
    s1: S1,
    #[tag = 2]
    y: u32,
}

#[derive(Serialize)]
struct S3 {
    #[tag = 1]
    x: u32,
    #[tag = 2]
    y: u32,
}

// S2 (with S1 flattened into it) and S3 produce identical bytes.
assert_eq!(
    serialize(S2 { s1: S1 { x: 42 }, y: 43 }),
    serialize(S3 { x: 42, y: 43 })
);
Structs with named fields may be annotated with `#[untagged]`.
Untagged structs are serialized into an array and do not contain tags.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
#[untagged]
struct S {
    x: u32,
    y: String,
}

let s = S {
    x: 42,
    y: "hello".to_owned(),
};
let b = b"\x92\x2A\xA5\x68\x65\x6c\x6c\x6f"; // 8 bytes; `[ 42, "hello" ]`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
Newtype structs
Tuple structs with only one element are treated transparently.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S(u32);

let s = S(42);
let b = b"\x2A"; // 1 byte; `42`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
Unit structs and empty tuple structs
Serialization and deserialization of unit structs and empty tuple structs are intentionally unsupported.
// It is an error to derive `Serialize` / `Deserialize` for these kinds of structs.
struct S1;
struct S2();
Tuple structs
Tuple structs with more than one element are encoded as an array. It is a validation error to deserialize an array whose length does not match the number of elements.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct S(u32, bool);

let s = S(42, true);
let b = b"\x92\x2A\xC3"; // 3 bytes; `[ 42, true ]`
assert_eq!(serialize(&s), b);
assert_eq!(s, deserialize(b).unwrap());
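The length check can be illustrated with a small hand-written decoder for the `[ u32, bool ]` shape. This is a sketch using only the standard library, not the crate's implementation; `decode_pair` and its error strings are hypothetical, and only fixarray headers and fixint elements are handled.

```rust
/// Decodes a two-element fixarray of (positive fixint, bool).
fn decode_pair(bytes: &[u8]) -> Result<(u8, bool), &'static str> {
    let (&header, body) = bytes.split_first().ok_or("empty input")?;
    // A fixarray header is 0b1001_nnnn, where nnnn is the element count.
    if header & 0xF0 != 0x90 {
        return Err("expected a fixarray");
    }
    if header & 0x0F != 2 {
        return Err("array length mismatch");
    }
    match body {
        [n, 0xC2] if *n < 0x80 => Ok((*n, false)),
        [n, 0xC3] if *n < 0x80 => Ok((*n, true)),
        _ => Err("unexpected element"),
    }
}
```

The declared length in the header is rejected before any element is inspected, so both shorter and longer arrays fail in the same way.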
Unit variants and empty tuple variants
Unit variants and empty tuple variants are serialized into a single fixint whose value is determined by the tag.
// Unit variant. `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo
}

let e = E::Foo;
let b = b"\x03"; // 1 byte; `3`
assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());

// Empty tuple variant (a separate example; it redefines `E`).
#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo()
}

let e = E::Foo();
let b = b"\x03"; // 1 byte; `3`
assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());
Newtype variants
Newtype variants (one-element tuple variants) are serialized into an array of the tag and the inner value.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum E {
    #[tag = 3]
    Foo(u32)
}

let e = E::Foo(42);
let b = b"\x92\x03\x2A"; // 3 bytes; `[ 3, 42 ]`
assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());
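The two variant layouts (a bare fixint for unit variants, a two-element array for newtype variants) can be placed side by side in a hand-written codec. This sketch uses only the standard library. The enum, its tags 3 and 4, and the functions `encode_variant` and `decode_variant` are hypothetical, and mixing both variant kinds in one enum is an assumption made here for illustration.

```rust
/// Hypothetical enum: `Foo` carries tag 3, `Bar` carries tag 4.
#[derive(Debug, PartialEq)]
enum E {
    Foo,     // unit variant
    Bar(u8), // newtype variant
}

fn encode_variant(e: &E) -> Vec<u8> {
    match e {
        // Unit variant: just the tag as a positive fixint.
        E::Foo => vec![0x03],
        // Newtype variant: fixarray of length 2 holding [tag, value].
        E::Bar(v) => {
            assert!(*v < 0x80, "sketch only handles positive fixints");
            vec![0x92, 0x04, *v]
        }
    }
}

fn decode_variant(bytes: &[u8]) -> Option<E> {
    match bytes {
        [0x03] => Some(E::Foo),
        [0x92, 0x04, v] if *v < 0x80 => Some(E::Bar(*v)),
        _ => None,
    }
}
```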
Untagged variants
Enums may be annotated with `#[untagged]` when all variants are newtype variants.
Serializing an untagged variant produces the same data layout as its inner type.
The deserializer tries each variant in turn, from the first to the last, and returns the first one that deserializes successfully.
// `Debug` and `PartialEq` are required by `assert_eq!`.
#[derive(Debug, PartialEq, Serialize, Deserialize)]
#[untagged]
enum E {
    Foo(String),
    Bar(u32),
}

let e = E::Bar(42);
let b = b"\x2A"; // 1 byte; `42`
assert_eq!(serialize(&e), b);
assert_eq!(e, deserialize(b).unwrap());
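The first-match strategy can be sketched with two small hand-written decoders chained in declaration order. This uses only the standard library and is not the crate's generated code. The functions `try_str`, `try_u32`, and `decode_untagged` are hypothetical, and only the fixstr and positive-fixint forms are recognised.

```rust
#[derive(Debug, PartialEq)]
enum E {
    Foo(String),
    Bar(u32),
}

/// Accepts a fixstr (header 0b101x_xxxx) whose payload is valid UTF-8.
fn try_str(bytes: &[u8]) -> Option<String> {
    let (&header, body) = bytes.split_first()?;
    if header & 0xE0 == 0xA0 && body.len() == (header & 0x1F) as usize {
        String::from_utf8(body.to_vec()).ok()
    } else {
        None
    }
}

/// Accepts a single positive fixint.
fn try_u32(bytes: &[u8]) -> Option<u32> {
    match bytes {
        [n] if *n < 0x80 => Some(*n as u32),
        _ => None,
    }
}

/// Tries `Foo` first, then `Bar`, mirroring the variant order.
fn decode_untagged(bytes: &[u8]) -> Option<E> {
    try_str(bytes)
        .map(E::Foo)
        .or_else(|| try_u32(bytes).map(E::Bar))
}
```

Since the first successful variant wins, variant order matters whenever two inner types accept the same input.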
Write your own implementation of `Serialize` and `Deserialize`
You may want to write your own implementation of `Serialize` and `Deserialize` in the following cases:
1. You need an `impl` for types that are already defined by someone else.
2. You need extreme efficiency.
3. Both.

`IpAddr` is a type that satisfies (3). In the most efficient encoding, it always occupies 4 or 16 bytes plus one byte for a tag. This is achieved with a hand-written implementation like the one below.
use msgpack_schema::{
    Deserialize, DeserializeError, Deserializer, Serialize, Serializer, ValidationError,
};
use msgpack_value::Str; // assumption: `Str` is provided by the `msgpack_value` crate

struct IpAddr(pub std::net::IpAddr);

impl Serialize for IpAddr {
    fn serialize(&self, serializer: &mut Serializer) {
        match self.0 {
            std::net::IpAddr::V4(v4) => {
                serializer.serialize_str(&v4.octets()); // 5 bytes
            }
            std::net::IpAddr::V6(v6) => {
                serializer.serialize_str(&v6.octets()); // 17 bytes
            }
        }
    }
}

impl Deserialize for IpAddr {
    fn deserialize(deserializer: &mut Deserializer) -> Result<Self, DeserializeError> {
        let Str(data) = deserializer.deserialize()?;
        // The payload length tells the two address families apart.
        let ipaddr = match data.len() {
            4 => std::net::IpAddr::V4(std::net::Ipv4Addr::from(
                <[u8; 4]>::try_from(data).unwrap(),
            )),
            16 => std::net::IpAddr::V6(std::net::Ipv6Addr::from(
                <[u8; 16]>::try_from(data).unwrap(),
            )),
            _ => return Err(ValidationError.into()),
        };
        Ok(Self(ipaddr))
    }
}
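The 5-byte and 17-byte figures in the comments above follow from a one-byte fixstr header plus the raw octets. The sketch below reproduces that layout with the standard library only, so it can be checked without the crate. `encode_ip` and `ip_from_payload` are hypothetical helpers, not part of msgpack-schema.

```rust
use std::convert::TryFrom;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Encodes an address as a fixstr: one header byte followed by the octets.
fn encode_ip(ip: &IpAddr) -> Vec<u8> {
    let octets: Vec<u8> = match ip {
        IpAddr::V4(v4) => v4.octets().to_vec(),
        IpAddr::V6(v6) => v6.octets().to_vec(),
    };
    // 0xA4 for 4 octets, 0xB0 for 16 octets.
    let mut out = vec![0xA0 | octets.len() as u8];
    out.extend(octets);
    out
}

/// Rebuilds an address from the payload, dispatching on its length.
fn ip_from_payload(data: &[u8]) -> Option<IpAddr> {
    match data.len() {
        4 => Some(IpAddr::V4(Ipv4Addr::from(<[u8; 4]>::try_from(data).ok()?))),
        16 => Some(IpAddr::V6(Ipv6Addr::from(<[u8; 16]>::try_from(data).ok()?))),
        _ => None,
    }
}
```

No separate discriminant is stored: the payload length alone distinguishes IPv4 from IPv6, which is what keeps the encoding at 5 or 17 bytes.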
Appendix: Cheatsheet
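schema | Rust | MessagePack (human readable) |
---|---|---|
`struct S { #[tag = 0] x: u32, #[tag = 1] y: bool }` | `S { x: 42, y: true }` | `{ 0: 42, 1: true }` |
`struct S { #[optional] #[tag = 0] x: Option<u32> }` | `S { x: Some(42) }` | `{ 0: 42 }` |
`struct S { #[optional] #[tag = 0] x: Option<u32> }` | `S { x: None }` | `{}` |
`#[untagged] struct S { x: u32, y: bool }` | `S { x: 42, y: true }` | `[ 42, true ]` |
`struct S(u32)` | `S(42)` | `42` |
`struct S` | `S` | UNSUPPORTED |
`struct S()` | `S()` | UNSUPPORTED |
`struct S(u32, bool)` | `S(42, true)` | `[ 42, true ]` |
`enum E { #[tag = 3] Foo }` | `E::Foo` | `3` |
`enum E { #[tag = 3] Foo() }` | `E::Foo()` | `3` |
`enum E { #[tag = 3] Foo(u32) }` | `E::Foo(42)` | `[ 3, 42 ]` |
`#[untagged] enum E { Bar(bool) }` | `E::Bar(true)` | `true` |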
License
Licensed under MIT license.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in msgpack-schema by you shall be licensed as above, without any additional terms or conditions.