2 unstable releases

0.1.0 Mar 20, 2024
0.0.0-alpha.1 Mar 19, 2024

#949 in Encoding


607 downloads per month
Used in 2 crates

Apache-2.0

89KB
2K SLoC

Serialize and deserialize Rust values to and from the VAA payload wire format.

As of this writing (June, 2022) there is no proper specification for the VAA payload wire format, so this implementation has mostly been reverse engineered from existing messages. While the rest of this document talks about how various types are represented on the wire, this should be seen as an explanation of how things are implemented in this crate and not as official documentation. In cases where the serialization of a payload produced by this crate differs from the one used by the wormhole contracts, the serialization used by the actual contract is considered the canonical serialization.

Unless you want to interact with existing wormhole VAA payloads, this crate is probably not what you are looking for. If you are simply using the wormhole bridge to send your own payloads then using a schema with auto-generated code (like protobufs or flatbuffers) is probably a better choice.

Wire format

The VAA payload wire format is not a self-describing format (unlike JSON and TOML). Therefore it is necessary to know the type that needs to be produced before deserializing a byte stream.

The wire format currently supports the following primitive types:

bool

Encoded as a single byte where a value of 0 indicates false and 1 indicates true. All other values are invalid.
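
The rule above can be sketched by hand with only the standard library. `decode_bool` is a hypothetical helper written for illustration and is not part of this crate's API.

```rust
/// Decodes a wire-format bool: 0 is false, 1 is true, anything else is rejected.
/// Hypothetical helper for illustration only.
fn decode_bool(byte: u8) -> Result<bool, String> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(format!("invalid bool byte: {}", other)),
    }
}
```

With this sketch, `decode_bool(1)` yields `Ok(true)` and `decode_bool(2)` yields an error.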

Integers

i8, i16, i32, i64, i128, u8, u16, u32, u64, and u128 are all supported and encoded as full-width big-endian integers, i.e., i16 is 2 bytes, u64 is 8 bytes, etc.
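
The standard library's `to_be_bytes` and `from_be_bytes` produce exactly this kind of full-width big-endian layout. The wrapper functions below are hypothetical names used only to make the widths visible; the crate itself may implement this differently.

```rust
/// An i16 always occupies 2 bytes on the wire, most significant byte first.
fn encode_i16(value: i16) -> [u8; 2] {
    value.to_be_bytes()
}

/// A u64 always occupies 8 bytes, even for small values.
fn encode_u64(value: u64) -> [u8; 8] {
    value.to_be_bytes()
}

/// Reading a u32 back requires exactly 4 bytes.
fn decode_u32(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

For example, `encode_i16(-2)` gives `[0xff, 0xfe]` (two's complement) and `encode_u64(258)` gives seven leading bytes of padding followed by `1, 2`.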

char

Encoded as a big-endian u32, with the additional restriction that it must be a valid Unicode Scalar Value.
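
As a rough illustration, the scalar-value check corresponds to `char::from_u32` in the standard library, which returns `None` for surrogates and out-of-range values. The function names here are hypothetical.

```rust
/// Encodes a char as its Unicode scalar value in 4 big-endian bytes.
fn encode_char(c: char) -> [u8; 4] {
    (c as u32).to_be_bytes()
}

/// Decodes 4 big-endian bytes, returning None if the value is not a
/// valid Unicode Scalar Value (for example, a surrogate code point).
fn decode_char(bytes: [u8; 4]) -> Option<char> {
    char::from_u32(u32::from_be_bytes(bytes))
}
```

Here `decode_char([0, 0, 0xD8, 0])` is `None` because 0xD800 is a surrogate, while `decode_char([0, 0, 0, 0x41])` is `Some('A')`.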

Sequences

Variable-length heterogeneous sequences are encoded as a single-byte length followed by the concatenation of the serialized form of each element in the sequence. Note that this means that sequences cannot have more than 255 elements. Additionally, during serialization the length must be known ahead of time.
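
A minimal sketch of the length-prefix rule, assuming a sequence of `u16` elements for concreteness. `encode_u16_seq` is a hypothetical helper, not a function exported by this crate.

```rust
/// Encodes a slice of u16 values as: one length byte, then each element
/// as 2 big-endian bytes. Fails if the length does not fit in one byte.
fn encode_u16_seq(items: &[u16]) -> Result<Vec<u8>, String> {
    if items.len() > u8::MAX as usize {
        return Err(format!("sequence too long: {} elements", items.len()));
    }
    let mut out = Vec::with_capacity(1 + 2 * items.len());
    out.push(items.len() as u8); // safe: checked above
    for item in items {
        out.extend_from_slice(&item.to_be_bytes());
    }
    Ok(out)
}
```

An empty sequence therefore encodes as the single byte `0`, and a 256-element sequence is rejected.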

Byte arrays - &[u8], Vec<u8>, and Cow<'a, [u8]>

Byte arrays are treated as a subset of variable-length sequences and are encoded as a single byte length followed by that many bytes of data. Again, since the length of the byte array has to fit in a single byte it cannot be longer than 255 bytes.

&str, String

String types are encoded the same way as &[u8], with the additional restriction that the byte array must be valid UTF-8.
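
To make the byte array and string rules concrete, the sketch below reads one length-prefixed string from the front of a buffer and returns the unread remainder. `decode_str` and its error messages are assumptions made for illustration.

```rust
/// Reads a length byte, then that many bytes, and checks they are valid UTF-8.
/// Returns the decoded string and whatever input follows it.
fn decode_str(input: &[u8]) -> Result<(&str, &[u8]), String> {
    let (&len, rest) = input
        .split_first()
        .ok_or_else(|| "missing length byte".to_string())?;
    let len = len as usize;
    if rest.len() < len {
        return Err(format!("expected {} bytes, found {}", len, rest.len()));
    }
    let (data, remaining) = rest.split_at(len);
    let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
    Ok((text, remaining))
}
```

Dropping the `from_utf8` step gives the plain byte array case, since the two encodings differ only in that validity check.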

Tuples

Tuples are heterogeneous sequences where the length is fixed and known ahead of time. In this case the length is not encoded on the wire and the serialization of each element in the tuple is concatenated to produce the final value.
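
As a sketch of the difference from sequences, the hypothetical `encode_tuple` below writes a `(u8, u16, u32)` with no length byte at all, so the output is always 1 + 2 + 4 = 7 bytes.

```rust
/// Concatenates the big-endian encoding of each field; no length prefix.
fn encode_tuple(value: (u8, u16, u32)) -> Vec<u8> {
    let (a, b, c) = value;
    let mut out = Vec::with_capacity(7);
    out.push(a);
    out.extend_from_slice(&b.to_be_bytes());
    out.extend_from_slice(&c.to_be_bytes());
    out
}
```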

Option<T>

The wire format does not support optional values. Options are always deserialized as Some(T) while trying to serialize an Option::None will result in an error.
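
One way to picture this behavior, assuming that `Some(value)` is written exactly as `value` would be (an assumption inferred from the description above, not confirmed against the crate's source). Both function names are hypothetical.

```rust
/// Some(v) is written as v itself; None has no representation and is an error.
fn encode_optional_u32(value: Option<u32>) -> Result<[u8; 4], String> {
    match value {
        Some(inner) => Ok(inner.to_be_bytes()),
        None => Err("the wire format cannot represent None".to_string()),
    }
}

/// Decoding can only ever produce Some, since nothing on the wire marks absence.
fn decode_optional_u32(bytes: [u8; 4]) -> Option<u32> {
    Some(u32::from_be_bytes(bytes))
}
```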

Structs

Structs are represented the same way as tuples and the wire format for a struct is identical to the wire format for a tuple with the same fields in the same order. The only exception is unit structs (structs with no fields), which are not represented in the wire format at all.
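
For illustration, here is a hand-written decoder for a made-up struct. `Transfer` and `decode_transfer` are hypothetical and exist only to show that field names never appear on the wire: the bytes are the same as for the tuple `(u16, u64)`.

```rust
#[derive(Debug, PartialEq)]
struct Transfer {
    chain: u16,
    amount: u64,
}

/// Expects exactly 10 bytes: 2 for `chain`, then 8 for `amount`, both big-endian.
fn decode_transfer(input: &[u8]) -> Option<Transfer> {
    if input.len() != 10 {
        return None;
    }
    let mut chain = [0u8; 2];
    chain.copy_from_slice(&input[..2]);
    let mut amount = [0u8; 8];
    amount.copy_from_slice(&input[2..]);
    Some(Transfer {
        chain: u16::from_be_bytes(chain),
        amount: u64::from_be_bytes(amount),
    })
}
```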

[T; N]

Arrays are treated as tuples with homogeneous fields and have the same wire format.

Enums

Enums are encoded as a single byte identifying the variant followed by the serialization of the variant.

  • Unit variants - No additional data is encoded.
  • Newtype variants - Encoded using the serialization of the inner type.
  • Tuple variants - Encoded as a regular tuple.
  • Struct variants - Encoded as a regular struct.

Since the enum variant is encoded as a single byte rather than the name of the variant itself, it is necessary to use #[serde(rename = "<value>")] on each enum variant to ensure that they can be serialized and deserialized properly.

Examples

use std::borrow::Cow;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum TestEnum<'a> {
    #[serde(rename = "19")]
    Unit,
    #[serde(rename = "235")]
    NewType(u64),
    #[serde(rename = "179")]
    Tuple(u32, u64, Vec<u16>),
    #[serde(rename = "97")]
    Struct {
        #[serde(borrow, with = "serde_bytes")]
        data: Cow<'a, [u8]>,
        footer: u32,
    },
}

assert!(matches!(serde_wormhole::from_slice(&[19]).unwrap(), TestEnum::Unit));
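
To show what the other variants would look like on the wire, the following is a hand-rolled encoder that applies the rules described in this document to a simplified copy of the enum (owned `Vec<u8>` instead of `Cow`, no serde). It is a sketch of the expected byte layout, not output captured from the crate, and `encode` is a hypothetical function.

```rust
enum TestEnum {
    Unit,
    NewType(u64),
    Tuple(u32, u64, Vec<u16>),
    Struct { data: Vec<u8>, footer: u32 },
}

/// Writes the variant byte (matching the rename values above), then the payload.
fn encode(value: &TestEnum) -> Vec<u8> {
    let mut out = Vec::new();
    match value {
        TestEnum::Unit => out.push(19),
        TestEnum::NewType(n) => {
            out.push(235);
            out.extend_from_slice(&n.to_be_bytes());
        }
        TestEnum::Tuple(a, b, items) => {
            out.push(179);
            out.extend_from_slice(&a.to_be_bytes());
            out.extend_from_slice(&b.to_be_bytes());
            // Sequences carry a one-byte length, so this sketch panics past 255.
            assert!(items.len() <= 255);
            out.push(items.len() as u8);
            for item in items {
                out.extend_from_slice(&item.to_be_bytes());
            }
        }
        TestEnum::Struct { data, footer } => {
            out.push(97);
            // Byte arrays are also length-prefixed with a single byte.
            assert!(data.len() <= 255);
            out.push(data.len() as u8);
            out.extend_from_slice(data);
            out.extend_from_slice(&footer.to_be_bytes());
        }
    }
    out
}
```

Under these rules `TestEnum::NewType(1)` becomes the variant byte 235 followed by eight big-endian bytes, nine bytes in total.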

Map types

Map types are encoded as a sequence of (key, value) tuples. The encoding for a Vec<(K, V)> is identical to that of a BTreeMap<K, V>. During serialization, the number of elements in the map must be known ahead of time. Like other sequences, the maximum number of elements in the map is 255.
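
The equivalence between a map and a list of pairs can be sketched as follows, assuming `u8` keys and `u16` values for concreteness. `encode_pairs` and `encode_map` are hypothetical helpers and not part of this crate.

```rust
use std::collections::BTreeMap;

/// One length byte, then each (key, value) pair written back to back.
fn encode_pairs(pairs: &[(u8, u16)]) -> Result<Vec<u8>, String> {
    if pairs.len() > 255 {
        return Err(format!("too many entries: {}", pairs.len()));
    }
    let mut out = vec![pairs.len() as u8];
    for (key, value) in pairs {
        out.push(*key);
        out.extend_from_slice(&value.to_be_bytes());
    }
    Ok(out)
}

/// A BTreeMap is encoded as its entries in iteration (key) order.
fn encode_map(map: &BTreeMap<u8, u16>) -> Result<Vec<u8>, String> {
    let pairs: Vec<(u8, u16)> = map.iter().map(|(k, v)| (*k, *v)).collect();
    encode_pairs(&pairs)
}
```

Because a `BTreeMap` iterates in key order, the map produces the same bytes as a `Vec` of the same pairs sorted by key.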

Dependencies

~0.7–1.3MB
~30K SLoC