0.1.0 Aug 8, 2023

Custom license

Parsing and serialization of (network) packets.

packet is a library to help with the parsing and serialization of nested packets. Network packets are the most common use case, but it supports any packet structure with headers, footers, and nesting.

Model

The core components of packet are the various buffer traits (XxxBuffer and XxxBufferMut). A buffer is a byte buffer with a prefix, a body, and a suffix. The size of the buffer is referred to as its "capacity", and the size of the body is referred to as its "length". Depending on which traits are implemented, the body of the buffer may be able to shrink or grow as allowed by the capacity as packets are parsed or serialized.
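The prefix/body/suffix model can be sketched with a small stand-alone type. This is only an illustration: `ToyBuffer` and its methods are hypothetical names invented for this sketch, not the crate's buffer traits.

```rust
/// A toy buffer: one backing allocation split into prefix | body | suffix.
/// Hypothetical type for illustration; not one of the crate's buffer traits.
#[derive(Debug)]
pub struct ToyBuffer {
    bytes: Vec<u8>,
    body_start: usize, // index of the first body byte
    body_end: usize,   // one past the last body byte
}

impl ToyBuffer {
    /// Creates a buffer whose body is `body`, with `prefix` and `suffix`
    /// zeroed bytes reserved around it.
    pub fn new(prefix: usize, body: &[u8], suffix: usize) -> Self {
        let mut bytes = Vec::with_capacity(prefix + body.len() + suffix);
        bytes.resize(prefix, 0u8);
        bytes.extend_from_slice(body);
        bytes.resize(prefix + body.len() + suffix, 0u8);
        ToyBuffer { bytes, body_start: prefix, body_end: prefix + body.len() }
    }

    /// "Capacity": the size of the whole buffer.
    pub fn capacity(&self) -> usize { self.bytes.len() }
    /// "Length": the size of the body only.
    pub fn len(&self) -> usize { self.body_end - self.body_start }
    pub fn prefix_len(&self) -> usize { self.body_start }
    pub fn suffix_len(&self) -> usize { self.bytes.len() - self.body_end }
    pub fn body(&self) -> &[u8] { &self.bytes[self.body_start..self.body_end] }

    /// Shrinks the body from the front; the bytes become part of the prefix.
    pub fn shrink_front(&mut self, n: usize) {
        assert!(n <= self.len(), "cannot shrink past the end of the body");
        self.body_start += n;
    }

    /// Grows the body at the front by reclaiming `n` bytes of prefix.
    pub fn grow_front(&mut self, n: usize) {
        assert!(n <= self.prefix_len(), "not enough prefix");
        self.body_start -= n;
    }
}
```

Shrinking and growing only move the two indices, so the capacity never changes and no bytes are copied.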

Parsing

When parsing packets, the body of the buffer stores the next packet to be parsed. When a packet is parsed from the buffer, any headers, footers, and padding are "consumed" from the buffer. Thus, after a packet has been parsed, the body of the buffer is equal to the body of the packet, and the next call to parse will pick up where the previous call left off, parsing the next encapsulated packet.

Packet objects - the Rust objects which are the result of a successful parsing operation - are advised to simply keep references into the buffer for the header, footer, and body. This avoids any unnecessary copying.
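A minimal sketch of such a packet object, assuming a fixed-size header and no footer. `Frame` and `parse_frame` are hypothetical names for this illustration; the library itself does not provide concrete packet types.

```rust
/// Hypothetical parsed-packet type: it owns no bytes, it only borrows them.
#[derive(Debug)]
pub struct Frame<'a> {
    pub header: &'a [u8],
    pub body: &'a [u8],
}

/// Splits `bytes` into a `header_len`-byte header and a body without copying.
/// Returns `None` if `bytes` is too short to contain the header.
pub fn parse_frame<'a>(bytes: &'a [u8], header_len: usize) -> Option<Frame<'a>> {
    if bytes.len() < header_len {
        return None;
    }
    let (header, body) = bytes.split_at(header_len);
    Some(Frame { header: header, body: body })
}
```

Because both fields are slices of the input, the parsed object costs two pointer/length pairs regardless of how large the packet is.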

For example, consider the following packet structure, in which a TCP segment is encapsulated in an IPv4 packet, which is encapsulated in an Ethernet frame. In this example, we omit the Ethernet Frame Check Sequence (FCS) footer. If there were any footers, they would be treated the same as headers, except that footers are consumed from the end of the body working towards the beginning, whereas headers are consumed from the beginning working towards the end.

Also note that, in order to satisfy Ethernet's minimum body size requirement, padding is added after the IPv4 packet. The IPv4 packet and padding together are considered the body of the Ethernet frame. If we were to include the Ethernet FCS footer in this example, it would go after the padding.

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|-------------------|--------------------|-----|
  Ethernet header      IPv4 header         TCP segment      Padding

At first, the buffer's body would be equal to the bytes of the Ethernet frame (although depending on how the buffer was initialized, it might have extra capacity in addition to the body):

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|-------------------|--------------------|-----|
  Ethernet header      IPv4 header         TCP segment      Padding

|----------------------------------------------------------------|
                            Buffer Body

First, the Ethernet frame is parsed. This results in a hypothetical EthernetFrame object (this library does not provide any concrete parsing implementations) with references into the buffer, and updates the body of the buffer to be equal to the body of the Ethernet frame:

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|----------------------------------------------|
  Ethernet header                  Ethernet body
         |                                 |
         +--------------------------+      |
                                    |      |
                  EthernetFrame { header, body }

|-----------------|----------------------------------------------|
   buffer prefix                   buffer body

The EthernetFrame object mutably borrows the buffer. So long as it exists, the buffer cannot be used directly (although the EthernetFrame object may be used to access or modify the contents of the buffer). In order to parse the body of the Ethernet frame, we have to drop the EthernetFrame object so that we can call methods on the buffer again. [1]

After dropping the EthernetFrame object, the IPv4 packet is parsed. Recall that the Ethernet body contains both the IPv4 packet and some padding. Since IPv4 packets encode their own length, the IPv4 packet parser is able to detect that some of the bytes it's operating on are padding bytes. It is the parser's responsibility to consume and discard these bytes so that they are not erroneously treated as part of the IPv4 packet's body in subsequent parsings.

This parsing results in a hypothetical Ipv4Packet object with references into the buffer, and updates the body of the buffer to be equal to the body of the IPv4 packet:

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|-------------------|--------------------|-----|
                       IPv4 header          IPv4 body
                            |                   |
                            +-----------+       |
                                        |       |
                         Ipv4Packet { header, body }

|-------------------------------------|--------------------|-----|
             buffer prefix                 buffer body       buffer suffix

We can continue this process as long as we like, repeatedly parsing subsequent packet bodies until there are no more packets to parse.
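The two parsing steps above can be sketched as follows. Everything here is a hypothetical stand-in (`ParseBuf`, `parse_ethernet`, `parse_ipv4`), and the sketch assumes an untagged 14-byte Ethernet header with no FCS. The IPv4 parser reads the header length and total length fields to decide how many trailing bytes are padding.

```rust
/// Toy parse buffer: `bytes[start..end]` is the body; everything before it
/// is prefix and everything after it is suffix. Hypothetical type.
pub struct ParseBuf<'a> {
    bytes: &'a [u8],
    start: usize,
    end: usize,
}

impl<'a> ParseBuf<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        ParseBuf { bytes: bytes, start: 0, end: bytes.len() }
    }
    pub fn body(&self) -> &'a [u8] {
        let bytes: &'a [u8] = self.bytes;
        &bytes[self.start..self.end]
    }
    pub fn prefix_len(&self) -> usize { self.start }
    pub fn suffix_len(&self) -> usize { self.bytes.len() - self.end }

    /// Consumes `n` bytes from the front of the body and returns them.
    pub fn take_front(&mut self, n: usize) -> Option<&'a [u8]> {
        if n > self.end - self.start { return None; }
        let bytes: &'a [u8] = self.bytes;
        let taken = &bytes[self.start..self.start + n];
        self.start += n;
        Some(taken)
    }

    /// Consumes `n` bytes from the back of the body and returns them.
    pub fn take_back(&mut self, n: usize) -> Option<&'a [u8]> {
        if n > self.end - self.start { return None; }
        let bytes: &'a [u8] = self.bytes;
        let taken = &bytes[self.end - n..self.end];
        self.end -= n;
        Some(taken)
    }
}

/// Assumption for this sketch: untagged Ethernet header, FCS not present.
pub const ETH_HEADER_LEN: usize = 14;
pub const IPV4_MIN_HEADER_LEN: usize = 20;

/// Consumes the Ethernet header; the buffer body becomes the Ethernet body.
pub fn parse_ethernet<'a>(buf: &mut ParseBuf<'a>) -> Option<&'a [u8]> {
    buf.take_front(ETH_HEADER_LEN)
}

/// Consumes the IPv4 header and any trailing padding; the buffer body
/// becomes the IPv4 body. Returns the header bytes.
pub fn parse_ipv4<'a>(buf: &mut ParseBuf<'a>) -> Option<&'a [u8]> {
    let body = buf.body();
    if body.len() < IPV4_MIN_HEADER_LEN { return None; }
    let header_len = usize::from(body[0] & 0x0f) * 4; // IHL is in 32-bit words
    let total_len = usize::from(u16::from_be_bytes([body[2], body[3]]));
    if header_len < IPV4_MIN_HEADER_LEN || total_len < header_len || total_len > body.len() {
        return None;
    }
    // Bytes past `total_len` are link-layer padding: discard them first.
    buf.take_back(body.len() - total_len)?;
    buf.take_front(header_len)
}
```

After both calls, the consumed headers sit in the buffer's prefix and the discarded padding sits in its suffix, which matches the last diagram above.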

[1] It is also possible to treat the EthernetFrame's body field as a buffer and parse from it directly. However, this has the disadvantage that if parsing is spread across multiple functions, the functions which parse the inner packets only see part of the buffer, and so if they wish to later re-use the buffer for serializing new packets (see the "Serialization" section of this documentation), they are limited to doing so in a smaller buffer, making it more likely that a new buffer will need to be allocated.

Serialization

In this section, we will illustrate serialization using the same packet structure that was used to illustrate parsing - a TCP segment in an IPv4 packet in an Ethernet frame.

Serialization comprises two tasks:

  • First, given a buffer with sufficient capacity, and part of the packet already serialized, serialize the next layer of the packet. For example, given a buffer with a TCP segment already serialized in it, serialize the IPv4 header, resulting in an IPv4 packet containing a TCP segment.
  • Second, given a description of a nested sequence of packets, figure out the constraints that a buffer must satisfy in order to be able to fit the entire sequence, and allocate a buffer which satisfies those constraints. This buffer is then used to serialize one layer at a time, as described in the previous bullet.

Serializing into a buffer

The PacketBuilder trait is implemented by types which are capable of serializing a new layer of a packet into an existing buffer. For example, we might define an Ipv4PacketBuilder type, which describes the source IP address, destination IP address, and any other metadata required to generate the header of an IPv4 packet. Importantly, a PacketBuilder does not define any encapsulated packets. In order to construct a TCP segment in an IPv4 packet, we would need a separate TcpSegmentBuilder to describe the TCP segment.

A PacketBuilder exposes the number of bytes it requires for headers, footers, and minimum and maximum body lengths via the constraints method. It serializes via the serialize method.

In order to serialize a PacketBuilder, a SerializeTarget must first be constructed. A SerializeTarget is a view into a buffer used for serialization, and it is initialized with the proper number of bytes for the header, footer, and body. The number of bytes required for these is discovered through calls to the PacketBuilder's constraints method.
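As a rough sketch of how constraints translate into a layout, consider the toy types below. `Constraints`, `ToyPacketBuilder`, and `target_layout` are hypothetical names chosen for this illustration and may not match the shapes the crate actually uses. The Ethernet figures in the usage note (14-byte header, 46-byte minimum body) are likewise assumptions for an untagged frame.

```rust
/// Space requirements reported by one packet layer (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Constraints {
    pub header_len: usize,
    pub footer_len: usize,
    pub min_body_len: usize,
    pub max_body_len: usize,
}

/// Toy stand-in for a builder trait; only the constraints half is shown.
pub trait ToyPacketBuilder {
    fn constraints(&self) -> Constraints;
}

pub struct Ipv4Builder {
    pub src: [u8; 4],
    pub dst: [u8; 4],
}

impl ToyPacketBuilder for Ipv4Builder {
    fn constraints(&self) -> Constraints {
        Constraints {
            header_len: 20, // IPv4 header without options
            footer_len: 0,
            min_body_len: 0,
            max_body_len: 65_535 - 20, // total length is a 16-bit field
        }
    }
}

/// Computes (header, body, padding, footer) sizes for a body of `body_len`
/// bytes, or `None` if the body exceeds the layer's maximum.
pub fn target_layout(c: Constraints, body_len: usize) -> Option<(usize, usize, usize, usize)> {
    if body_len > c.max_body_len {
        return None;
    }
    let padding = c.min_body_len.saturating_sub(body_len);
    Some((c.header_len, body_len, padding, c.footer_len))
}
```

With an Ethernet-like layer of `Constraints { header_len: 14, footer_len: 0, min_body_len: 46, max_body_len: 1500 }`, a 40-byte body would need 6 bytes of padding.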

The PacketBuilder's serialize method serializes the headers and footers of the packet into the buffer. It expects that the SerializeTarget is initialized with a body equal to the body which will be encapsulated. For example, imagine that we are trying to serialize a TCP segment in an IPv4 packet in an Ethernet frame, and that, so far, we have only serialized the TCP segment:

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-------------------------------------|--------------------|-----|
                                            TCP segment

|-------------------------------------|--------------------|-----|
             buffer prefix                 buffer body       buffer suffix

Note that the buffer's body is currently equal to the TCP segment, and the contents of the body are already initialized to the segment's contents.

Given an Ipv4PacketBuilder, we call its constraints method to discover that it requires 20 bytes for its header. Thus, we extend the buffer's body by 20 bytes, taking them from the end of the prefix, and construct a SerializeTarget whose header references the newly-added 20 bytes and whose body references the old contents of the body, corresponding to the TCP segment.

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|-------------------|--------------------|-----|
                       IPv4 header          IPv4 body
                            |                   |
                            +-----------+       |
                                        |       |
                     SerializeTarget { header, body }

|-----------------|----------------------------------------|-----|
   buffer prefix                 buffer body                 buffer suffix

We then pass the SerializeTarget to a call to the Ipv4PacketBuilder's serialize method, and it serializes the IPv4 header in the space provided. When the call to serialize returns, the SerializeTarget and Ipv4PacketBuilder have been discarded, and the buffer's body is now equal to the bytes of the IPv4 packet.

|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame

|-----------------|----------------------------------------|-----|
                                 IPv4 packet

|-----------------|----------------------------------------|-----|
   buffer prefix                 buffer body                 buffer suffix

Now, we are ready to repeat the same process with the Ethernet layer of the packet.
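The IPv4 step just described might look roughly like this. `Buf`, `ToyTarget`, and `serialize_ipv4_header` are hypothetical stand-ins rather than the crate's types, and the header checksum is deliberately left as zero to keep the sketch short.

```rust
/// Toy buffer: `bytes[start..end]` is the body. Hypothetical type.
pub struct Buf {
    pub bytes: Vec<u8>,
    pub start: usize,
    pub end: usize,
}

/// Toy view handed to a builder: fresh header space plus the existing body.
pub struct ToyTarget<'a> {
    pub header: &'a mut [u8],
    pub body: &'a [u8],
}

impl Buf {
    pub fn body(&self) -> &[u8] { &self.bytes[self.start..self.end] }

    /// Extends the body `n` bytes into the prefix and returns a target whose
    /// header is the newly claimed space and whose body is the old body.
    pub fn grow_front<'a>(&'a mut self, n: usize) -> Option<ToyTarget<'a>> {
        if n > self.start { return None; }
        let (old_start, end) = (self.start, self.end);
        let new_start = old_start - n;
        self.start = new_start;
        let (head, tail) = self.bytes[..end].split_at_mut(old_start);
        Some(ToyTarget { header: &mut head[new_start..], body: &*tail })
    }
}

pub const IPV4_HEADER_LEN: usize = 20;

/// Writes a minimal IPv4 header in front of the current body.
/// The checksum field is left as zero in this sketch.
pub fn serialize_ipv4_header(buf: &mut Buf, src: [u8; 4], dst: [u8; 4]) -> Option<()> {
    if IPV4_HEADER_LEN + buf.body().len() > 0xffff {
        return None; // would not fit in the 16-bit total length field
    }
    let target = buf.grow_front(IPV4_HEADER_LEN)?;
    let total_len = (IPV4_HEADER_LEN + target.body.len()) as u16;
    let header = target.header;
    header.fill(0);
    header[0] = 0x45; // version 4, header length 5 words
    header[2..4].copy_from_slice(&total_len.to_be_bytes());
    header[8] = 64; // time to live
    header[9] = 6; // protocol number for TCP
    header[12..16].copy_from_slice(&src);
    header[16..20].copy_from_slice(&dst);
    Some(())
}
```

Starting from a buffer with 38 bytes of prefix and a TCP segment as its body, one call leaves 18 bytes of prefix and a body equal to the whole IPv4 packet. The TCP bytes themselves are never moved.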

Constructing a buffer for serialization

We now know how to serialize the next layer of a packet into a buffer that already holds the inner layers. What remains is to figure out how to construct such a buffer in the first place.

The primary challenge here is that we need to be able to commit to what we're going to serialize before we actually serialize it. For example, consider sending a TCP segment to the network. From the perspective of the TCP module of our code, we don't know how large the buffer needs to be because we don't know what packet layers our TCP segment will be encapsulated inside of. If the IP layer decides to route our segment over an Ethernet link, then we'll need to have a buffer large enough for a TCP segment in an IPv4 packet in an Ethernet frame. If, on the other hand, the IP layer decides to route our segment through a GRE tunnel, then we'll need to have a buffer large enough for a TCP segment in an IPv4 packet in a GRE packet in an IP packet in an Ethernet frame.

We accomplish this commit-before-serializing via the Serializer trait. A Serializer describes a packet which can be serialized in the future, but which has not yet been serialized. Unlike a PacketBuilder, a Serializer describes all layers of a packet up to a certain point. For example, a Serializer might describe a TCP segment, or it might describe a TCP segment in an IP packet, or it might describe a TCP segment in an IP packet in an Ethernet frame, etc.

Constructing a Serializer

Serializers are recursive - a Serializer combined with a PacketBuilder yields a new Serializer which describes encapsulating the original Serializer in a new packet layer. For example, a Serializer describing a TCP segment combined with an Ipv4PacketBuilder yields a Serializer which describes a TCP segment in an IPv4 packet. Concretely, given a Serializer, s, and a PacketBuilder, b, a new Serializer can be constructed by calling s.encapsulate(b). The Serializer::encapsulate method consumes both the Serializer and the PacketBuilder by value, and returns a new Serializer.

Note that, while Serializers are passed around by value, they are only as large in memory as the PacketBuilders they're constructed from, and those should, in most cases, be quite small. If size is a concern, the PacketBuilder trait can be implemented for a reference type (e.g., &Ipv4PacketBuilder), and references passed around instead of values.
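The recursive structure can be sketched with a pair of toy traits. All names here (`ToyBuilder`, `ToySerializer`, `Nested`, `Payload`, `FixedOverhead`) are hypothetical, and the sketch tracks only header and footer sizes. The blanket impl for `&B` illustrates the suggestion above of implementing the builder trait for a reference type.

```rust
/// Toy stand-in for a builder: it only reports its header and footer sizes.
pub trait ToyBuilder {
    fn header_len(&self) -> usize;
    fn footer_len(&self) -> usize;
}

/// Toy stand-in for a serializer: a description of a not-yet-serialized packet.
pub trait ToySerializer: Sized {
    /// Header plus footer bytes contributed by every layer described so far.
    fn overhead(&self) -> usize;

    /// Consumes both values and wraps `self` in one more packet layer.
    fn encapsulate<B: ToyBuilder>(self, outer: B) -> Nested<Self, B> {
        Nested { inner: self, outer: outer }
    }
}

/// A serializer plus one enclosing builder is again a serializer.
pub struct Nested<S, B> {
    pub inner: S,
    pub outer: B,
}

impl<S: ToySerializer, B: ToyBuilder> ToySerializer for Nested<S, B> {
    fn overhead(&self) -> usize {
        self.inner.overhead() + self.outer.header_len() + self.outer.footer_len()
    }
}

/// Innermost layer: bytes with no further encapsulation described.
pub struct Payload(pub Vec<u8>);

impl ToySerializer for Payload {
    fn overhead(&self) -> usize { 0 }
}

/// A builder with fixed header and footer sizes.
pub struct FixedOverhead {
    pub header: usize,
    pub footer: usize,
}

impl ToyBuilder for FixedOverhead {
    fn header_len(&self) -> usize { self.header }
    fn footer_len(&self) -> usize { self.footer }
}

/// Builders may also be passed by reference instead of by value.
impl<'a, B: ToyBuilder> ToyBuilder for &'a B {
    fn header_len(&self) -> usize { (**self).header_len() }
    fn footer_len(&self) -> usize { (**self).footer_len() }
}
```

For instance, `Payload(tcp_bytes).encapsulate(&ip).encapsulate(eth)` yields a `Nested<Nested<Payload, &FixedOverhead>, FixedOverhead>`. The nesting is encoded in the type, and the value holds nothing beyond the payload and the builders.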

Constructing a buffer from a Serializer

Serializers are constructed by starting at the innermost packet layer and working outwards, adding one packet layer at a time. Turning a Serializer into a buffer goes in the opposite direction: the layers are consumed by starting at the outermost packet layer and working inwards.

In order to construct a buffer, the Serializer::serialize method is provided. It takes a NestedPacketBuilder, which describes one or more encapsulating packet layers. For example, when serializing a TCP segment in an IP packet in an Ethernet frame, the serialize call on the IP packet Serializer would be given a NestedPacketBuilder describing the Ethernet frame. This call would then compute a new NestedPacketBuilder describing the combined IP packet and Ethernet frame, and would pass this to a call to serialize on the TCP segment Serializer.

When the innermost call to serialize is reached, it is that call's responsibility to produce a buffer which satisfies the constraints passed to it, and to initialize that buffer's body with the contents of its packet. For example, the TCP segment Serializer from the preceding example would need to produce a buffer with 38 bytes of prefix for the IP and Ethernet headers, and whose body was initialized to the bytes of the TCP segment.

We can now see how Serializers and PacketBuilders compose - the buffer returned from a call to serialize satisfies the requirements of the PacketBuilder::serialize method - its body is initialized to the packet to be encapsulated, and enough prefix and suffix space exist to serialize this layer's header and footer. For example, the call to Serializer::serialize on the TCP segment serializer would return a buffer with 38 bytes of prefix and a body initialized to the bytes of the TCP segment. The call to Serializer::serialize on the IP packet would then pass this buffer to a call to PacketBuilder::serialize on its Ipv4PacketBuilder, resulting in a buffer with 18 bytes of prefix and a body initialized to the bytes of the entire IP packet. This buffer would then be suitable to return from the call to Serializer::serialize, allowing the Ethernet layer to continue operating on the buffer, and so on.

Note in particular that, throughout this entire process of constructing Serializers and PacketBuilders and then consuming them, a buffer is only allocated once, and each byte of the packet is only serialized once. No temporary buffers or copying between buffers are required.
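The outside-in, then inside-out flow can be sketched end to end. This is a simplified model with hypothetical names throughout: a plain prefix size stands in for the NestedPacketBuilder, footers and body-size limits are ignored, and each toy builder fills its header with a tag byte so the layers are easy to tell apart.

```rust
/// Toy buffer: `bytes[start..end]` is the body. Hypothetical type.
pub struct Buf {
    pub bytes: Vec<u8>,
    pub start: usize,
    pub end: usize,
}

impl Buf {
    pub fn prefix_len(&self) -> usize { self.start }
    pub fn body(&self) -> &[u8] { &self.bytes[self.start..self.end] }
}

pub trait ToyBuilder {
    fn header_len(&self) -> usize;
    /// Writes this layer's header; `body_len` is the encapsulated body's size.
    fn write_header(&self, header: &mut [u8], body_len: usize);
}

pub trait ToySerializer: Sized {
    /// Returns a buffer whose body is this packet and whose prefix still has
    /// `outer_prefix` free bytes for the layers outside it.
    fn serialize(self, outer_prefix: usize) -> Buf;

    fn encapsulate<B: ToyBuilder>(self, outer: B) -> Nested<Self, B> {
        Nested { inner: self, outer: outer }
    }
}

pub struct Nested<S, B> {
    pub inner: S,
    pub outer: B,
}

impl<S: ToySerializer, B: ToyBuilder> ToySerializer for Nested<S, B> {
    fn serialize(self, outer_prefix: usize) -> Buf {
        let n = self.outer.header_len();
        // Going inwards: ask for room for our header plus everything outside us.
        let mut buf = self.inner.serialize(outer_prefix + n);
        // Coming back out: claim `n` bytes of prefix and write our header there.
        let body_len = buf.end - buf.start;
        let old_start = buf.start;
        let new_start = old_start - n;
        buf.start = new_start;
        self.outer.write_header(&mut buf.bytes[new_start..old_start], body_len);
        buf
    }
}

/// Innermost serializer: it is the one that produces the buffer.
pub struct Payload(pub Vec<u8>);

impl ToySerializer for Payload {
    fn serialize(self, outer_prefix: usize) -> Buf {
        // The single buffer allocation for the whole nested packet.
        let mut bytes = Vec::with_capacity(outer_prefix + self.0.len());
        bytes.resize(outer_prefix, 0u8);
        bytes.extend_from_slice(&self.0);
        let end = bytes.len();
        Buf { bytes: bytes, start: outer_prefix, end: end }
    }
}

/// Toy builder whose header is `len` copies of `tag`.
pub struct Tagged {
    pub tag: u8,
    pub len: usize,
}

impl ToyBuilder for Tagged {
    fn header_len(&self) -> usize { self.len }
    fn write_header(&self, header: &mut [u8], _body_len: usize) {
        header.fill(self.tag);
    }
}
```

With a 20-byte and an 18-byte `Tagged` layer, the innermost call is asked for 38 bytes of prefix. The buffer has 18 bytes of prefix left after the first header is written and none after the second, mirroring the numbers in the walkthrough above.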

Reusing buffers

Another important property of the Serializer trait is that it can be implemented by buffers. Since buffers contain prefixes, bodies, and suffixes, and since the Serializer::serialize method consumes the Serializer by value and returns a buffer by value, a buffer is itself a valid Serializer. When serialize is called on a buffer, so long as the buffer already satisfies the requested constraints, it can simply return itself by value. If the constraints are not satisfied, a different buffer may need to be produced through some user-defined mechanism (see the BufferProvider trait for details).

This allows existing buffers to be reused in many cases. For example, consider receiving a packet in a buffer, and then responding to that packet with a new packet. The buffer that the original packet was stored in can be used to serialize the new packet, avoiding any unnecessary allocation.
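A minimal sketch of that reuse decision, tracking only the prefix requirement. `Buf` and its `serialize` method are hypothetical, and the fallback simply allocates a new `Vec`, standing in for whatever mechanism a BufferProvider would supply.

```rust
/// Toy buffer: `bytes[start..end]` is the body. Hypothetical type.
pub struct Buf {
    pub bytes: Vec<u8>,
    pub start: usize,
    pub end: usize,
}

impl Buf {
    pub fn body(&self) -> &[u8] { &self.bytes[self.start..self.end] }

    /// Acts as the innermost serialization step. Returns the buffer to use
    /// and whether the original allocation was reused.
    pub fn serialize(self, needed_prefix: usize) -> (Buf, bool) {
        if self.start >= needed_prefix {
            // Constraints already satisfied: hand back the same buffer.
            return (self, true);
        }
        // Otherwise fall back to a fresh allocation and copy the body over.
        let body_len = self.end - self.start;
        let mut bytes = Vec::with_capacity(needed_prefix + body_len);
        bytes.resize(needed_prefix, 0u8);
        bytes.extend_from_slice(self.body());
        let end = bytes.len();
        (Buf { bytes: bytes, start: needed_prefix, end: end }, false)
    }
}
```

A buffer that has just been parsed down to its innermost body typically has a prefix at least as large as the headers it arrived with. A reply with the same layering therefore tends to take the first branch.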
