1 unstable release: 0.1.0 (Aug 8, 2023)
Parsing and serialization of (network) packets.
packet is a library to help with the parsing and serialization of nested
packets. Network packets are the most common use case, but it supports any
packet structure with headers, footers, and nesting.
Model
The core components of packet are the various buffer traits (XxxBuffer and
XxxBufferMut). A buffer is a byte buffer with a prefix, a body, and a
suffix. The size of the buffer is referred to as its "capacity", and the
size of the body is referred to as its "length". Depending on which traits
are implemented, the body of the buffer may be able to shrink or grow as
allowed by the capacity as packets are parsed or serialized.
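To make the prefix/body/suffix model concrete, here is a minimal sketch. SketchBuffer and all of its methods are hypothetical names invented for this illustration; they are not the crate's buffer traits, and only mirror the terminology used above.

```rust
/// Hypothetical buffer (not part of the crate): a byte vector in which
/// `body_start..body_end` is the body, everything before it is the prefix,
/// and everything after it is the suffix.
#[derive(Debug)]
pub struct SketchBuffer {
    bytes: Vec<u8>,
    body_start: usize,
    body_end: usize,
}

impl SketchBuffer {
    /// Create a buffer whose body initially covers all of `bytes`.
    pub fn new(bytes: Vec<u8>) -> Self {
        let body_end = bytes.len();
        SketchBuffer { bytes, body_start: 0, body_end }
    }

    /// Total size of the buffer: prefix + body + suffix.
    pub fn capacity(&self) -> usize { self.bytes.len() }
    /// Size of the body only.
    pub fn len(&self) -> usize { self.body_end - self.body_start }
    pub fn prefix_len(&self) -> usize { self.body_start }
    pub fn suffix_len(&self) -> usize { self.bytes.len() - self.body_end }
    pub fn body(&self) -> &[u8] { &self.bytes[self.body_start..self.body_end] }

    /// Shrink the body from the front; the bytes become part of the prefix.
    pub fn shrink_front(&mut self, n: usize) {
        assert!(n <= self.len());
        self.body_start += n;
    }
    /// Shrink the body from the back; the bytes become part of the suffix.
    pub fn shrink_back(&mut self, n: usize) {
        assert!(n <= self.len());
        self.body_end -= n;
    }
    /// Grow the body into the prefix, as far as the capacity allows.
    pub fn grow_front(&mut self, n: usize) {
        assert!(n <= self.prefix_len());
        self.body_start -= n;
    }
    /// Grow the body into the suffix, as far as the capacity allows.
    pub fn grow_back(&mut self, n: usize) {
        assert!(n <= self.suffix_len());
        self.body_end += n;
    }
}
```

In this sketch the capacity never changes; shrinking and growing only move the boundaries between prefix, body, and suffix.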
Parsing
When parsing packets, the body of the buffer stores the next packet to be
parsed. When a packet is parsed from the buffer, any headers, footers, and
padding are "consumed" from the buffer. Thus, after a packet has been
parsed, the body of the buffer is equal to the body of the packet, and the
next call to parse will pick up where the previous call left off, parsing
the next encapsulated packet.
Packet objects - the Rust objects which are the result of a successful parsing operation - are advised to simply keep references into the buffer for the header, footer, and body. This avoids any unnecessary copying.
For example, consider the following packet structure, in which a TCP segment is encapsulated in an IPv4 packet, which is encapsulated in an Ethernet frame. In this example, we omit the Ethernet Frame Check Sequence (FCS) footer. If there were any footers, they would be treated the same as headers, except that they would be consumed from the end of the buffer working towards the beginning, whereas headers are consumed from the beginning working towards the end.
Also note that, in order to satisfy Ethernet's minimum body size requirement, padding is added after the IPv4 packet. The IPv4 packet and padding together are considered the body of the Ethernet frame. If we were to include the Ethernet FCS footer in this example, it would go after the padding.
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|-------------------|--------------------|-----|
  Ethernet header      IPv4 header         TCP segment     Padding
At first, the buffer's body would be equal to the bytes of the Ethernet frame (although depending on how the buffer was initialized, it might have extra capacity in addition to the body):
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|-------------------|--------------------|-----|
  Ethernet header      IPv4 header         TCP segment     Padding
|----------------------------------------------------------------|
                           Buffer Body
First, the Ethernet frame is parsed. This results in a hypothetical
EthernetFrame object (this library does not provide any concrete parsing
implementations) with references into the buffer, and updates the body of
the buffer to be equal to the body of the Ethernet frame:
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|----------------------------------------------|
  Ethernet header                  Ethernet body
         |                                 |
         +--------------------------+      |
                                    |      |
                  EthernetFrame { header, body }
|-----------------|----------------------------------------------|
   buffer prefix                    buffer body
The EthernetFrame object mutably borrows the buffer. So long as it exists,
the buffer cannot be used directly (although the EthernetFrame object may
be used to access or modify the contents of the buffer). In order to parse
the body of the Ethernet frame, we have to drop the EthernetFrame object
so that we can call methods on the buffer again. [1]
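The borrow-then-drop pattern can be sketched as follows. Buf, EthernetFrame, and parse_ethernet are hypothetical stand-ins written for this illustration (the library itself provides no concrete parsers), and the sketch assumes a 14-byte Ethernet header with no VLAN tag.

```rust
/// Hypothetical buffer: `bytes[start..]` is the body, `bytes[..start]` the prefix.
pub struct Buf {
    bytes: Vec<u8>,
    start: usize,
}

/// Hypothetical packet object that keeps references into the buffer.
pub struct EthernetFrame<'a> {
    pub header: &'a mut [u8],
    pub body: &'a mut [u8],
}

/// Assumption: destination MAC + source MAC + EtherType, no VLAN tag.
const ETH_HEADER_LEN: usize = 14;

impl Buf {
    pub fn new(bytes: Vec<u8>) -> Self {
        Buf { bytes, start: 0 }
    }

    pub fn body(&self) -> &[u8] {
        &self.bytes[self.start..]
    }

    /// Consume the Ethernet header from the front of the body and return a
    /// frame object that mutably borrows the buffer.
    pub fn parse_ethernet(&mut self) -> Option<EthernetFrame<'_>> {
        if self.bytes.len() - self.start < ETH_HEADER_LEN {
            return None;
        }
        let old_start = self.start;
        self.start += ETH_HEADER_LEN;
        let (header, body) = self.bytes[old_start..].split_at_mut(ETH_HEADER_LEN);
        Some(EthernetFrame { header, body })
    }
}

/// Read the EtherType through the frame, drop the frame, then use the buffer.
pub fn ether_type_then_body_len(buf: &mut Buf) -> Option<(u16, usize)> {
    let ether_type = {
        let frame = buf.parse_ethernet()?;
        // Calling `buf.body()` here would not compile: `frame` still holds
        // the mutable borrow of `buf` because it is used on the next line.
        u16::from_be_bytes([frame.header[12], frame.header[13]])
    }; // `frame` goes out of scope here, releasing the borrow.
    Some((ether_type, buf.body().len()))
}
```

After the inner block ends, the buffer's body is the Ethernet body, so the next parser can be run directly on buf.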
After dropping the EthernetFrame object, the IPv4 packet is parsed. Recall
that the Ethernet body contains both the IPv4 packet and some padding. Since
IPv4 packets encode their own length, the IPv4 packet parser is able to
detect that some of the bytes it's operating on are padding bytes. It is the
parser's responsibility to consume and discard these bytes so that they are
not erroneously treated as part of the IPv4 packet's body in subsequent
parsings.
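As an illustration of how a parser can tell padding from payload, the sketch below splits an Ethernet body using the IPv4 header-length (IHL) and total-length fields. split_ipv4 is a hypothetical helper written for this example, not a function of the crate, and it skips checksum and option validation.

```rust
/// Hypothetical helper: split an Ethernet body into
/// (IPv4 header, IPv4 body, trailing padding).
/// Returns None if the bytes do not look like a well-formed IPv4 packet.
pub fn split_ipv4(bytes: &[u8]) -> Option<(&[u8], &[u8], &[u8])> {
    let first = *bytes.first()?;
    if first >> 4 != 4 {
        return None; // not IP version 4
    }
    // IHL is the low nibble of the first byte, in 32-bit words.
    let header_len = usize::from(first & 0x0f) * 4;
    if bytes.len() < 4 {
        return None; // too short to contain the total-length field
    }
    // Total length (header + body) is a big-endian u16 at bytes 2..4.
    let total_len = usize::from(u16::from_be_bytes([bytes[2], bytes[3]]));
    if header_len < 20 || total_len < header_len || total_len > bytes.len() {
        return None;
    }
    // Everything past `total_len` is padding and must be discarded.
    let (packet, padding) = bytes.split_at(total_len);
    let (header, body) = packet.split_at(header_len);
    Some((header, body, padding))
}
```

A parser built this way would shrink the buffer's body from the back by the padding length, so that later parsers only see the IPv4 body.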
This parsing results in a hypothetical Ipv4Packet object with references
into the buffer, and updates the body of the buffer to be equal to the body
of the IPv4 packet:
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|-------------------|--------------------|-----|
                       IPv4 header          IPv4 body
                            |                   |
                            +-----------+       |
                                        |       |
                         Ipv4Packet { header, body }
|-------------------------------------|--------------------|-----|
             buffer prefix                 buffer body     buffer suffix
We can continue this process as long as we like, repeatedly parsing subsequent packet bodies until there are no more packets to parse.
[1] It is also possible to treat the EthernetFrame's body field as a
buffer and parse from it directly. However, this has the disadvantage that
if parsing is spread across multiple functions, the functions which parse
the inner packets only see part of the buffer, and so if they wish to later
re-use the buffer for serializing new packets (see the "Serialization"
section of this documentation), they are limited to doing so in a smaller
buffer, making it more likely that a new buffer will need to be allocated.
Serialization
In this section, we will illustrate serialization using the same packet structure that was used to illustrate parsing - a TCP segment in an IPv4 packet in an Ethernet frame.
Serialization comprises two tasks:
- First, given a buffer with sufficient capacity, and part of the packet already serialized, serialize the next layer of the packet. For example, given a buffer with a TCP segment already serialized in it, serialize the IPv4 header, resulting in an IPv4 packet containing a TCP segment.
- Second, given a description of a nested sequence of packets, figure out the constraints that a buffer must satisfy in order to be able to fit the entire sequence, and allocate a buffer which satisfies those constraints. This buffer is then used to serialize one layer at a time, as described in the previous bullet.
Serializing into a buffer
The PacketBuilder trait is implemented by types which are capable of
serializing a new layer of a packet into an existing buffer. For example, we
might define an Ipv4PacketBuilder type, which describes the source IP
address, destination IP address, and any other metadata required to generate
the header of an IPv4 packet. Importantly, a PacketBuilder does not
define any encapsulated packets. In order to construct a TCP segment in an
IPv4 packet, we would need a separate TcpSegmentBuilder to describe the
TCP segment.
A PacketBuilder exposes the number of bytes it requires for headers,
footers, and minimum and maximum body lengths via the constraints method.
It serializes via the serialize method.
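The shape of such a builder can be sketched as below. SketchBuilder, Constraints, and EthernetBuilder are hypothetical, simplified stand-ins; the crate's actual PacketBuilder trait and its signatures may differ. The Ethernet numbers assume an untagged frame with the FCS omitted, as in the examples above.

```rust
/// Hypothetical stand-in for the constraints a builder reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Constraints {
    pub header_len: usize,
    pub footer_len: usize,
    pub min_body_len: usize,
    pub max_body_len: usize,
}

/// Hypothetical, simplified builder trait (not the crate's real signature).
pub trait SketchBuilder {
    /// Bytes needed for this layer's header and footer, and body limits.
    fn constraints(&self) -> Constraints;
    /// Write this layer's header and footer around an already-serialized body.
    fn serialize(&self, header: &mut [u8], body: &[u8], footer: &mut [u8]);
}

/// Describes one Ethernet layer only; it says nothing about what it carries.
pub struct EthernetBuilder {
    pub dst: [u8; 6],
    pub src: [u8; 6],
    pub ether_type: u16,
}

impl SketchBuilder for EthernetBuilder {
    fn constraints(&self) -> Constraints {
        // Assumption: untagged Ethernet II, FCS not serialized by this layer.
        Constraints { header_len: 14, footer_len: 0, min_body_len: 46, max_body_len: 1500 }
    }

    fn serialize(&self, header: &mut [u8], _body: &[u8], _footer: &mut [u8]) {
        header[..6].copy_from_slice(&self.dst);
        header[6..12].copy_from_slice(&self.src);
        header[12..14].copy_from_slice(&self.ether_type.to_be_bytes());
    }
}
```

The caller is expected to size the header and footer slices from constraints before calling serialize.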
In order to serialize a PacketBuilder, a SerializeTarget must first be
constructed. A SerializeTarget is a view into a buffer used for
serialization, and it is initialized with the proper number of bytes for the
header, footer, and body. The number of bytes required for these is
discovered through calls to the PacketBuilder's constraints method.
The PacketBuilder's serialize method serializes the headers and footers
of the packet into the buffer. It expects that the SerializeTarget is
initialized with a body equal to the body which will be encapsulated. For
example, imagine that we are trying to serialize a TCP segment in an IPv4
packet in an Ethernet frame, and that, so far, we have only serialized the
TCP segment:
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-------------------------------------|--------------------|-----|
                                           TCP segment
|-------------------------------------|--------------------|-----|
             buffer prefix                 buffer body     buffer suffix
Note that the buffer's body is currently equal to the TCP segment, and the contents of the body are already initialized to the segment's contents.
Given an Ipv4PacketBuilder, we call the appropriate methods to discover
that it requires 20 bytes for its header. Thus, we modify the buffer by
extending the body by 20 bytes into the prefix, and constructing a
SerializeTarget whose header references the newly-added 20 bytes, and whose
body references the old contents of the body, corresponding to the TCP
segment.
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|-------------------|--------------------|-----|
                       IPv4 header          IPv4 body
                            |                   |
                            +-----------+       |
                                        |       |
                    SerializeTarget { header, body }
|-----------------|----------------------------------------|-----|
   buffer prefix                 buffer body               buffer suffix
We then pass the SerializeTarget to a call to the Ipv4PacketBuilder's
serialize method, and it serializes the IPv4 header in the space provided.
When the call to serialize returns, the SerializeTarget and
Ipv4PacketBuilder have been discarded, and the buffer's body is now equal
to the bytes of the IPv4 packet.
|-------------------------------------|++++++++++++++++++++|-----| TCP segment
|-----------------|++++++++++++++++++++++++++++++++++++++++|-----| IPv4 packet
|++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| Ethernet frame
|-----------------|----------------------------------------|-----|
                                 IPv4 packet
|-----------------|----------------------------------------|-----|
   buffer prefix                 buffer body               buffer suffix
Now, we are ready to repeat the same process with the Ethernet layer of the packet.
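The IPv4 step walked through above can be sketched in a few lines. Buf, serialize_target, and serialize_ipv4_header are hypothetical names for this illustration rather than the crate's API; the header writer fills in only a handful of fields and leaves the checksum at zero for brevity.

```rust
use std::ops::Range;

/// Hypothetical buffer: `bytes[body]` is the body; the rest is prefix/suffix.
pub struct Buf {
    bytes: Vec<u8>,
    body: Range<usize>,
}

impl Buf {
    /// Allocate `prefix + body.len() + suffix` bytes with `body` already in place.
    pub fn with_body(prefix: usize, body: &[u8], suffix: usize) -> Self {
        let end = prefix + body.len();
        let mut bytes = vec![0u8; end + suffix];
        bytes[prefix..end].copy_from_slice(body);
        Buf { bytes, body: prefix..end }
    }

    pub fn prefix_len(&self) -> usize { self.body.start }
    pub fn body(&self) -> &[u8] { &self.bytes[self.body.clone()] }

    /// Extend the body by `header_len` bytes into the prefix and hand out
    /// (new header bytes, old body). Returns None if the prefix is too small.
    pub fn serialize_target(&mut self, header_len: usize) -> Option<(&mut [u8], &mut [u8])> {
        let new_start = self.body.start.checked_sub(header_len)?;
        self.body.start = new_start;
        Some(self.bytes[self.body.clone()].split_at_mut(header_len))
    }
}

/// Fill in a minimal 20-byte IPv4 header carrying TCP.
/// Assumes `20 + body_len` fits in a u16; the checksum is left as zero.
pub fn serialize_ipv4_header(header: &mut [u8], body_len: usize, src: [u8; 4], dst: [u8; 4]) {
    header.fill(0);
    header[0] = 0x45; // version 4, IHL 5 (20 bytes)
    header[2..4].copy_from_slice(&((20 + body_len) as u16).to_be_bytes());
    header[8] = 64; // TTL
    header[9] = 6; // protocol number for TCP
    header[12..16].copy_from_slice(&src);
    header[16..20].copy_from_slice(&dst);
}
```

Once the header and body references are dropped, the buffer's body covers the whole IPv4 packet, and the same two steps can be repeated for the Ethernet layer.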
Constructing a buffer for serialization
Now that we know how, given a buffer with a subset of a packet serialized into it, we can serialize the next layer of the packet, we need to figure out how to construct such a buffer in the first place.
The primary challenge here is that we need to be able to commit to what we're going to serialize before we actually serialize it. For example, consider sending a TCP segment to the network. From the perspective of the TCP module of our code, we don't know how large the buffer needs to be because we don't know what packet layers our TCP segment will be encapsulated in. If the IP layer decides to route our segment over an Ethernet link, then we'll need a buffer large enough for a TCP segment in an IPv4 packet in an Ethernet frame. If, on the other hand, the IP layer decides to route our segment through a GRE tunnel, then we'll need a buffer large enough for a TCP segment in an IPv4 packet in a GRE packet in an IP packet in an Ethernet frame.
We accomplish this commit-before-serializing via the Serializer trait. A
Serializer describes a packet which can be serialized in the future, but
which has not yet been serialized. Unlike a PacketBuilder, a Serializer
describes all layers of a packet up to a certain point. For example, a
Serializer might describe a TCP segment, or it might describe a TCP
segment in an IP packet, or it might describe a TCP segment in an IP packet
in an Ethernet frame, etc.
Constructing a Serializer
Serializers are recursive - a Serializer combined with a PacketBuilder
yields a new Serializer which describes encapsulating the original
Serializer in a new packet layer. For example, a Serializer describing a
TCP segment combined with an Ipv4PacketBuilder yields a Serializer which
describes a TCP segment in an IPv4 packet. Concretely, given a Serializer,
s, and a PacketBuilder, b, a new Serializer can be constructed by
calling s.encapsulate(b). The Serializer::encapsulate method consumes
both the Serializer and the PacketBuilder by value, and returns a new
Serializer.
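The recursive structure can be sketched with a small generic wrapper. SketchSerializer, SketchBuilder, Nested, and Payload below are hypothetical, cut-down types for illustration only; they track nothing but lengths, and the Ethernet header length of 14 bytes assumes an untagged frame.

```rust
/// Hypothetical builder: only reports how many header bytes its layer needs.
pub trait SketchBuilder {
    fn header_len(&self) -> usize;
}

/// Hypothetical serializer: describes all layers of a packet up to some point.
pub trait SketchSerializer: Sized {
    /// Total number of bytes the described packet will occupy.
    fn total_len(&self) -> usize;

    /// Consume both values and return a serializer for the wrapped packet.
    fn encapsulate<B: SketchBuilder>(self, outer: B) -> Nested<Self, B> {
        Nested { inner: self, outer }
    }
}

/// A serializer wrapped in one more packet layer. It stores nothing beyond
/// the inner serializer and the outer builder.
pub struct Nested<S, B> {
    inner: S,
    outer: B,
}

impl<S: SketchSerializer, B: SketchBuilder> SketchSerializer for Nested<S, B> {
    fn total_len(&self) -> usize {
        self.outer.header_len() + self.inner.total_len()
    }
}

/// Innermost layer: an already-built payload, such as a TCP segment.
pub struct Payload(pub Vec<u8>);

impl SketchSerializer for Payload {
    fn total_len(&self) -> usize {
        self.0.len()
    }
}

pub struct Ipv4Builder;
impl SketchBuilder for Ipv4Builder {
    fn header_len(&self) -> usize { 20 }
}

pub struct EthernetBuilder;
impl SketchBuilder for EthernetBuilder {
    fn header_len(&self) -> usize { 14 } // assumption: no VLAN tag
}
```

With these definitions, Payload(segment).encapsulate(Ipv4Builder).encapsulate(EthernetBuilder) has the type Nested<Nested<Payload, Ipv4Builder>, EthernetBuilder>, so the layering is known at compile time and no bytes have been serialized yet.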
Note that, while Serializers are passed around by value, they are only as
large in memory as the PacketBuilders they're constructed from, and those
should, in most cases, be quite small. If size is a concern, the
PacketBuilder trait can be implemented for a reference type (e.g.,
&Ipv4PacketBuilder), and references passed around instead of values.
Constructing a buffer from a Serializer
Serializers are constructed by starting at the innermost packet layer and
working outwards, adding packet layers. Turning a Serializer into a buffer
goes the other way: the layers are consumed by starting at the outermost
packet layer and working inwards.
In order to construct a buffer, the Serializer::serialize method is
provided. It takes a NestedPacketBuilder, which describes one or more
encapsulating packet layers. For example, when serializing a TCP segment in
an IP packet in an Ethernet frame, the serialize call on the IP packet
Serializer would be given a NestedPacketBuilder describing the Ethernet
frame. This call would then compute a new NestedPacketBuilder describing
the combined IP packet and Ethernet frame, and would pass this to a call to
serialize on the TCP segment Serializer.
When the innermost call to serialize is reached, it is that call's
responsibility to produce a buffer which satisfies the constraints passed to
it, and to initialize that buffer's body with the contents of its packet.
For example, the TCP segment Serializer from the preceding example would
need to produce a buffer with 38 bytes of prefix for the IP and Ethernet
headers, and whose body was initialized to the bytes of the TCP segment.
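The outside-in accumulation of requirements and the single allocation at the innermost layer can be sketched as follows. Outer, Allocated, and serialize_innermost are hypothetical names for this illustration. To reproduce the 38-byte figure, the sketch assumes an 18-byte Ethernet header (for instance, 14 bytes plus a 4-byte VLAN tag) on top of the 20-byte IPv4 header; that split is an assumption inferred from the numbers in the text.

```rust
use std::ops::Range;

/// Hypothetical summary of what all encapsulating layers need around a body.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Outer {
    pub prefix: usize,
    pub suffix: usize,
}

impl Outer {
    /// Add one more (inner) layer's header and footer to the requirements.
    /// Called once per layer while walking from the outermost layer inwards.
    pub fn with_layer(self, header_len: usize, footer_len: usize) -> Outer {
        Outer {
            prefix: self.prefix + header_len,
            suffix: self.suffix + footer_len,
        }
    }
}

/// A freshly allocated buffer; `bytes[body]` is the body.
pub struct Allocated {
    pub bytes: Vec<u8>,
    pub body: Range<usize>,
}

/// Innermost step: allocate once, leaving room for every outer header and
/// footer, and initialize the body with the innermost packet's bytes.
pub fn serialize_innermost(segment: &[u8], outer: Outer) -> Allocated {
    let start = outer.prefix;
    let end = start + segment.len();
    let mut bytes = vec![0u8; end + outer.suffix];
    bytes[start..end].copy_from_slice(segment);
    Allocated { bytes, body: start..end }
}
```

For example, Outer::default().with_layer(18, 0).with_layer(20, 0) yields a prefix requirement of 38 bytes, which serialize_innermost then reserves in front of the TCP segment.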
We can now see how Serializers and PacketBuilders compose: the buffer
returned from a call to serialize satisfies the requirements of the
PacketBuilder::serialize method. Its body is initialized to the packet to
be encapsulated, and enough prefix and suffix space exists to serialize this
layer's header and footer. For example, the call to Serializer::serialize
on the TCP segment serializer would return a buffer with 38 bytes of prefix
and a body initialized to the bytes of the TCP segment. The call to
Serializer::serialize on the IP packet would then pass this buffer to a
call to PacketBuilder::serialize on its Ipv4PacketBuilder, resulting in
a buffer with 18 bytes of prefix and a body initialized to the bytes of the
entire IP packet. This buffer would then be suitable to return from the call
to Serializer::serialize, allowing the Ethernet layer to continue
operating on the buffer, and so on.
Note in particular that, throughout this entire process of constructing
Serializers and PacketBuilders and then consuming them, a buffer is only
allocated once, and each byte of the packet is only serialized once. No
temporary buffers or copying between buffers are required.
Reusing buffers
Another important property of the Serializer trait is that it can be
implemented by buffers. Since buffers contain prefixes, bodies, and
suffixes, and since the Serializer::serialize method consumes the
Serializer by value and returns a buffer by value, a buffer is itself a
valid Serializer. When serialize is called, so long as the buffer already
satisfies the constraints requested, it can simply return itself by value.
If the constraints are not satisfied, it may need to produce a different
buffer through some user-defined mechanism (see the BufferProvider trait
for details).
This allows existing buffers to be reused in many cases. For example, consider receiving a packet in a buffer, and then responding to that packet with a new packet. The buffer that the original packet was stored in can be used to serialize the new packet, avoiding any unnecessary allocation.
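A buffer acting as its own serializer might look like the sketch below. Buf and its serialize method are hypothetical; in particular, the fallback here simply allocates a new vector, standing in for whatever mechanism a BufferProvider would supply.

```rust
use std::ops::Range;

/// Hypothetical buffer: `bytes[body]` is the body; the rest is prefix/suffix.
pub struct Buf {
    bytes: Vec<u8>,
    body: Range<usize>,
}

impl Buf {
    pub fn new(bytes: Vec<u8>, body: Range<usize>) -> Self {
        assert!(body.start <= body.end && body.end <= bytes.len());
        Buf { bytes, body }
    }

    pub fn prefix_len(&self) -> usize { self.body.start }
    pub fn suffix_len(&self) -> usize { self.bytes.len() - self.body.end }
    pub fn body(&self) -> &[u8] { &self.bytes[self.body.clone()] }

    /// Consume the buffer and return one that has at least `prefix` bytes
    /// before the body and `suffix` bytes after it. The flag reports whether
    /// the original allocation was reused.
    pub fn serialize(self, prefix: usize, suffix: usize) -> (Buf, bool) {
        if self.prefix_len() >= prefix && self.suffix_len() >= suffix {
            // Constraints already satisfied: return ourselves by value.
            return (self, true);
        }
        // Fallback (stand-in for a user-defined provider): allocate a new
        // buffer of the right shape and copy the body across.
        let body = self.body();
        let end = prefix + body.len();
        let mut bytes = vec![0u8; end + suffix];
        bytes[prefix..end].copy_from_slice(body);
        (Buf { bytes, body: prefix..end }, false)
    }
}
```

In the receive-then-respond case, the prefix that was consumed while parsing the incoming headers is usually large enough for the outgoing headers, so the first branch is taken and nothing is allocated.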