#http-parser #http #h2 #buffer #generate #h1 #clevercloud

kawa

Agnostic representation of HTTP/1.1 and HTTP/2.0 for parsing, generating and translating HTTP messages, with zero-copy, made for Sōzu

6 releases

0.6.6 Apr 5, 2024
0.6.5 Jan 25, 2024
0.6.4 Nov 15, 2023
0.6.3 Sep 13, 2023
0.6.1 May 4, 2023

#130 in Network programming

Download history 159/week @ 2024-02-26 48/week @ 2024-03-04 223/week @ 2024-03-11 113/week @ 2024-03-18 61/week @ 2024-03-25 257/week @ 2024-04-01 41/week @ 2024-04-08 94/week @ 2024-04-15 81/week @ 2024-04-22 79/week @ 2024-04-29 58/week @ 2024-05-06 27/week @ 2024-05-13 70/week @ 2024-05-20 103/week @ 2024-05-27 131/week @ 2024-06-03 24/week @ 2024-06-10

330 downloads per month
Used in 3 crates (via sozu-lib)

Custom license

115KB
2.5K SLoC

Kawa

Agnostic representation of HTTP/1.1 and HTTP/2.0 for parsing, generating and translating HTTP messages, with zero-copy, made for Sōzu.

Principles

Consider the following HTTP/1.1 response stored in a Buffer:

HTTP/1.1 200 OK
Transfer-Encoding: chunked     // the body of the response is streamed
Connection: Keep-Alive
User-Agent: curl/7.43.0
Trailer: Foo                   // declares a trailer header named "Foo"

4                              // declares one chunk of 4 bytes
Wiki
5                              // declares one chunk of 5 bytes
pedia
0                              // declares one chunk of 0 byte (the last chunk)
Foo: bar                       // trailer header "Foo"

HTTP generic representation

It can be parsed in place, extracting the essential content (header names, values...) and stored as a vector of HTTP generic blocks. Kawa is an intermediary, protocol agnostic, representation of HTTP:

kawa_blocks: [
    StatusLine::Response(V11, Slice("200"), Slice("OK")),
    Header(Slice("Transfer-Encoding"), Slice("chunked")),
    Header(Slice("Connection"), Slice("Keep-Alive")),
    Header(Slice("User-Agent"), Slice("curl/7.43.0")),
    Header(Slice("Trailer"), Slice("Foo")),
    Flags(END_HEADER),
    ChunkHeader(Slice("4")),
    Chunk(Slice("Wiki")),
    Flags(END_CHUNK),
    ChunkHeader(Slice("5")),
    Chunk(Slice("pedia")),
    Flags(END_CHUNK),
    Flags(END_BODY),
    Header(Slice("Foo"), Slice("bar")),
    Flags(END_HEADER | END_STREAM),
]

note: ChunkHeader is the only protocol specific Block. It holds the chunk size present in an HTTP/1.1 chunk header. They can safely be ignored by an HTTP/2 converter. The Flags blocks holds context dependant information, allowing converters to be stateless.

Importantly, Chunk blocks don't necessarily hold an entire chunk. They may only contain a fraction of a bigger chunk. Meaning these two representation are strictly identical:

kawa_full_chunk: [
    ChunkHeader(Slice("4")),
    Chunk(Slice("Wiki")),
    Flags(END_CHUNK),
]
kawa_fragmented_chunk: [
    ChunkHeader(Slice("4")),
    Chunk(Slice("Wi")),
    Chunk(Slice("k")),
    Chunk(Slice("i")),
    Flags(END_CHUNK),
]

note: this is done in order to advance the parsing head without having to wait for potentially very big chunk to arrive entirely. This scheme allows more efficient streaming and prevent the parsers from soft locking on chunks to big to fit in their buffer.

Reference buffer content with Slices

Note that Blocks never copy data. They reference parts of the request using Store::Slices which only holds a start index and a length. The Buffer can be viewed as followed, marking the referenced data in braces:

HTTP/1.1 [200] [OK]
[Transfer-Encoding]: [chunked]
[Connection]: [Keep-Alive]
[User-Agent]: [curl/7.43.0]
[Trailer]: [Foo]

[4]
[Wiki]
[5]
[pedia]
0
[Foo]: [bar]

note: technically everything out of the braces is useless and will never be used

Kawa use cases

Say we want to:

  • remove the "User-Agent" header,
  • add a "Sozu-id" header,
  • change header "Connection" to "close",
  • change trailer "Foo" to "bazz",

All this can be accomplished regardless of the underlying protocol (HTTP/1 or HTTP/2) using the generic Kawa representation:

    kawa_blocks.remove(3); // remove "User-Agent" header
    kawa_blocks.insert(3, Header(Static("Sozu-id"), Vec(sozu_id.as_bytes().to_vec())));
    kawa_blocks[2].val.modify("close");
    kawa_blocks[13].val.modify("bazz");

note: modify should only be used with dynamic values that will be dropped to give then a proper lifetime. For static values (like "close") use a Store::Static instead, this is only for the example. kawa_blocks[2].val = Static("close") would be more efficient.

kawa_blocks: [
    StatusLine::Response(V11, Slice("200"), Slice("OK")),
    Header(Slice("Transfer-Encoding"), Slice("chunked")),
    // "close" is shorter than "Keep-Alive" so it was written in place and kept as a Slice
    Header(Slice("Connection"), Slice("close")),
    Header(Static("Sozu-id"), Vec("SOZUBALANCEID")),
    Header(Slice("Trailer"), Slice("Foo")),
    Flags(END_HEADER),
    ChunkHeader(Slice("4")),
    Chunk(Slice("Wiki")),
    Flags(END_CHUNK),
    ChunkHeader(Slice("5")),
    Chunk(Slice("pedia")),
    Flags(END_CHUNK),
    Flags(END_BODY),
    // "bazz" is longer than "bar" so it was dynamically allocated, this may change in the future
    Header(Slice("Foo"), Vec("bazz"))
    Flags(END_HEADER | END_STREAM),
]

This is what the buffer looks like now:

HTTP/1.1 [200] [OK]
[Transfer-Encoding]: [chunked]
[Connection]: [close]Alive     // "close" written in place and Slice adjusted
User-Agent: curl/7.43.0        // no references to this line
[Trailer]: [Foo]

[4]
[Wiki]
[5]
[pedia]
0
[Foo]: bar                     // no reference to "bar"

Now that the response was successfully edited we can convert it back to a specific protocol. For simplicity's sake, let's convert it back to HTTP/1:

kawa_blocks: [] // Blocks are consumed
out: [
    // StatusLine::Request
    Static("HTTP/1.1"),
    Static(" "),
    Slice("200"),
    Static(" "),
    Slice("OK")
    Static("\r\n"),

    // Header
    Slice("Transfer-Encoding"),
    Static(": "),
    Slice("chunked"),
    Static("\r\n"),

    // Header
    Slice("Connection"),
    Static(": "),
    Slice("close"),
    Static("\r\n"),

    // Header
    Static("Sozu-id"),
    Static(": "),
    Vec("SOZUBALANCEID"),
    Static("\r\n"),

    // Header
    Slice("Trailer"),
    Static(": "),
    Slice("Foo"),
    Static("\r\n"),

    // Flags(END_HEADER)
    Static("\r\n"),

    // ChunkHeader
    Slice("4")
    Static("\r\n")
    // Chunk
    Slice("Wiki")
    // Flags(END_CHUNK)
    Static("\r\n")

    // ChunkHeader
    Slice("5")
    Static("\r\n")
    // Chunk
    Slice("pedia")
    // Flags(END_CHUNK)
    Static("\r\n")

    // Flags(END_BODY),
    Static("0\r\n")

    // Header
    Slice("Foo"),
    Static(": "),
    Vec("bazz"),
    Static("\r\n"),

    // Flags(END_HEADER | END_STREAM)
    Static("\r\n"),
]

Every element holds data as a slice of u8 either static, dynamic or from the response buffer. A vector of IoSlice can be built from this representation and efficiently sent on a socket. This yields the final response:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Connection: close
Sozu-id: SOZUBALANCEID
Trailer: Foo

4
Wiki
5
pedia
0
Foo: bazz

Memory management

Say the socket only wrote up to "Wi" of "Wikipedia" (109 bytes). After each write, Kawa::consume should be called with the number of bytes written. This signals Kawa to free unecessary Stores from its out vector and reclaim space in its Buffer if possible. In our case, Walking and discarding the Stores from out it remains:

out: [
    // <-- previous Stores were completely written so they were removed
    Slice("ki"),    // Slice was partially written and updated accordingly
    Static("\r\n"),
    Slice("5"),
    Static("\r\n"),
    Slice("pedia"),
    Static("\r\n"),
    Static("0\r\n"),
    Slice("Foo"),
    Static(": "),
    Vec("bazz"),
    Static("\r\n"),
    Static("\r\n"),
]

Most of the data in the request buffer is not referenced anymore, and is useless now:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Connection: closeAlive
User-Agent: curl/7.43.0
Trailer: Foo

4
Wi[ki]
[5]
[pedia]
0
[Foo]: bar

This can be measured with Kawa::leftmost_ref which returns the start of the leftmost Store::Slice, indicating that everything before that point in the Buffer is unused. Here it would return 115. Buffer::consume will be called with this value. In case the Buffer considers that it should shift its data to free this space (Buffer::should_shift), Buffer::shift is called memmoving the data back to the start of the buffer. The buffer would look like:

ki
5
pedia
0
Foo: bar

note: this is the only instance of copying data in this module and is necessary to not run out of memory unless we change the data structure of Buffer (with a real ring buffer for example). Nevertheless this should be negligeable with most shifts copying 0 or very few bytes.

As a result, the remaining Store::Slices in the out vector reference data that has been moved.

out: [
    Slice("ki"),    // references data starting at index 115
    Static("\r\n"),
    Slice("5"),     // references data starting at index 119
    Static("\r\n"),
    Slice("pedia"), // references...
    Static("\r\n"),
    Static("0\r\n"),
    Slice("Foo"),
    Static(": "),
    Vec("bazz"),
    Static("\r\n"),
    Static("\r\n"),
]

In order to synchronize the Store::Slices with the new buffer, Kawa::push_left is called with the amount of bytes discarded to realigned the data:

out: [
    Slice("ki"),    // references data starting at index 0
    Static("\r\n"),
    Slice("5"),     // references data starting at index 4
    Static("\r\n"),
    Slice("pedia"), // references...
    Static("\r\n"),
    Static("0\r\n"),
    Slice("Foo"),
    Static(": "),
    Vec("bazz"),
    Static("\r\n"),
    Static("\r\n"),
]
[ki]
[5]
[pedia]
0
[Foo]: bar

Dependencies

~1MB
~20K SLoC