pajamax
Super fast gRPC server framework in synchronous mode.
I used and benchmarked tonic in a network server project. Surprisingly, its performance was not as good as I expected: most of the cost went into the tokio asynchronous runtime and HTTP/2 protocol parsing. So I wanted to build a higher-performance gRPC server framework by addressing these two problems.
Optimization: Synchronous
Asynchronous programming is very well suited to network applications. I love it, but not here. tokio is fast but not zero-cost. For gRPC servers in certain scenarios, synchronous programming may be more appropriate:
- Some business logic runs synchronously and can respond to each request immediately, so concurrent requests essentially line up in a pipeline.
- gRPC uses HTTP/2, which supports multiplexing. Even though each client can make multiple concurrent requests, it establishes only a single connection to the server. For internal services behind a fixed number of gateway machines, the number of connections the server handles remains limited and relatively small.
In this case, the more straightforward thread model may be a better fit than the asynchronous model: spawn a thread for each connection, and keep the code inside each thread synchronous. Each thread receives requests and responds immediately, without any async code or tokio components. Since the connections are very stable, there is not even a need for a thread pool.
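The sketch below illustrates this thread-per-connection model with plain blocking I/O. It is only a simplified illustration of the idea, not pajamax's actual internals; the echo handler stands in for parsing an HTTP/2 frame, calling the gRPC handler, and encoding the response.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn handle_connection(mut stream: TcpStream) {
    let mut buf = [0u8; 4096];
    loop {
        // Blocking read: the thread simply waits for the next request.
        let n = match stream.read(&mut buf) {
            Ok(0) | Err(_) => return, // connection closed or broken
            Ok(n) => n,
        };
        // Respond immediately; echoing stands in for "parse, handle, encode".
        let _ = stream.write_all(&buf[..n]);
    }
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:50051")?;
    for stream in listener.incoming() {
        let stream = stream?;
        // Connections are long-lived and few, so a dedicated thread per
        // connection is cheap and a thread pool is unnecessary.
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}
```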
Optimization: Deep into HTTP/2
gRPC runs over HTTP/2. gRPC and HTTP/2 are independent layers, and they SHOULD also be independent in implementation; for example, tonic and h2 are two separate crates. However, this independence also wastes performance, mainly in the processing of request headers.
- Typically, a standard HTTP/2 implementation must parse all request headers and return them to the upper-level application. But in a gRPC service, at least in specific scenarios, only the :path header is needed, and the other headers can be ignored.
- Even for the :path header, HPACK encoding forces the implementation to allocate an owned String before handing it to the upper level. But in the specific scenario of gRPC, we can process :path directly while parsing HTTP/2, thereby avoiding the memory allocation.
To this end, we can implement an HTTP/2 library specifically designed for gRPC. While "reducing coupling" is a golden rule in programming, there are exceptional cases where it can be strategically overlooked for specific purposes.
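As an illustration of the second point, a gRPC-specific parser can route on the borrowed :path bytes while still inside the HTTP/2 layer, with no owned String. The sketch below uses hypothetical method and path names and is not pajamax's real parser:

```rust
// Sketch only: match the raw `:path` bytes against the known gRPC method
// paths during header parsing, instead of allocating an owned String and
// returning it to the application layer.
#[derive(Clone, Copy, Debug)]
enum Method {
    SayHello,   // hypothetical service methods for illustration
    SayGoodbye,
}

// Called by the header parser with the borrowed `:path` value; no allocation.
fn route(path: &[u8]) -> Option<Method> {
    match path {
        b"/helloworld.Greeter/SayHello" => Some(Method::SayHello),
        b"/helloworld.Greeter/SayGoodbye" => Some(Method::SayGoodbye),
        _ => None, // unknown paths (and all other headers) are ignored
    }
}

fn main() {
    assert!(matches!(
        route(b"/helloworld.Greeter/SayHello"),
        Some(Method::SayHello)
    ));
}
```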
Benchmark
The above two optimizations eliminate the cost of the asynchronous runtime and reduce the cost of HTTP/2 protocol parsing, resulting in a significant performance improvement.
We measured that Pajamax is up to 10X faster than Tonic using the grpc-bench project. See the results for details.
Conclusion
Scenario limitations:
- Synchronous business logic (but see Dispatch mode below);
- Deployed in an internal environment, behind a gateway and not directly exposed to the outside.
Benefits:
- Up to 10X performance improvement;
- No asynchronous programming;
- Fewer dependencies, shorter compilation time, smaller executables.
Losses:
- Only gRPC Unary mode; no Streaming mode;
- No gRPC headers, such as grpc-timeout;
- None of tower's ecosystem of middleware, services, and utilities, unlike tonic;
- Maybe something else.
It's like pajamas, super comfortable and convenient to wear, but only suitable at home, not for going out in public.
Modes: Local and Dispatch
The business logic code discussed above is all synchronous. There is only one thread for each connection. We call it Local mode. The architecture is very simple, as shown in the figure below.
/-----------------------\
( TCP connection )
\--^-----------------+--/
| |
|send |recv
+=====+=================V=====+
| |
| application codes |
| |
+===========pajamax framework=+
We also support another mode, Dispatch mode. This involves multiple threads:
- one input thread, which receives TCP data, parses requests, and dispatches them to the specified backend threads according to user definitions;
- the backend threads, which are managed by the users themselves; they handle the requests and generate responses, just like in Local mode;
- one output thread, which encodes responses and sends the data.
The requests and responses are transferred through channels. The architecture is shown in the figure below.
/-----------------------\
( TCP connection )
\--^-----------------+--/
| |
|send |recv
+======+=====+ +========V=======+
| +----+---+ | | +----------+ |
| | encode | | | | dispatch | |
| +-^----^-+ | | +--+----+--+ |
| | : | | : | |
+===+====:===+ | +---V--+ | |
| : | |handle| | |
| : | +---+--+ | |
| : +=====:====+=====+
| :............: |
+==+======================V==+
| |+
| application codes ||+
| |||
+============================+||
+============================+|
+============================+
Applications can also choose not to dispatch some requests; these are handled in the input thread, just like in Local mode. However, their responses still have to be transferred to the output thread to be sent, as shown by the dashed line in the figure above.
Applications only need to implement 2 traits, defining how to dispatch requests and how to handle them. You do not need to deal with message transfer or encoding; Pajamax handles those.
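The sketch below rebuilds the Dispatch-mode data flow with standard library channels: an input side that dispatches requests to per-shard backend threads, and a single output thread that collects responses. The types and names (Request, Response, shard_of) are hypothetical and only illustrate the pattern, not pajamax's actual traits or API.

```rust
use std::sync::mpsc;
use std::thread;

struct Request { key: String }
struct Response { body: String }

// User-defined dispatch rule, e.g. pick a backend thread from the key.
fn shard_of(req: &Request, shards: usize) -> usize {
    req.key.len() % shards
}

fn main() {
    const SHARDS: usize = 4;

    // Output channel: every backend thread sends its responses here.
    let (out_tx, out_rx) = mpsc::channel::<Response>();

    // Backend threads: handle requests synchronously, just like Local mode.
    let backend_txs: Vec<_> = (0..SHARDS)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<Request>();
            let out_tx = out_tx.clone();
            thread::spawn(move || {
                for req in rx {
                    let resp = Response { body: format!("handled {}", req.key) };
                    let _ = out_tx.send(resp);
                }
            });
            tx
        })
        .collect();
    drop(out_tx); // only backend threads keep senders to the output channel

    // Output thread: encodes responses and writes them to the connection.
    let output = thread::spawn(move || {
        for resp in out_rx {
            println!("send: {}", resp.body); // stand-in for encode + write
        }
    });

    // Input side (here: main): parse requests and dispatch by shard.
    for key in ["alpha", "beta", "gamma"] {
        let req = Request { key: key.to_string() };
        let shard = shard_of(&req, SHARDS);
        let _ = backend_txs[shard].send(req);
    }
    drop(backend_txs); // closing the request channels lets everything drain
    output.join().unwrap();
}
```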
See the dict-store example for more details.
Usage
The usage of Pajamax is very similar to that of Tonic.
See the pajamax-build crate documentation for more details.
Status
Pajamax is still in the development stage. I am publishing it to get feedback.
Todo list:
- More tests;
- Configuration builder;
- Hooks like tower's Layer.
License: MIT