#cache #performance #data-structures #service

k-cache

internal cache implementation for rmemstore

3 releases

0.1.5 Nov 27, 2024
0.1.4 Nov 27, 2024
0.1.1 Aug 12, 2024

#955 in Web programming


539 downloads per month

Apache-2.0

13KB
171 lines

rmemstore

Fast, type-aware data structure cache.

About

rmemstore is similar to other caches you may have used, like redis, but it has some differences. The primary aims of rmemstore are to be typesafe, fast, and useful as a data structure cache.

Of course, usefulness is an ongoing exercise, as it takes time to grow features. However, rmemstore is a type-aware data structure store, which means you can store maps of maps - and the server knows what that means.

It is fast now, however. rmemstore uses the new Sieve eviction strategy when it has to evict. With a 10:1 read:write ratio, 2 threads on an 11-year-old Intel i5 server are capable of over 3.3 million operations per second, even while being pushed to eviction.

rmemstore is built on "safe" Rust code. It doesn't rely on subtle tricks to get speed. It does use widely adopted libraries like the excellent tokio, which may use dark magic, but they're trustworthy.

rmemstore uses bare tcp - no application frameworks. Each 0 and every 1 that your network card transmits to or from an rmemstored server has a direct purpose. Inventing a new ostensibly-portable wire protocol is a vaguely hubristic exercise when suitable alternatives exist. With that in mind, rmemstore uses protosockets, which is a compromise between the aforementioned hubris and pragmatism.

Protocol

The tcp stream inbound to rmemstored is a stream of standard, length-delimited protocol buffers rmemstore.Rpc structures. These messages carry an id, and rmemstored responds with that id - possibly out of order. It is a multithreaded, multiplexing server. You can send as much as you want as fast as you can, subject to your network and cpu capabilities.

The tcp stream outbound from rmemstored is a stream of standard, length-delimited protocol buffers rmemstore.Response structures. These messages carry the id from the Rpc that initiated the response. Every rmemstore.Rpc has a corresponding rmemstore.Response.
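Because responses may come back out of order, a client needs some way to match each rmemstore.Response to the rmemstore.Rpc that caused it. The following is only a sketch of one way to do that with a table keyed by id. `Response` and `Pending` are hypothetical stand-ins invented for this illustration, not types from the rmemstore client.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a decoded rmemstore.Response.
struct Response {
    id: u64,
    body: Vec<u8>,
}

/// Hypothetical table of requests that are still waiting for a response.
struct Pending {
    next_id: u64,
    waiting: HashMap<u64, Sender<Response>>,
}

impl Pending {
    fn new() -> Self {
        Pending { next_id: 0, waiting: HashMap::new() }
    }

    /// Reserve an id for an outgoing Rpc and return a receiver for its response.
    fn register(&mut self) -> (u64, Receiver<Response>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.waiting.insert(id, tx);
        (id, rx)
    }

    /// Route an incoming response to whoever registered its id.
    /// Returns false when the id is unknown or the waiter has gone away.
    fn complete(&mut self, response: Response) -> bool {
        match self.waiting.remove(&response.id) {
            Some(tx) => tx.send(response).is_ok(),
            None => false,
        }
    }
}
```

A reader task would call `complete` for every decoded response, in whatever order the server sends them, while each caller waits on its own receiver.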

Inbound and outbound streams are: varint message varint message[...]. The varint before each message is the length of that message in bytes. So once you have read the varint and then as many bytes as the varint specifies, you have a complete message.
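The framing described above can be sketched with only the standard library. This is an illustration, not the code rmemstore or protosockets uses: the function names are made up for the example, and the payload is treated as opaque bytes where a real client would have an encoded rmemstore.Rpc or rmemstore.Response.

```rust
use std::io::{self, Read, Write};

/// Append `value` as a protobuf-style base-128 varint (low 7 bits first).
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80); // continuation bit set
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read one varint. Returns Ok(None) if the stream ends before the first byte.
fn read_varint<R: Read>(input: &mut R) -> io::Result<Option<u64>> {
    let mut value = 0u64;
    let mut byte = [0u8; 1];
    for i in 0..10u32 {
        if input.read(&mut byte)? == 0 {
            return if i == 0 {
                Ok(None)
            } else {
                Err(io::ErrorKind::UnexpectedEof.into())
            };
        }
        value |= u64::from(byte[0] & 0x7f) << (7 * i);
        if byte[0] & 0x80 == 0 {
            return Ok(Some(value));
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"))
}

/// Write one frame: varint length, then the message bytes.
fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    let mut header = Vec::with_capacity(10);
    write_varint(payload.len() as u64, &mut header);
    out.write_all(&header)?;
    out.write_all(payload)
}

/// Read one frame. Returns Ok(None) on a clean end of stream.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let len = match read_varint(input)? {
        Some(len) => len,
        None => return Ok(None),
    };
    let mut payload = vec![0u8; len as usize];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

Calling `read_frame` in a loop on a `TcpStream` yields one complete message per iteration. A production reader would also want to cap the accepted length before allocating.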

Languages

Rust

You can look at rmem for an example of how you can use the client. Usage boils down to 3 lines:

let mut configuration = rmemstore::ClientConfiguration::new();
let client = configuration.connect(args.host.to_string()).await?;
client.put("some key", "some value").await?;

You can also put dictionaries:

use std::collections::HashMap;

client.put(
    "some key",
    HashMap::<&str, &str>::from_iter([
        ("hello", "world")
    ]),
).await?;

or dictionaries of strings and dictionaries, however wild you want to get:

client
    .put(
        "some key",
        HashMap::<&str, MemstoreValue>::from_iter([
            (
                "hello",
                MemstoreValue::String {
                    string: "world".to_string(),
                },
            ),
            (
                "nested",
                MemstoreValue::Map {
                    map: HashMap::from_iter([(
                        "inner".to_string(),
                        MemstoreValue::String {
                            string: "values".to_string(),
                        },
                    )]),
                },
            ),
        ]),
    )
    .await?;

Bash

You can use rms to put and get.

For strings, the output is a little more brief.

$ rms put foo '{"string": "some value"}'
$ rms get foo
some value

For maps, the interaction has some verbosity, but it is typed!

$ rms put foo '{"map": {"bar":{"map":{"baz":{"string": "haha"}, "other": {"string": "verbose"}}, "outer": {"string": "another"}}}}'
$ rms get foo
{
  "bar": {
    "map": {
      "baz": {
        "string": "haha"
      },
      "other": {
        "string": "verbose"
      }
    }
  }
}

Python

Don't want to use Rust? Any tool or language capable of sending and receiving protocol buffers-encoded bytes over tcp is capable of using rmemstored. See example-python for an example in another language. Note that Python, in particular, is a bit of a pain because it does not expose the protobuf varint encoder.

Comparisons

k-cache internal cache implementation

Rather than using the popular moka cache, rmemstore has its own cache implementation. Here's an example result from the benchmarks that motivates this deviation:

[Figure: benchmark data showing 2.2-5x better latency for k-cache]

You can see that the benchmark under eviction favors k-cache at all thread counts. Note that sieve pays on insert, so this 100% insert benchmark is pessimistic, and get will outperform by a wider margin.
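To see why sieve pays on insert, it may help to look at a toy version of the general SIEVE idea: entries sit in insertion order, each with a visited bit, and a hand sweeps from oldest to newest looking for an unvisited entry to evict. The sketch below is not k-cache's implementation. `SieveSketch` and its methods are hypothetical names, and removing from the middle of a `VecDeque` is O(n), which a real cache would avoid.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

struct Slot<V> {
    value: V,
    visited: bool,
}

/// Hypothetical toy SIEVE cache, for illustration only.
struct SieveSketch<K, V> {
    capacity: usize,
    slots: HashMap<K, Slot<V>>,
    order: VecDeque<K>, // front = oldest, back = newest
    hand: usize,        // index into `order` where the next sweep resumes
}

impl<K: Hash + Eq + Clone, V> SieveSketch<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        SieveSketch { capacity, slots: HashMap::new(), order: VecDeque::new(), hand: 0 }
    }

    fn len(&self) -> usize {
        self.order.len()
    }

    /// A hit only sets the visited bit; nothing is reordered.
    fn get(&mut self, key: &K) -> Option<&V> {
        let slot = self.slots.get_mut(key)?;
        slot.visited = true;
        Some(&slot.value)
    }

    /// Insert does the eviction work when the cache is full.
    fn insert(&mut self, key: K, value: V) {
        if let Some(slot) = self.slots.get_mut(&key) {
            slot.value = value;
            slot.visited = true;
            return;
        }
        if self.order.len() >= self.capacity {
            self.evict_one();
        }
        self.order.push_back(key.clone());
        self.slots.insert(key, Slot { value, visited: false });
    }

    /// Sweep from the hand toward newer entries, clearing visited bits,
    /// and evict the first entry that was not visited since the last sweep.
    fn evict_one(&mut self) {
        loop {
            if self.hand >= self.order.len() {
                self.hand = 0; // wrap from newest back to oldest
            }
            let key = &self.order[self.hand];
            let slot = self.slots.get_mut(key).expect("order and slots stay in sync");
            if slot.visited {
                slot.visited = false; // second chance
                self.hand += 1;
            } else {
                let key = self.order.remove(self.hand).expect("hand is in bounds");
                self.slots.remove(&key);
                if self.hand >= self.order.len() {
                    self.hand = 0;
                }
                return;
            }
        }
    }
}
```

In this sketch `get` touches a single flag, while `insert` may walk several entries before it finds a victim, which is the asymmetry that an insert-only benchmark stresses.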

Dependencies

~0–10MB
~45K SLoC