#cache #caching #concurrent #tinylfu #async

stretto

Stretto is a high performance thread-safe memory-bound Rust cache

19 unstable releases (3 breaking)

Uses new Rust 2021

new 0.4.4 May 21, 2022
0.4.3 Apr 14, 2022
0.3.3 Feb 27, 2022
0.3.2 Dec 18, 2021
0.1.5 Oct 17, 2021

#19 in Concurrency

Download history 97/week @ 2022-01-29 92/week @ 2022-02-05 42/week @ 2022-02-12 30/week @ 2022-02-19 88/week @ 2022-02-26 112/week @ 2022-03-05 70/week @ 2022-03-12 121/week @ 2022-03-19 185/week @ 2022-03-26 208/week @ 2022-04-02 159/week @ 2022-04-09 200/week @ 2022-04-16 274/week @ 2022-04-23 414/week @ 2022-04-30 352/week @ 2022-05-07 509/week @ 2022-05-14

1,600 downloads per month

MIT/Apache

285KB
6K SLoC

Stretto

Stretto is a pure Rust implementation for https://github.com/dgraph-io/ristretto.

A high performance thread-safe memory-bound Rust cache.

English | 简体中文

github Build codecov

docs.rs crates.io crates.io

license-apache license-mit

Features

  • Internal Mutability - Do not need to use Arc<RwLock<Cache<...>> for concurrent code, you just need Cache<...> or AsyncCache<...>
  • Sync and Async - Stretto support async by tokio and sync by crossbeam.
    • In sync, Cache starts two extra OS level threads. One is policy thread, the other is writing thread.
    • In async, AsyncCache starts two extra green threads. One is policy thread, the other is writing thread.
  • Store policy Stretto only store the value, which means the cache does not store the key.
  • High Hit Ratios - with Dgraph's developers unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many threads as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal CacheBuilder/AsyncCacheBuilder values and you're off and running.

Table of Contents

Installation

  • Use Cache.
[dependencies]
stretto = "0.4"

or

[dependencies]
stretto = { version = "0.4", features = ["sync"] }
  • Use AsyncCache
[dependencies]
stretto = { version = "0.4", features = ["async"] }
  • Use both Cache and AsyncCache
[dependencies]
stretto = { version = "0.4", features = ["full"] }

Related

If you want some basic caches implementation(no_std), please see https://crates.io/crates/caches.

Usage

Example

Sync

use std::time::Duration;
use stretto::Cache;

fn main() {
    let c = Cache::new(12960, 1e6 as i64).unwrap();

    // set a value with a cost of 1
    c.insert("a", "a", 1);
    // set a value with a cost of 1 and ttl
    c.insert_with_ttl("b", "b", 1, Duration::from_secs(3));

    // wait for value to pass through buffers
    c.wait().unwrap();

    // when we get the value, we will get a ValueRef, which contains a RwLockReadGuard
    // so when we finish use this value, we must release the ValueRef
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
    v.release();

    // lock will be auto released when out of scope
    {
        // when we get the value, we will get a ValueRef, which contains a RwLockWriteGuard
        // so when we finish use this value, we must release the ValueRefMut
        let mut v = c.get_mut(&"a").unwrap();
        v.write("aa");
        assert_eq!(v.value(), &"aa");
        // release the value
    }

    // if you just want to do one operation
    let v = c.get_mut(&"a").unwrap();
    v.write_once("aaa");

    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"aaa");
    v.release();

    // clear the cache
    c.clear().unwrap();
    // wait all the operations are finished
    c.wait().unwrap();
    assert!(c.get(&"a").is_none());
}

Async

use std::time::Duration;
use stretto::AsyncCache;

#[tokio::main]
async fn main() {
    let c: AsyncCache<&str, &str> = AsyncCache::new(12960, 1e6 as i64).unwrap();

    // set a value with a cost of 1
    c.insert("a", "a", 1).await;

    // set a value with a cost of 1 and ttl
    c.insert_with_ttl("b", "b", 1, Duration::from_secs(3)).await;

    // wait for value to pass through buffers
    c.wait().await.unwrap();

    // when we get the value, we will get a ValueRef, which contains a RwLockReadGuard
    // so when we finish use this value, we must release the ValueRef
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
    // release the value
    v.release(); // or drop(v)

    // lock will be auto released when out of scope
    {
        // when we get the value, we will get a ValueRef, which contains a RwLockWriteGuard
        // so when we finish use this value, we must release the ValueRefMut
        let mut v = c.get_mut(&"a").unwrap();
        v.write("aa");
        assert_eq!(v.value(), &"aa");
        // release the value
    }

    // if you just want to do one operation
    let v = c.get_mut(&"a").unwrap();
    v.write_once("aaa");

    let v = c.get(&"a").unwrap();
    println!("{}", v);
    assert_eq!(v.value(), &"aaa");
    v.release();

    // clear the cache
    c.clear().unwrap();
    // wait all the operations are finished
    c.wait().await.unwrap();

    assert!(c.get(&"a").is_none());
}

Config

The CacheBuilder struct is used when creating Cache instances if you want to customize the Cache settings.

num_counters

num_counters is the number of 4-bit access counters to keep for admission and eviction. Dgraph's developers have seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and max_cost is 100, set num_counters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set num_counters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the max_cost value.

max_cost

max_cost is how eviction decisions are made. For example, if max_cost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

max_cost can also be used to denote the max size in bytes. For example, if max_cost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted.

max_cost could be anything as long as it matches how you're using the cost values when calling insert.

key_builder

pub trait KeyBuilder<K: Hash + Eq + ?Sized> {
    /// hash_index is used to hash the key to u64
    fn hash_index(&self, key: &K) -> u64;

    /// if you want a 128bit hashes, you should implement this method,
    /// or leave this method return 0
    fn hash_conflict(&self, key: &K) -> u64 { 0 }

    /// build the key to 128bit hashes.
    fn build_key(&self, k: &K) -> (u64, u64) {
        (self.hash_index(k), self.hash_conflict(k))
    }
}

KeyBuilder is the hashing algorithm used for every key. In Stretto, the Cache will never store the real key. The key will be processed by KeyBuilder. Stretto has two default built-in key builder, one is TransparentKeyBuilder, the other is DefaultKeyBuilder. If your key implements TransparentKey trait, you can use TransparentKeyBuilder which is faster than DefaultKeyBuilder. Otherwise, you should use DefaultKeyBuilder You can also write your own key builder for the Cache, by implementing KeyBuilder trait.

Note that if you want 128bit hashes you should use the full (u64, u64), otherwise just fill the u64 at the 0 position, and it will behave like any 64bit hash.

buffer_size

buffer_size is the size of the insert buffers. The Dgraph's developers find that 32 * 1024 gives a good performance.

If for some reason you see insert performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 32 * 1024. This is a fine-tuning mechanism and you probably won't have to touch this.

metrics

Metrics is true when you want real-time logging of a variety of stats. The reason this is a CacheBuilder flag is because there's a 10% throughput performance overhead.

ignore_internal_cost

Set to true indicates to the cache that the cost of internally storing the value should be ignored. This is useful when the cost passed to set is not using bytes as units. Keep in mind that setting this to true will increase the memory usage.

cleanup_duration

The Cache will cleanup the expired values every 500ms by default.

update_validator

pub trait UpdateValidator<V>: Send + Sync + 'static {
    /// should_update is called when a value already exists in cache and is being updated.
    fn should_update(&self, prev: &V, curr: &V) -> bool;
}

By default, the Cache will always update the value if the value already exists in the cache, this trait is for you to check if the value should be updated.

callback

pub trait CacheCallback<V: Send + Sync>: Send + Sync + 'static {
    /// on_exit is called whenever a value is removed from cache. This can be
    /// used to do manual memory deallocation. Would also be called on eviction
    /// and rejection of the value.
    fn on_exit(&self, val: Option<V>);

    /// on_evict is called for every eviction and passes the hashed key, value,
    /// and cost to the function.
    fn on_evict(&self, item: Item<V>) {
        self.on_exit(item.val)
    }

    /// on_reject is called for every rejection done via the policy.
    fn on_reject(&self, item: Item<V>) {
        self.on_exit(item.val)
    }
}

CacheCallback is for customize some extra operations on values when related event happens.

coster

pub trait Coster<V>: Send + Sync + 'static {
    /// cost evaluates a value and outputs a corresponding cost. This function
    /// is ran after insert is called for a new item or an item update with a cost
    /// param of 0.
    fn cost(&self, val: &V) -> i64;
}

Cost is a trait you can pass to the CacheBuilder in order to evaluate item cost at runtime, and only for the insert calls that aren't dropped (this is useful if calculating item cost is particularly expensive, and you don't want to waste time on items that will be dropped anyways).

To signal to Stretto that you'd like to use this Coster trait:

  1. Set the Coster field to your own Coster implementation.
  2. When calling insert for new items or item updates, use a cost of 0.

hasher

The hasher for the Cache, default is SipHasher.

Acknowledgements

  • Thanks Dgraph's developers for providing amazing Go version Ristretto implementation.

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Dependencies

~1–7MB
~114K SLoC