#cache #memoization #memo #memory-caching #memory-management

no-std memo-cache

A small, fixed-size cache with retention management

4 releases (breaking)

new 0.8.0 Oct 16, 2024
0.7.0 Sep 18, 2024
0.6.1 Sep 5, 2024
0.6.0 Sep 5, 2024
0.5.0 Sep 2, 2024

#79 in Caching

Download history 233/week @ 2024-08-30 134/week @ 2024-09-06 108/week @ 2024-09-13 43/week @ 2024-09-20 20/week @ 2024-09-27 2/week @ 2024-10-04

173 downloads per month

Apache-2.0 OR MIT

19KB
192 lines

build status

MemoCache

A small, fixed-size cache with retention management, e.g. for use with memoization.

This library does not use the standard library so it is compatible with #[no_std] crates.

Introduction

Sometimes it can be beneficial to speedup pure function calls by using memoization.

The cache storage required for implementing such a speedup often is an associative container (i.e. a key/value store). Programming language standard libraries provide such containers, often implemented as a hash table or a red-black tree. These implementations are fine for performance, but do not actually cover all cases because of the lack of retention management

Suppose your input data covers the whole space that can be represented by a 64-bit integer. There probably is some (generally non-uniform) distribution with which the input values arrive, but it's possible that over time all possible values pass by. Any cache without retention management will then grow to potentially enormous dimensions in memory which is undesirable.

The cache implemented in this library uses a FIFO-style sequential data storage with fixed size, pre-allocated memory. When the cache is full, the oldest item is evicted.

Example usage

Suppose we have a pure method calculate on a type Process (without memoization):

struct Process { /* ... */ }

impl Process {
    fn calculate(&self, input: u64) -> f64 {
        // ..do expensive calculation on `input`..
    }
}

Each single call to this function results in the resource costs of the calculation. We can add memoization to this function in two different ways:

  • Using MemoCache::get_or_insert_with,
  • Using MemoCache::get and MemoCache::insert.

For each of the following examples: each call to calculate will first check if the input value is already in the cache. If so: use the cached value, otherwise update the cache with a new, calculated value.

The cache is fixed-size, so if it is full, the oldest key/value pair will be evicted, and memory usage is constant.

See the examples/ directory for more example code.

Example A: get_or_insert_with

struct Process {
    cache: MemoCache<u64, f64, 32>,
}

impl Process {
    fn calculate(&mut self, input: u64) -> f64 {
        *self.cache.get_or_insert_with(&input, |&i| /* ..do calculation on `input`.. */)
    }
}

For fallible insert functions, there's get_or_try_insert_with.

Example B: get and insert

struct Process {
    cache: MemoCache<u64, f64, 32>,
}

impl Process {
    fn calculate(&mut self, input: u64) -> f64 {
        if let Some(result) = self.cache.get(&input) {
            *result // Use cached value.
        } else {
            let result = /* ..do calculation on `input`.. */;
            self.cache.insert(input, result);
            result
        }
    }
}

Performance notes

The use of a simple sequential data storage does have performance impact, especially for key lookup. That's why this cache will only be beneficial performance-wise when used with a relatively small size, up to about 128 elements.

The performance is very much input data-dependent. If the cache capacity is greater than or equal to the input data set size, performance is optimal (and MemoCache outperforms a HashTable). However, if the input data set size is greater than the cache size, elements will be purged from the cache leading to cache misses and cache churn. In this scenario, the fixed size of the cache, and/or the retention management aspect of MemoCache must weigh against the loss in performance over a HashTable. Always analyze your input data and perform measurements to select the cache size / type you use.

Run the included benchmarks using criterion by invoking: cargo bench

TODO

  • Investigate potential advanced cache improvements (e.g. start here).
  • Perhaps add cursor motion policies based on estimated input data probability distributions (e.g. in the current implementation an often-seen input value will still be overwritten by cursor movement).

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.

No runtime deps