#shared-state #lock #wait-free #rwlock #thread #read-write

vlock

A fast and scalable multi-version shared state lock with wait-free read access

4 releases

0.2.2 Dec 6, 2024
0.2.1 Mar 21, 2024
0.2.0 Sep 4, 2023
0.1.0 Sep 2, 2023

#516 in Concurrency


Used in 2 crates (via pstream)

MIT/Apache

33KB
320 lines

vlock

This library contains fast and scalable multi-version shared state lock with wait-free read access.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.


lib.rs:

Similar to std::sync::RwLock, yet it locks only on writes and reads are wait-free.

VLock is for read-heavy scenarios. Updates are strictly serialized and may have to wait until readers of older data are finished. It works by keeping multiple versions of the shared data and counting references to each of the versions. Named VLock, because it uses versioning and behaves sort-of like a lock. Naming is hard.

Why bother?

VLock is a fast[^1] and scalable lock with stable read performance for non-copy types at the expense of slower writes.

It's a common pattern among various sorts of systems, where there is a number of processing threads, each making decisions via shared state and some other code that updates that shared state once in a while. In network infrastructure, for example, this can be thought of data and control planes, be it routers or various levels of load balancers. In data-heavy applications, that can be a hot view of some stuff stored in a database.

If that roughly describes your use case, you may benefit from this library at the expense of having 1+ extra copies of data in memory, slightly more state and slower write performance. The former is usually necessary even when RwLock is used to minimize time under lock, and VLock actually reuses old versions which may save you some time by avoiding allocations. The extra state is one usize plus a usize for every version... and there's not much you can do about slower writes.

If read performance is critical and readers can't afford to wait for writes, you may benefit significantly.

As a bonus, the implementation is simple with about 200 lines of actual code and without any external dependencies.

[^1]: Based on synthetic benchmarks on x86_64 laptops, read performance was 1.3-2.0 times faster than ArcSwap, and may be order of magnitude faster than fastest RwLock implementation in certain cases. Writes of VLock are more efficient than ArcSwap. Comparing to RwLock, writes are generally 1 to 10 times slower than parking_lot implementation, but an improvement over the std implementation. With SeqLock results were mixed: in some scenarios reads of VLock was 4 times slower, in some about 1:1 and in other 2 times quicker. Although write performance of VLock is significantly worse than that of SeqLock, it can be used for non-copy types. Note that write performance of VLock may degrade significantly when readers are not progressing and N is small, in other words VLock is susceptible to write starvation by prioritizing reads.

How does it work?

As mentioned above, it uses a combination of versioning and reference counting with a lock to serialize writes.

[VLock<T, N>] has a fixed size N, which is the maximum number of versions to allocate. Version is identified by an offset in the allocated space. State has both the offset and the counter in the same atomic. VLock keeps the current state and a state for every version.

Every time read is called, the current state counter is incremented and the associated offset is used to return the current version to the reader. After the returned pointer is dropped, the per-version state counter is decremented.

During update, a first unused version is picked by checking whether per-version counter is 0, or if nothing is available, a new version is initialized if size allows. This is the place where waiting for readers is happening. Once a free offset is found and it is updated, the current state counter is reset and the new offset is written (which is one atomic operation). The returned counter is then added to the per-version counter associated with that version, after which it will reach 0 once all readers are done with it.

In other words, tracking access can be thought of balancing with two counters - one incrementing in the current state, marking the start of the read access to data, and the other decrementing in the per-version state, marking the end of the access. Then, both are added together. If the resulting counter is greater than 0, there are still some readers, if it's 0, then the version is definitely unused.

Old versions are not dropped, so it acts like a pool. That may be handy when dealing with heap-allocated datastructures that have to be updated or rebuilt, but may also lead to some old garbage lying around in memory.

Usage

use std::sync::Arc;
use std::thread;
use std::time;

use vlock::VLock;

let lock = Arc::new(VLock::<String, 4>::new(String::from("hi there!")));
let lock_clone = Arc::clone(&lock);
let t = thread::spawn(move || {
    for _ in 0..5 {
        println!("{}", *lock_clone.read());
        thread::sleep(time::Duration::from_millis(1));
    }
    lock_clone.update(
        |_, value| {
            value.clear();
            value.push_str("bye!");
        },
        String::new
    );
});
thread::sleep(time::Duration::from_millis(2));
lock.update(
    |_, value| {
        value.clear();
        value.push_str("here's some text for you");
    },
    String::new
);
if let Err(err) = t.join() {
    println!("thread has failed: {err:?}");
}
assert_eq!(*lock.read(), "bye!");

No runtime deps