9 releases (breaking)
0.8.0 | Sep 5, 2024 |
---|---|
0.7.0 | Sep 5, 2024 |
0.6.0 | Sep 5, 2024 |
0.5.0 | Aug 14, 2024 |
0.1.1 | Feb 25, 2024 |
#393 in Concurrency
28KB
599 lines
folklore
A lock-free concurrent hash map, based on the 'folklore' implementation described in "Concurrent Hash Tables: Fast and General(?)!" by Maier et al.
What?
This has some major limitations compared to a more general hash-map implementation. Namely;
- It cannot be grown past its initial capacity.
- The capacity is limited to
i16::MAX
. - It can only store values which are exactly 2 bytes.
- Removals are not (currently) supported (because of the immense slowdown caused by tombstones filling up the map).
The only benefits are:
- Blazingly fast 🔥 for concurrent access / modification.
- Can be shared safely across threads without requiring
Mutex
,RwLock
etc.
This is kind of just a fun project exploring the implementation of something I read about in an academic paper. I wouldn't really recommend using it.
How?
Map entries are a 16-bit key offset, a 16-bit value, and a 32-bit key hash. This means that any operation on a map entry can be completed with a single 64-bit (1 word) CAS instruction.
The actual map entries store a "key offset" rather than a key, because the keys are allocated in a separate store. The key store is a "ConcurrentArray" which is lock-free and safe for concurrent access, but entries are immutable, and can only be removed if they were the most recently added.
Consistency
Loads and Stores generally use Ordering::Acquire
and Ordering::Release
respectively. Initial lookup for an entry uses Ordering::Relaxed
for performance reasons, so sometimes a newly inserted key might be missed by another thread.
However, that thread will never overwrite the key, because a stronger ordering is used for the actual insertion.
Performance
Some basic benchmarks are included in this repo which compare against std::collections::HashMap
and leapfrog::LeapMap
. There are a set of benchmarks for single-thread, and a set for multi-thread. Here are the numbers I got on an M1 Pro MacBook:
Single-threaded
Map | Time | Throughput |
---|---|---|
std HashMap | 7.9036ms | 49.750 Melem/s |
leapfrog LeapMap | 8.8983ms | 44.189 Melem/s |
folklore HashMap | 4.9738ms | 79.055 Melem/s |
Multi-threaded (8 threads)
Map | Time | Throughput |
---|---|---|
std HashMap (RWLock) | 58.689ms | 53.599 Melem/s |
leapfrog LeapMap | 18.841ms | 166.96 Melem/s |
folklore HashMap | 16.571ms | 189.83 Melem/s |
The numbers of each benchmark are pretty useless on their own, but comparing them we can see that folklore manages to just about beat out leapfrog. Again these benchmarks are very basic, only testing insertion and updating.
Inspired by the ConcurrentMap
implementation in couchbase/fleece.
robclu/leapfrog was instrumental in my understanding how this kinda thing should be written in Rust. The leapfrog map also doesn't suffer from many of the limitations of this map, and is a much better choice in most cases.
Dependencies
~315KB