4 releases (breaking)

0.4.0	Dec 27, 2024
0.3.0	Dec 26, 2024
0.2.0	Dec 24, 2024
0.1.0	Dec 23, 2024

#100 in Memory management

MIT license

60KB
859 lines

RingAl - Fast Ring Allocator for Temporary Buffers

Overview
- Key Use Cases
Design Philosophy
- Guard Mechanics
- Allocation Scenarios
Key Features
Optional Crate Features (Cargo)
Usage Examples
- Extendable Buffer
- Fixed Buffer
- Multi-threaded Environment
- Thread Local Storage
Benchmarks
- Parameter Definitions
- Fixed Preallocated Buffer Results
- Extendable Buffer Results (Chunked Writes)
- Key Insights
- System Specs
Dependencies
Future Plans
Safety

Overview

RingAl is a high-performance ring allocator designed for quick buffer allocations that don't live long. It uses a circular allocation method for fast memory management, which is perfect for short-lived buffers. For best performance, buffers should be freed quickly to avoid slowing things down by jamming the allocator.

Key Use Cases:

Preallocation: Set up a memory pool with a defined size N bytes.
Small Buffer Allocation: Allocate small buffers from the pool, each less than N.
Cross-thread Buffer Usage: Share buffers across threads efficiently using lightweight cloning, similar to Arc.
Timely Deallocation: Release buffers before reusing the same memory, for performance.
Recycling Storage: Freed buffers automatically go back into the pool for reallocation.

Design Philosophy

RingAl focuses on efficient memory management with a dynamic memory pool that evolves its backing store using something called guard sequences. Guard sequences are usually exclusively owned by allocator when they are not locked (i.e. not guarding the allocation), and when they do become locked, the memory region (single usize) backing the guard sequence, is only accessed using Atomic operations. This allows for one thread (the owner of buffer) to release the guard (write to guard sequence), when the buffer is no longer in use and the other thread, where allocator is running, to t perform synchronized check of whether guard sequence have been released.

The design allows for safe multi-threaded access: one thread can write while one other can read. If a buffer is still occupied after allocator wraps around, the allocation attempt just returns None, prompting a retry.

Note: RingAl itself isn't Sync, so it cannot be accessed from multiple threads concurrently. Instead, using thread-local storage is encouraged.

Guard Mechanics:

Guards help manage memory by indicating:

Whether guarded memory is currently in use.
Address of the next guard in the memory pool.

Allocation Scenarios:

When you request memory, RingAl checks the current guard and does one of the following:

Exact Fit: Uses the guard if it matches the request size, marking memory as used.
Oversized Guard: Splits the guard if it's larger than needed, which might fragment memory.
Undersized Guard: Combines adjacent regions until the request is satisfied, which helps defragment the memory.
Insufficient Capacity: Returns None if it can't satisfy the request, even after merging.

   allocator
      |
      v
-----------------------------------------------------------------------------------
| head canary | N bytes | guard1 | L bytes | guard2 | M bytes | ... | tail canary |
-----------------------------------------------------------------------------------
     |                    ^   |              ^   |              ^ |          ^ 
     |                    |   |              |   |              | |          |
     ----------------------   ----------------   ---------------- ------------
     ^                                                                       |
     |                                                                       |
     -------------------------------------------------------------------------

Note: Head and Tail canaries are fixed regular guards, with the exception that they always persist, and Tail guard always points to Head, forming a ring.

Key features

Dynamic Fragmentation Handling: Manages different allocation sizes with adaptive memory handling.
Expandable Buffers: Can grow like Vec<u8>, unless it hits capacity limits.
Fixed-Size Buffers: Efficient, fixed sized buffer, but can be sent across threads.
Read-Only Buffers: Cloneable and shareable across threads. They're efficient but involve extra heap allocation for reference counting, so use them wisely.
Generic Vec<T> like fixed capacity buffers, these can be allocated from the same backing store as regular u8 buffers
Thread local storage allocator: allocator instance can be created for each thread, and accessed from anywhere in the code, removeing the need to pass the allocator around

For more details, check the RingAl Documentation.

Optional Crate Features (Cargo)

tls (Thread-Local Storage): Adds thread-local storage, simplifying allocation calls, but might slow things slightly due to RefCell usage.
drop (Allocator Deallocation): Normally, the allocator lives for the app's duration, but this feature allows early deallocation when needed. It waits for all allocations to finish before freeing memory to prevent leaks, blocks the thread if there're active allocations.

Usage Examples

Extendable Buffer

let mut allocator = RingAl::new(1024); // Create an allocator with initial size
let mut buffer = allocator.extendable(64).unwrap();
let msg = b"hello world, this message is longer than the allocated capacity but will grow as needed.";
let size = buffer.write(msg).unwrap();
let fixed = buffer.finalize();
assert_eq!(fixed.as_ref(), msg);
assert_eq!(fixed.len(), size);

Fixed Buffer

let mut allocator = RingAl::new(1024); // Create an allocator with initial size
let mut buffer = allocator.fixed(256).unwrap();
let size = buffer.write(b"hello world, this message is relatively short").unwrap();
assert_eq!(buffer.len(), size);
assert_eq!(buffer.spare(), 256 - size);

Generic Buffer

let mut allocator = RingAl::new(1024); // Create an allocator with initial size
struct MyType {
    field1: usize,
    field2: u128
}
let mut buffer = allocator.generic::<MyType>(16).unwrap();
buffer.push(MyType { field1: 42, field2: 43 });
assert_eq!(buffer.len(), 1);
let t = buffer.pop().unwrap();
assert_eq!(t.field1, 42);
assert_eq!(t.field2, 43);

Multi-threaded Environment

let mut allocator = RingAl::new(1024); // Create an allocator with initial size
let (tx, rx) = channel();
let mut buffer = allocator.fixed(64).unwrap();
let _ = buffer.write(b"message to another thread").unwrap();
let handle = std::thread::spawn(move || {
    let buffer = rx.recv().unwrap();
    let readonly = buffer.freeze();
    let mut handles = Vec::with_capacity(16);
    for i in 0..16 {
        let (tx, rx) = channel();
        let h = std::thread::spawn(move || {
            let msg = rx.recv().unwrap();
            let msg = std::str::from_utf8(&msg[..]).unwrap();
            println!("{i}. {msg}");
        });
        tx.send(readonly.clone());
        handles.push(h);
    }
    for h in handles {
        h.join();
    }
});
tx.send(buffer);
handle.join();

Thread Local Storage

// init the thread local allocator, should be called just once, any thread
spawned afterwards, will have access to their own instance of the allocator
ringal!(@init, 1024);
// allocate fixed buffer
let mut fixed = ringal!(@fixed, 64).unwrap();
let _ = fixed.write(b"hello world!").unwrap();
// allocate extendable buffer, and pass a callback to operate on it
// this approach is necessary as ExtBuf contains reference to thread local storage, 
// and LocalKey doesn't allow any references to exist outside of access callback 
let fixed = ringal!{@ext, 64, |extendable| {
    let _ = extendable.write(b"hello world!").unwrap();
    // it's ok to return FixedBuf(Mut) from callback
    extendable.finalize()
}};
println!("bytes written: {}", fixed.len());

Benchmarks

Comparisons between RingAl and the system allocator (Vec<u8>) show performance gains in various scenarios.

Parameter Definitions

Iterations: Number of operations performed
Buffer Size: Maximum bytes written per buffer, the actual number is random
Max Buffers: Maximum number of buffers held after allocation

Fixed Preallocated Buffer Results

Iterations	Buffer Size	Max Buffers	ringal (ms)	vec (ms)	Improvement
10,000,000	64	64	696.7	998.0	1.43x
10,000,000	1,024	64	923.4	2,062.0	2.23x
10,000,000	1,024	1,024	913.7	1,622.0	1.77x
1,000,000	65,536	64	932.6	1,718.0	1.84x
1,000,000	131,072	1,024	1,709.0	3,960.0	2.32x

Extendable Buffer Results (Chunked Writes)

Iterations	Buffer Size	Max Buffers	ringal (ms)	vec (ms)	Improvement
10,000,000	64	64	834.0	1,120.0	1.34x
10,000,000	1,024	64	1,164.0	3,228.0	2.77x
10,000,000	1,024	1,024	1,205.0	3,044.0	2.53x
1,000,000	65,536	64	1,958.0	3,646.0	1.86x
1,000,000	131,072	1,024	3,588.0	9,008.0	2.51x

Key Insights

RingAl consistently beats Vec<u8> in all cases.
Gains range from 1.34x to 2.77x.
Bigger buffers show the most improvement.
Both fixed and extendable buffers show significant speedups.

System Specs

Benchmarks done with hyperfine, running each case 10 times. Shown times are averages.

Device: Apple MacBook Air
Processor: Apple M2
Memory: 16 GB RAM

Dependencies

RingAl doesn't rely on any third-party crates, only the Rust standard library.

Future Plans

Support for generic type buffer allocations.

Safety

The implementation uses a lot of unsafe Rust, this is an allocator after all, so raw pointers are inevitable (some parts of code look like they were written in C :)). A lot of effort was put into ensuring that the code is safe from UB and data races, and the API is safe to use. Nonetheless all contributions to making it even safer are always welcome.

ringal