#allocator #memory

nightly no-std simple-chunk-allocator


5 releases

Uses new Rust 2021

0.1.4 Mar 17, 2022
0.1.3 Mar 7, 2022
0.1.2 Mar 7, 2022
0.1.1 Mar 7, 2022
0.1.0 Mar 7, 2022

#61 in Memory management


71 downloads per month

MIT license

75KB
837 lines

Simple Chunk Allocator

A simple no_std allocator written in Rust that manages memory in fixed-size chunks/blocks. Useful for basic no_std binaries where you want to manage a heap of a few megabytes without complex features such as paging/page table management. Instead, this allocator gets a fixed/static memory region and allocates memory from there. This memory region can be contained inside the executable file that uses this allocator. See examples down below.

Large-scale applications are probably better served by allocators that use a more complex algorithm to achieve better performance. However, this crate is a good fit for simple no_std binaries and hopefully also for educational purposes. Writing it taught me a lot about allocators.

TL;DR

  • ✅ no_std allocator with test coverage
  • ✅ uses static memory as backing storage (no paging/page table manipulations)
  • ✅ allocation strategy is a combination of next-fit and best-fit
  • ✅ reasonably fast with low code complexity
  • ✅ const compatibility (no runtime init() required)
  • ✅ efficient in scenarios where the heap is a few dozen megabytes in size
  • ✅ user-friendly API

The low-level ChunkAllocator can be registered as #[global_allocator] through the synchronized wrapper type GlobalChunkAllocator. Both types can also be used with the allocator_api feature, which enables their use in several types of the Rust standard library through constructors such as Vec::new_in or BTreeMap::new_in. This is primarily interesting for testing but may also enable other use cases.

The focus is on const compatibility. The allocator and the backing memory can be initialized at compile time and need no runtime init() call or similar. This means that if the compiler accepts the setup, allocation will also work at runtime. However, you can also create allocator objects at runtime.
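The idea of const construction can be illustrated with a toy type. The sketch below is not the crate's implementation: `ToyBump`, `reserve`, and `ARENA` are hypothetical names, and the type only tracks offsets instead of handing out real memory. It shows that a `const fn` constructor lets a `static` be fully initialized at compile time, so no `init()` call is needed before first use.

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical toy arena that only tracks how many of its `N` bytes are reserved.
pub struct ToyBump<const N: usize> {
    next: AtomicUsize,
}

impl<const N: usize> ToyBump<N> {
    /// `const fn`: can be evaluated at compile time in a `static` initializer.
    pub const fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    /// Reserves `bytes` and returns the start offset, or `None` if the arena is full.
    pub fn reserve(&self, bytes: usize) -> Option<usize> {
        self.next
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                let end = cur.checked_add(bytes)?;
                if end <= N { Some(end) } else { None }
            })
            .ok()
    }
}

/// Ready to use as soon as the program starts; no runtime initialization.
pub static ARENA: ToyBump<1024> = ToyBump::new();
```

Calling `ARENA.reserve(1000)` directly from `main` works without any setup step, which is the property the crate aims for with its own types.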

The low-level ChunkAllocator is a chunk allocator, also called a fixed-size block allocator. It uses a mixture of the next-fit and best-fit strategies: it tries to use the smallest suitable gap for an allocation request to reduce fragmentation, but this is not guaranteed. Each allocation is a trade-off between low allocation time and low fragmentation. The default chunk size is 256 bytes, which can be changed at compile time through a const generic. A fixed-size block allocator allows simple bookkeeping with a bitmap, but as a consequence a small allocation, such as 64 bytes, occupies at least one full chunk of the chosen chunk size.
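The bitmap bookkeeping described above can be sketched roughly as follows. This is a simplified illustration, not the crate's code: `ToyBitmap`, `chunks_needed`, `find_free_run`, `allocate`, and `free` are hypothetical names, one bit per chunk is assumed, and only a plain next-fit search is shown (the best-fit part is left out).

```rust
/// Chunk size used by this sketch (matches the documented default of 256 bytes).
const CHUNK_SIZE: usize = 256;

/// Number of chunks a request of `size` bytes occupies (rounded up, at least one).
pub const fn chunks_needed(size: usize) -> usize {
    if size == 0 { 1 } else { (size + CHUNK_SIZE - 1) / CHUNK_SIZE }
}

/// Hypothetical bitmap with one bit per chunk: 1 = used, 0 = free.
pub struct ToyBitmap<const BYTES: usize> {
    bits: [u8; BYTES],
}

impl<const BYTES: usize> ToyBitmap<BYTES> {
    pub const fn new() -> Self {
        Self { bits: [0; BYTES] }
    }

    pub const fn chunk_count(&self) -> usize {
        BYTES * 8
    }

    pub fn is_used(&self, chunk: usize) -> bool {
        self.bits[chunk / 8] & (1 << (chunk % 8)) != 0
    }

    fn set(&mut self, chunk: usize, used: bool) {
        let mask = 1u8 << (chunk % 8);
        if used {
            self.bits[chunk / 8] |= mask;
        } else {
            self.bits[chunk / 8] &= !mask;
        }
    }

    /// Next-fit search: scan from `start` to the end, then wrap to the beginning.
    /// A run of chunks never wraps around the end of the heap.
    pub fn find_free_run(&self, start: usize, n: usize) -> Option<usize> {
        let total = self.chunk_count();
        if n == 0 || n > total {
            return None;
        }
        (start..total).chain(0..start).find(|&first| {
            first + n <= total && (first..first + n).all(|c| !self.is_used(c))
        })
    }

    /// Marks enough chunks for `size` bytes as used; returns the first chunk index.
    pub fn allocate(&mut self, start: usize, size: usize) -> Option<usize> {
        let n = chunks_needed(size);
        let first = self.find_free_run(start, n)?;
        for c in first..first + n {
            self.set(c, true);
        }
        Some(first)
    }

    /// Marks the chunks of an earlier allocation of `size` bytes as free again.
    pub fn free(&mut self, first: usize, size: usize) {
        for c in first..first + chunks_needed(size) {
            self.set(c, false);
        }
    }
}
```

With this sketch, a 64-byte request and a 256-byte request both consume exactly one chunk, while a 257-byte request consumes two. The address of an allocation would then be the heap base plus the chunk index multiplied by the chunk size.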

This project originates from my Diplom thesis project. Since I initially struggled a lot to create this (my first ever allocator), I extracted it into its own crate for better testability and to share my knowledge and findings, in the hope that others can learn from it.

Minimal Code Example

#![feature(const_mut_refs)]
#![feature(allocator_api)]

use simple_chunk_allocator::{heap, heap_bitmap, GlobalChunkAllocator, PageAligned};

// The macros help to get correctly sized array types.
// I page-align them for better caching and to improve the availability of
// page-aligned addresses.

/// Backing storage for the heap (1 MiB). Read+write static memory in the final executable.
///
/// heap!: first argument is chunk amount, second argument is size of each chunk.
///        If no arguments are provided it falls back to defaults.
///        Example: `heap!(chunks=16, chunksize=256)`.
static mut HEAP: PageAligned<[u8; 1048576]> = heap!();
/// Backing storage for the heap bookkeeping bitmap. Read+write static memory in the final executable.
///
/// heap_bitmap!: first argument is amount of chunks.
///               If no argument is provided it falls back to a default.
///               Example: `heap_bitmap!(chunks=16)`.
static mut HEAP_BITMAP: PageAligned<[u8; 512]> = heap_bitmap!();

// please make sure that the backing memory is at least CHUNK_SIZE aligned; better page-aligned
#[global_allocator]
static ALLOCATOR: GlobalChunkAllocator =
    unsafe { GlobalChunkAllocator::new(HEAP.deref_mut_const(), HEAP_BITMAP.deref_mut_const()) };

fn main() {
    // at this point, the allocator already got used a bit by the Rust runtime that executes
    // before main() gets called. This is not the case if a `no_std` binary gets produced.
    let old_usage = ALLOCATOR.usage();
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
    vec.push(3);
    assert!(ALLOCATOR.usage() > old_usage);

    // use "allocator_api"-feature. You can use this if "ALLOCATOR" is not registered as
    // the global allocator. Otherwise, it is already the default.
    let _boxed = Box::new_in([1, 2, 3], ALLOCATOR.allocator_api_glue());
}

MSRV

This crate only builds with a nightly toolchain. I developed it with version 1.61.0-nightly (2022-03-05).

Performance

The default CHUNK_SIZE is 256 bytes. It is a tradeoff between performance and efficient memory usage.

I executed my example bench in release mode on an Intel i7-1165G7 CPU with a heap of 160 MiB to get the results listed below. I used RUSTFLAGS="-C target-cpu=native" cargo run --release --example bench to execute the benchmark with maximum performance. The benchmark simulates heavy heap usage in a single-threaded program with many random allocations and deallocations. It stops when the heap is 100% full. The allocations vary in their alignment. The table below shows the results; the median, average, min, and max columns are given in clock cycles.

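| Chunk Size    | # Chunks | # Allocations | # Deallocations | Median | Average | Min | Max   |
|---------------|----------|---------------|-----------------|--------|---------|-----|-------|
| 128           | 1310720  | 68148         | 47915           | 955    | 1001    | 126 | 57989 |
| 256 [DEFAULT] | 655360   | 71842         | 51744           | 592    | 619     | 121 | 53578 |
| 512           | 327680   | 66672         | 46858           | 373    | 401     | 111 | 54403 |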

The results vary slightly between runs because the benchmark involves randomness. One can see that performance degrades as the number of chunks grows. Increasing the chunk size reduces the size of the bookkeeping bitmap, which speeds up the lookup. However, a smaller chunk size wastes less heap when only very small allocations are required.
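The trade-off can be made concrete with a little arithmetic. The helpers below are hypothetical and only for illustration (they are not part of the crate's API), and they assume one bitmap bit per chunk, which matches the 512-byte bitmap used for the 1 MiB heap in the minimal example.

```rust
/// Number of chunks that fit into a heap of `heap_bytes` bytes.
pub const fn chunks_in_heap(heap_bytes: usize, chunk_size: usize) -> usize {
    heap_bytes / chunk_size
}

/// Bitmap size in bytes, assuming one bit per chunk.
pub const fn bitmap_bytes(chunks: usize) -> usize {
    (chunks + 7) / 8
}

/// Bytes lost to rounding when `request` bytes are served in whole chunks.
pub const fn wasted_bytes(request: usize, chunk_size: usize) -> usize {
    let chunks = (request + chunk_size - 1) / chunk_size;
    chunks * chunk_size - request
}

/// Heap size used in the benchmark: 160 MiB.
pub const BENCH_HEAP: usize = 160 * 1024 * 1024;
```

For the benchmark heap, a chunk size of 128 bytes gives 1310720 chunks and a bitmap of 163840 bytes, while 512 bytes gives 327680 chunks and a bitmap of 40960 bytes. In exchange, a 64-byte allocation wastes 64 bytes with 128-byte chunks but 448 bytes with 512-byte chunks.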

Note that performance is better than listed above when the heap is used less frequently and does not run full.

Dependencies

~785KB
~15K SLoC