1 stable release

1.0.0	Jul 27, 2024

#100 in Memory management

Custom license

73KB
1.5K SLoC

alloc_cat

     |\_/|
     )   (
    =\___/=
     /   \       alloc_cat
     |   |
_   (     )  ______   ______   _
 \ / \_ _/\ /      \ /      \ /
  |   //   |        |        |
  |   \\   |        |        |
  |    \)  |        |        |
  |        |        |        |

Alloc_cat is a simple allocator for small-to-tiny Wasm projects in rust, #[no_std] compatible.

It's meant as a substitute for the now-abandoned wee_alloc in situations where the rust system allocator is too heavy-weight. It's not as sophisticated as wee_alloc, focusing instead on simplicity and testability.

The allocation strategy is a simple bump pointer with free list, optimized for programs that rarely use more than 2MB or allocate chunks larger than 32KB, though it can handle any size of heap or allocation up to your system's limit. In addition, metrics can be collected to help analyze your program's memory use.

Quick start

Replace the rust system allocator with the one from alloc_cat. It should be as simple as putting this in your top-level lib.rs:

extern crate alloc;

use alloc_cat::{ALLOCATOR, AllocCat};

#[global_allocator]
pub static GLOBAL_ALLOCATOR: &AllocCat = &ALLOCATOR;

Then all heap allocations (like Box) will use alloc_cat. You can verify at runtime by calling get_stats() (see below) and checking that the values aren't all zero.

Using it

There are three different ways to use alloc_cat:

within Wasm (target_arch = "wasm32")

This mode provides an allocator based on Wasm's memory model (described in a separate section below), and is the main purpose of this library.
wrapping the system allocator

This mode is activated by turning on the system_allocator feature (which is on by default) when running on a non-Wasm architecture. It provides a wrapper for the system allocator with no other intrinsic features except allocation metrics.
using the alloc_cat algorithm outside of Wasm

This mode is activated by turning off the system_allocator default feature when running on a non-Wasm architecture. It grabs a static pool of memory from the OS using mmap and provides the alloc_cat allocator to manage it. It uses mutexes to avoid thread contention, which may be slow. It's really only useful for testing the alloc_cat algorithms in a non-Wasm environment.

Supported cargo "features":

system_allocator (default)

Use the rust system allocator when running on any non-Wasm architecture. This is usually what you want for tests, unless you're testing alloc_cat itself. The alternative is an mmap-based pool, as described above.
audit_sizes

Turning this on will cause make alloc_cat track the count of allocations per size and alignment (in addition to its normal metrics), which makes the allocator a little slower, and uses 2KB of additional memory, but can be useful for debugging leaks. It may also be interesting to the curious or detail-oriented.

When using the mmap allocator (target_arch is not Wasm and system_allocator feature is turned off), you can set the size of the pool used for the heap with an environment variable:

ALLOC_CAT_MMAP_HEAP_SIZE (number, in bytes, default = 2097152 = 2MB)

Metrics

To get basic statistics about the heap:

let stats = ALLOCATOR.get_stats();

The stats object will have various fields filled in, depending on which allocator is in use (unused fields will be zero).

in_use: usize
high_water: usize

in_use is the number of bytes currently allocated by the application, and high_water is the highest value that in_use has ever had. The amount of memory in use by the heap will be larger, due to fragmentation and block size rounding, which are tracked in more detailed metrics below (if you aren't using the system allocator).
size_table: [SizeInfo; 257] (only with feature = "audit_sizes")

Each element in the array is one "bucket" of a scaled histogram of request sizes:
- size: usize -- bytes
- count: usize -- # of current/active allocations
- high_water: usize -- highest value of count
The first 48 buckets are sizes 1 to 48, and the remaining ones cover spans of 2 (50, 52, 54, ...), 4, 8, and up, exponentially, gradually reaching a span of 512 bytes, up to a size of about 22.5KB. The final bucket will cover any allocations larger than 22.5KB, and its size field will be usize::MAX.

Allocation requests are assigned to the nearest bucket greater than or equal to the requested size, so with buckets of (..., 88, 92, 96, ...), the 92 bucket will count all allocation requests of 89, 90, 91, or 92.
align_table: [usize; 16] (only with feature = "audit_sizes")

Each element in the array is one bit-width alignment, from byte alignment (align_table[0] or 0 bits) through u32 alignment (align_table[2], 2 bits) and up. In practice, alignments beyond 3 bits (8 bytes) are rare to nonexistent.

The values in the array are the counts of allocation requests per alignment, as a request count, not the number of current allocations.

The remaining fields are only relevant to the Wasm allocator. They'll all remain 0 for the system allocator. "Words" refer to 32-bit (4-byte) units, and "pages" refer to 64KB units.

regions: usize -- number of 2MB regions being managed by the allocator
total_words: usize -- number of (4-byte) words granted to the allocator by Wasm. This includes memory allocated, tracked in the free list, and not yet allocated by the bump pointer.
allocated_words: usize -- number of words allocated or in the free list
unused_words: usize -- number of words available (so far) to the bump pointer. This is just total_words - allocated_words.
fragment_words: usize -- number of words in the free list, which is unallocated space wedged between active allocations
fragments: usize -- number of separate segments that make up the free list
total_large_pages: usize -- number of 64KB pages allocated for "large" (>= 32KB) requests
unused_large_pages: usize -- number of 64KB large pages that have been freed and not (yet) reused
allocated_large_pages_words: usize -- total number of words requested in all active large allocations
heap_start: usize -- address of the start of the Wasm heap, for the curious. The Wasm heap is described in detail in its own section below.

How Wasm memory works

The Wasm engine provides a contiguous block of memory as a whole number of "pages", where a page is 64KB. Addresses start at 0 (0x0000_0000).

A typical program will reserve memory for global data and a task stack, then leave the rest available for a heap that can grow on demand. As the heap grows, it must ask Wasm for more pages.

    |                           |
    |                           +-- 192KB (3 pages)
    |                           |
    |                           |
    |   ^      not yet      ^   |
    |   |     allocated     |   |
    | - - - - - - - - - - - - - +-- 128KB (2 pages)
    |             ^             |
    |  heap       |             |
    +---------------------------+ (88KB)
    |  memory reserved before   |
    |  program start:           +-- 64KB (1 page)
    |      - task stack         |
    |      - data/bss           |
    |                           |
    |                           |
    +---------------------------+-- 0

In this example, 128KB (or 2 pages) is allocated at startup. 88KB is reserved for global data and a stack, leaving 40KB of the 2nd page for the heap. The program can ask for another page, extending total memory to 192KB and extending the available heap from 40KB to 104KB (40KB + one new 64KB page).

We only get two "system calls" for managing memory:

size: memory_size() -> usize which returns the number of pages currently allocated
grow: memory_grow(count: usize) -> usize which allocates count new pages and returns the previous page count

In our example, memory_size() would return 2 initially. To extend the heap to 104KB, we'd call memory_grow(1) (which would return 2) and now memory_size() will return 3.

There is no way to return memory to the system. Wasm's heap can grow, but never shrink.

We also get a symbol __heap_base which points to the end of reserved memory, at address 88KB (0x0001_6000) here.

Luckily, this is enough to implement a heap. Our memory starts at address __heap_base and ends at address memory_size() * 0x1_0000. Everything within this range is ours, for use in allocating and freeing blocks of memory for the program.

If Wasm runs out of memory, alloc_cat will report failure, which will usually gracelessly end your program in a panic (just like any other allocator).

How alloc_cat works

Normal allocation requests (less than 32KB) use a basic algorithm like you'd find in any textbook since the 20th century, based on a bump pointer and a linked list of free blocks. At the start, the allocator sets the bump pointer to the start of free memory (__heap_base in Wasm), and sets the free list to an empty list.

Each request is rounded up to the nearest number of whole words (4 bytes). First, we walk the free list, and return the first available block which is big enough and has the right alignment. If the block is bigger than the requested size, we'll split it first. If nothing is available, we increment the bump pointer, extending the heap by another page if necessary.

By restricting the block size to 32KB and each region's size to 2MB, each "next" pointer in the free list can pack the size and offset into a single word:

31            24        19    16                 8               0  <- bit
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   size, in words        |   offset to next block, in words      |
|   0-8KW (0-32KB)        |   0-512KW (0-2MB)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
free list pointer, packed into one word (32 bits)

The free list is kept sorted by address, so when a block of memory is freed, it can be merged with any adjacent free block. If this causes the last free block to meet the bump pointer, we move the bump pointer backwards instead.

If we want to extend the heap, but it's reached 2MB (or the next Wasm page has already been used for some other purpose, like a large allocation), we start a new heap region and link it to the previous one, creating a linked list of regions, each up to 2MB.

Large allocations (32KB or larger, with no limit) are simpler. They're tracked in a linked list, with each link containing two words: a pointer to the start of the next block, and a count of how many pages are in that next block. This link is stored in the final two words at the end of the last page of each subsequent block. Freed blocks are reused when possible, and can be split and merged if they're next to each other, but only in units of a page (64KB).

License

This code is licensed under either the prosperity license (LICENSE-propserity.md) or the anti-capitalist license (LICENSE-anti-capitalist.txt) at your preference, for personal or non-commercial use.

Authors

Robey Pointer robey@lag.net

The logo is strongly inspired by (based on) a series of pieces by prolific ASCII artist "jgs". You can find cool art of cats, including her work, here.

Dependencies

~2MB
~40K SLoC