#slice #stack #memory-management


A fast allocator for short-lived slices and large values

8 releases

0.3.5 Jul 14, 2022
0.3.4 Jun 27, 2022
0.2.1 Sep 21, 2019
0.2.0 Jun 16, 2019

#59 in Memory management


Used in soa-vec

MIT license

237 lines

The thread's stack is a high-performance way to manage memory, but it cannot be used for large or dynamically sized allocations. What if the thread had a second stack suitable for that purpose?

We've had one, yes. What about second stack? ...Pippin, probably.

second-stack is an allocator for short-lived, potentially large values and slices. It is often faster than Vec for the same reason that using the thread's stack is usually faster than using the heap.

The internal representation is a thread local stack that grows as necessary. Once the capacity saturates, the same allocation will be re-used for many consumers, making it more efficient as more libraries adopt it.
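To make the growth-then-reuse behavior concrete, here is a minimal stdlib-only sketch of the *idea* (not the crate's actual implementation): one growable buffer per thread, with each consumer borrowing space from it. The `SCRATCH` static and `with_scratch` helper are hypothetical names invented for this illustration.

```rust
use std::cell::RefCell;

// Toy model of the concept: each thread owns one growable buffer.
// Once it has grown to the high-water mark of all consumers, later
// calls perform no new heap allocation.
thread_local! {
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::new());
}

// Hypothetical helper: run `f` with a zeroed scratch slice of `len` bytes.
fn with_scratch<R>(len: usize, f: impl FnOnce(&mut [u8]) -> R) -> R {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        if buf.len() < len {
            buf.resize(len, 0); // grows only until capacity saturates
        }
        f(&mut buf[..len])
    })
}

fn main() {
    // First call grows the thread-local buffer.
    with_scratch(1000, |s| s.fill(1));
    let cap_before = SCRATCH.with(|c| c.borrow().capacity());

    // A second, smaller call re-uses the same allocation: no heap work.
    with_scratch(100, |s| s.fill(2));
    let cap_after = SCRATCH.with(|c| c.borrow().capacity());
    assert_eq!(cap_before, cap_after);
    println!("capacity stayed at {}", cap_after);
}
```

The real crate hands out properly aligned, possibly uninitialized slices of any element type and unwinds the stack safely on panic, but the reuse dynamic is the same: the more consumers share the thread-local stack, the more often an allocation is already warm.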

second-stack was originally developed for writing dynamic buffers in WebGL (eg: procedurally generate some triangles/colors, write them to a buffer, and hand them off to the graphics card many times per frame without incurring the cost of many heap allocations). But, over time I found that needing a short-lived slice was common and using second-stack all over the place allowed for the best memory re-use and performance.

There are two ways to use this API. The preferred way is to use the methods that delegate to a shared thread local (like buffer and uninit_slice). Using these methods ensures that multiple libraries efficiently re-use allocations without passing around context or exposing this implementation detail in their public APIs. Alternatively, you can use Stack::new() to create your own managed stack if you need more control.

Example using buffer:

// Buffer fully consumes an iterator,
// writes each item to a slice on the second stack,
// and gives you mutable access to the slice.
// This API supports Drop.
buffer(0..1000, |items| {
    assert_eq!(items.len(), 1000);
    assert_eq!(items[19], 19);
});

Example using uninit_slice:

uninit_slice(100, |slice| {
    // Write to the 100-element uninitialized slice here
});

Example using Stack:

let stack = Stack::new();
stack.buffer(std::iter::repeat(5).take(100), |slice| {
    // Same as second_stack::buffer, but uses an
    // owned stack instead of the thread-local one.
    // Not recommended unless you have a specific reason,
    // because this limits passive sharing.
});

Example placing a huge value:

struct Huge {
    bytes: [u8; 4194304],
}

uninit::<Huge>(|huge| {
    // Do something with this very large
    // value that would cause a stack overflow if
    // we had used the thread's stack
});


How is this different from a bump allocator like bumpalo?

Bump allocators like bumpalo are arena allocators designed for phase-oriented allocations, whereas second-stack is a stack.

This allows second-stack to:

  • Support Drop
  • Dynamically up-size the allocation as needed rather than requiring the size be known up-front
  • Free and re-use memory earlier
  • Conveniently support "large local variables", without requiring the program to be architected around the arena model
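The Drop point is the clearest behavioral difference: stack discipline runs destructors as soon as each scope ends, while bump arenas like bumpalo typically skip Drop for arena-allocated values. Here is a stdlib-only sketch of that discipline, using an ordinary scope to stand in for a buffer callback; the `Tracked` type and `DROPS` counter are invented for this illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped so far.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    {
        // This scope stands in for a `buffer(...)` callback body.
        let _items: Vec<Tracked> = (0..3).map(|_| Tracked).collect();
        // Nothing has been dropped while the scope is live.
        assert_eq!(DROPS.load(Ordering::SeqCst), 0);
    }
    // Leaving the scope ran every destructor immediately,
    // freeing the space for the next consumer.
    assert_eq!(DROPS.load(Ordering::SeqCst), 3);
    println!("drops: {}", DROPS.load(Ordering::SeqCst));
}
```

A bump allocator only reclaims everything at once when the whole arena is reset, so destructor timing (and memory reuse) is tied to the arena's lifetime rather than to each value's scope.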

No runtime deps