#stack-memory #stack #safe #byte-buffer #stack-allocation #alloca

no-std stackalloc

Safely allocate and manipulate arbitrarily-sized slices on the stack at runtime

7 stable releases

1.2.1 May 8, 2024
1.2.0 Jun 15, 2022
1.1.2 Apr 28, 2022
1.1.1 Jun 27, 2021
1.0.1 Mar 27, 2021

#84 in Memory management

Download history 31/week @ 2024-07-28 9/week @ 2024-08-04 25/week @ 2024-08-11 24/week @ 2024-08-18 47/week @ 2024-08-25 28/week @ 2024-09-01 79/week @ 2024-09-08 84/week @ 2024-09-15 102/week @ 2024-09-22 127/week @ 2024-09-29 19/week @ 2024-10-06 25/week @ 2024-10-13 47/week @ 2024-10-20 85/week @ 2024-10-27 60/week @ 2024-11-03 78/week @ 2024-11-10

271 downloads per month
Used in 5 crates (3 directly)

MIT license

37KB
559 lines

Safe runtime stack allocations

Provides methods for Rust to access and use runtime stack allocated buffers in a safe way (like C VLAs or the alloca() function.) This is accomplished through a helper function that takes a closure of FnOnce that takes the stack allocated buffer slice as a parameter. The slice is considered to be valid only until this closure returns, at which point the stack is reverted back to the caller of the helper function. If you need a buffer that can be moved, use Vec or statically sized arrays. The memory is allocated on the closure's caller's stack frame, and is deallocated when the caller returns.

This slice will be properly formed with regards to the expectations safe Rust has on slices. However, it is still possible to cause a stack overflow by allocating too much memory, so use this sparingly and never allocate unchecked amounts of stack memory blindly.

Requirements

The crate works on stable or nightly Rust, but a C99-compliant compiler is required to build.

Features

  • no_std - Enabled no_std support on nightly toolchains.

Examples

Allocating a byte buffer on the stack.

fn copy_with_buffer<R: Read, W: Write>(mut from: R, mut to: W, bufsize: usize) -> io::Result<usize>
{
  alloca_zeroed(bufsize, move |buf| -> io::Result<usize> {
   let mut read;
   let mut completed = 0;
   while { read = from.read(&mut buf[..])?; read != 0} {
    to.write_all(&buf[..read])?;
    completed += read;
   }
   Ok(completed)
  })
}

Arbitrary types

Allocating a slice of any type on the stack.

stackalloc(5, "str", |slice: &mut [&str]| {
 assert_eq!(&slice[..], &["str"; 5]);
});

Dropping

The wrapper handles dropping of types that require it.

stackalloc_with(5, || vec![String::from("string"); 10], |slice| {
 assert_eq!(&slice[0][0][..], "string");  
}); // The slice's elements will be dropped here

MaybeUninit

You can get the aligned stack memory directly with no initialisation.

stackalloc_uninit(5, |slice| {
 for s in slice.iter_mut()
 {
   *s = MaybeUninit::new(String::new());
 }
 // SAFETY: We have just initialised all elements of the slice.
 let slice = unsafe { stackalloc::helpers::slice_assume_init_mut(slice) };

 assert_eq!(&slice[..], &vec![String::new(); 5][..]);

 // SAFETY: We have to manually drop the slice in place to ensure its elements are dropped, as `stackalloc_uninit` does not attempt to drop the potentially uninitialised elements.
 unsafe {
   std::ptr::drop_in_place(slice as *mut [String]);
 }
});

How does it work?

Since Rust has no way to manipulate the stack at runtime, we use FFI to call into a function which manipulates its frame to allocate the desired memory there. Then, this funcion calls into a callback with a pointer to this stack allocated memory. Once the callback returns, the FFI function handles resetting the stack pointer to that of its caller.

This is beneficial compared to allocating the memory directly on the frame of the caller in many ways:

  • In all languages that support alloca, it is implemented as a compiler builtin. This is because the compiler must insert instructions to reset the stack pointer at any point that the function returns, it must also keep track of addresses of anything pushed onto the stack after the call to alloca. Doing this manually would inhibit the compiler's ability to manage the stack pointer, and prevent it from being sure of not only the size of the current stack frame, but the layout of it as well. Therefore, it is likely more efficient.
  • Allocating directly on the caller's frame would mean the memory stays allocated until the function returns, not until it falls out of scope. This can be a subtle source of bugs, such as the risk of alloca()ing in a loop causing a stack overflow. Therefore, it is also safer and less of a footgun.
  • Implementing alloc this way would require platform-specific inline asm. Which is not only an unstable feature of Rust, it is incredibly unsafe as the compiler might make assumptions based on the layout of the stack which would then become invalidated. There is currently no way to implement a safe alloca that behaves this way, as such an implementation would require the user of this API to be very aware of where and how she can and can't use it.

Downsides

This comes at the cost of an extra function call that is likely not inlineable (even with LTO and other aggressive optimisations). The tradeoff of a stack allocated buffer over a heap allocated one may not be worth it in general, but this extra indirection could potentially impact performance on a micro scale. If performance is of utmost importance to you, then you should be benchmarking these things anyway. It could be a heap allocated Vec is more performant in your use-case, or a stack allocated slice might be.

Performance

For small (1k or lower) element arrays stackalloc can outperform Vec by about 50% or more. This performance difference decreases are the amount of memory allocated grows.

test tests::bench::stackalloc_of_uninit_bytes_known   ... bench:           3 ns/iter (+/- 0)
test tests::bench::stackalloc_of_uninit_bytes_unknown ... bench:           3 ns/iter (+/- 0)
test tests::bench::stackalloc_of_zeroed_bytes_known   ... bench:          22 ns/iter (+/- 0)
test tests::bench::stackalloc_of_zeroed_bytes_unknown ... bench:          17 ns/iter (+/- 0)
test tests::bench::vec_of_uninit_bytes_known          ... bench:          13 ns/iter (+/- 0)
test tests::bench::vec_of_uninit_bytes_unknown        ... bench:          55 ns/iter (+/- 0)
test tests::bench::vec_of_zeroed_bytes_known          ... bench:          36 ns/iter (+/- 2)
test tests::bench::vec_of_zeroed_bytes_unknown        ... bench:          37 ns/iter (+/- 0)

License

MIT licensed

No runtime deps

~0–300KB