Columnation

An experimental columnar arena

Alternatives

Columnation presents unsafe code as if it were safe, which means it has the potential to be unsound. Its public interface has unsafe methods, which means you need to understand Rust's requirements for unsafe code or risk unsoundness yourself. For alternatives with less unsafety (but other trade-offs) consider columnar or flatcontainer.

Overview

Columnation borrows its name from Abomonation, a Rust serialization framework that is very fast and very unsafe. Among Abomonation's unsafeties, it hosts typed data backed by [u8] slices, which causes anxiety for folks who worry about alignment, the visibility of padding bytes, and likely other things I don't yet know about.

Columnation is slightly better, in that it only maintains typed allocations Vec<T>, and does not invoke mem::transmute to change types. This does not mean it is safe, just that there are fewer places in the code that are likely to be unsafe once Rust's unsafety story shakes out.

Instead, Columnation works by consolidating the owned allocations that types might have (e.g. a String, or a Vec<T> for some other type T) into relatively few, larger allocations, and rewriting the pointers in these types to point into the larger allocations. This makes the types unsuitable for any use other than through an immutable reference. It also means that Columnation is plenty unsafe, and just an experiment for the moment.

Example use

Some types implement Columnation, a Rust trait that has no methods but allows one to instantiate a ColumnStack, into which you can copy instances of the type. The instances are then available through the stack's Deref implementation, which presents the stack as a &[Type]. Indexed elements present as &Type, and in principle you can do all the usual things with those references, despite there being no actual owned Type behind them.

// `ColumnStack` is provided by the `columnation` crate.
use columnation::ColumnStack;

// Some data are suitable for translation into columnar form.
let my_data = vec![vec![(0u64, vec![(); 1 << 40], format!("grawwwwrr!")); 32]; 32];

// Such data can be copied into a columnar region.
let mut my_region = ColumnStack::default();
for _ in 0 .. 1024 {
    my_region.copy(&my_data);
}

// The copying above is substantially faster than cloning the
// data, 21ms versus 198ms, when cloned like so:
let mut my_vec = Vec::with_capacity(1024);
for _ in 0 .. 1024 {
    my_vec.push(my_data.clone());
}

// At this point, `my_region` has just tens of allocations,
// despite presenting as a thousand records, each of which would
// normally have about a thousand allocations behind it.
assert_eq!(&my_region[..], &my_vec[..]);

Measurements

I took various types of records, generally containing a thousand allocations or so, and either copied or cloned each of them into a container 1024 times, just as above. Here are the benchmark times reported by Rust's cargo bench tool, where _clone means cloning into a vector and _copy means copying into a region-backed container.

running 16 tests
test empty_clone      ... bench:         835 ns/iter (+/- 114)
test empty_copy       ... bench:       3,103 ns/iter (+/- 346)
test string10_clone   ... bench: 108,914,118 ns/iter (+/- 7,864,343)
test string10_copy    ... bench:   4,654,702 ns/iter (+/- 312,409)
test string20_clone   ... bench:  59,302,789 ns/iter (+/- 8,014,183)
test string20_copy    ... bench:   2,818,970 ns/iter (+/- 191,415)
test u32x2_clone      ... bench:   1,920,494 ns/iter (+/- 198,815)
test u32x2_copy       ... bench:     282,235 ns/iter (+/- 40,851)
test u64_clone        ... bench:   1,951,842 ns/iter (+/- 129,288)
test u64_copy         ... bench:     234,412 ns/iter (+/- 25,186)
test u8_u64_clone     ... bench:   1,931,056 ns/iter (+/- 162,882)
test u8_u64_copy      ... bench:     266,326 ns/iter (+/- 35,203)
test vec_u_s_clone    ... bench: 120,642,691 ns/iter (+/- 9,124,488)
test vec_u_s_copy     ... bench:   5,801,229 ns/iter (+/- 522,024)
test vec_u_vn_s_clone ... bench: 134,171,625 ns/iter (+/- 18,599,137)
test vec_u_vn_s_copy  ... bench:   8,580,739 ns/iter (+/- 451,180)

In each case, other than the intentionally trivial empty case, the _copy version is markedly faster than the _clone version. This makes some sense, as we are able to re-use all of the allocations across runs in the _copy case and only the vector's spine in the _clone case (we could attempt more complicated buffer pooling, but we haven't done that here).
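
To make the reuse concrete, a _clone/_copy benchmark pair along the following lines would exercise it: the _copy variant clears and refills a single ColumnStack, while the _clone variant re-allocates every String on every iteration. This is only a sketch, not the crate's actual bench file; it assumes the nightly test::Bencher harness, that Vec<String> implements Columnation, and that ColumnStack exposes a clear method that retains its buffers.

#![feature(test)]
extern crate test;

use columnation::ColumnStack;
use test::Bencher;

// Hypothetical input: a record with roughly a thousand owned allocations.
fn sample() -> Vec<String> {
    (0 .. 1000).map(|i| format!("item {}", i)).collect()
}

#[bench]
fn sample_clone(bencher: &mut Bencher) {
    let data = sample();
    let mut target = Vec::with_capacity(1024);
    bencher.iter(|| {
        // Only the outer vector's spine is reused; every `String` is
        // dropped and re-allocated on each iteration.
        target.clear();
        for _ in 0 .. 1024 {
            target.push(data.clone());
        }
    });
}

#[bench]
fn sample_copy(bencher: &mut Bencher) {
    let data = sample();
    let mut region = ColumnStack::default();
    bencher.iter(|| {
        // Clearing is assumed to retain the region's buffers, so iterations
        // after the first copy the data without fresh allocations.
        region.clear();
        for _ in 0 .. 1024 {
            region.copy(&data);
        }
    });
}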

The empty case has an interesting story. When consolidating the multiple Vec<()> allocations into one Vec<()>, we introduce the cost of maintaining the length and capacity of that vector. It should be "unbounded" with a zero-sized type, but in fact we need to verify that it does not reach usize::MAX. The empty_clone case does not need to do this, and optimizes to a memcpy, whereas the empty_copy case must check the capacity between each insertion.
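
The zero-sized-type behavior can be seen directly in the standard library: a Vec<()> never allocates and reports a capacity of usize::MAX, so cloning one is essentially copying a length, while element-by-element insertion still has to maintain (and check) a counter. A minimal illustration:

// A `Vec` of a zero-sized type never allocates; the standard library
// reports its capacity as `usize::MAX`.
let empties: Vec<()> = vec![(); 1_000];
assert_eq!(empties.capacity(), usize::MAX);

// Cloning is effectively just copying the length ...
let cloned = empties.clone();
assert_eq!(cloned.len(), 1_000);

// ... whereas pushing element by element must still update (and in
// principle check) the length on every insertion.
let mut copied: Vec<()> = Vec::new();
for &item in cloned.iter() {
    copied.push(item);
}
assert_eq!(copied.len(), 1_000);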

Description

A type like Vec<String> would be encoded in Columnation using memory that roughly resembles

struct Roughly {
    /// Where each `Vec<String>` record is put.
    records: Vec<Vec<String>>,
    /// All of the `String`s in all of the records.
    strings: Vec<String>,
    /// All of the bytes in all of the strings in all of the records.
    bytes: Vec<u8>,
}

In fact, each member is a sequence of allocations, because we cannot resize any of our allocations in place (without substantial work); instead, we create new, geometrically larger allocations as needed. The struct has a special Drop implementation that releases the three bundles of allocations but does not recursively call the Drop implementation of their members (to avoid attempting to release pointers that point into the interiors of the larger allocations, and so are not valid to free).
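
To make "a sequence of geometrically growing allocations, dropped without recursing into their elements" concrete, here is a minimal sketch of the idea. It is not the crate's actual code, and AllocationBundle is a hypothetical name; ManuallyDrop is used purely as one way to suppress element destructors.

use std::mem::ManuallyDrop;

/// Illustrative only: a bundle of buffers whose elements are never dropped,
/// because their owned pointers have been rewritten to alias other buffers.
#[derive(Default)]
struct AllocationBundle<T> {
    /// Each inner `Vec` is one allocation; a full buffer is never resized.
    buffers: Vec<Vec<ManuallyDrop<T>>>,
}

impl<T> AllocationBundle<T> {
    fn push(&mut self, item: T) {
        // Resizing a buffer would move its contents and invalidate pointers
        // into it, so a full buffer is left alone and a new, geometrically
        // larger one is started instead.
        let full = self
            .buffers
            .last()
            .map_or(true, |buf| buf.len() == buf.capacity());
        if full {
            let capacity = self.buffers.last().map_or(16, |buf| 2 * buf.capacity());
            self.buffers.push(Vec::with_capacity(capacity));
        }
        self.buffers.last_mut().unwrap().push(ManuallyDrop::new(item));
    }
}

// Dropping an `AllocationBundle` frees each buffer, but `ManuallyDrop`
// suppresses the element destructors, so the rewritten interior pointers
// are never followed or freed.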

More general types need to describe how to safely store owned allocations they may have. The Columnation trait they implement allows them to specify an associated type, InnerRegion, which is what absorbs allocations owned by the type.
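
The trait pairing might be shaped roughly as below. Only the names Columnation and InnerRegion come from the text above; the Region trait and its method signatures are illustrative assumptions rather than the crate's exact API.

pub trait Columnation: Sized {
    /// The region type that absorbs allocations owned by `Self`.
    type InnerRegion: Region<Item = Self>;
}

pub trait Region: Default {
    /// The type whose owned allocations this region absorbs.
    type Item;
    /// Copies `item`, returning an instance whose owned allocations point
    /// into the region's buffers. The result must only be used by reference,
    /// and only while the region is alive and unmodified; hence `unsafe`.
    unsafe fn copy(&mut self, item: &Self::Item) -> Self::Item;
    /// Retains the region's buffers but forgets their contents.
    fn clear(&mut self);
}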

Relation to Abomonation

The intent of Columnation is similar to that of Abomonation: to present immutable references to Rust types without requiring actual owned instances to exist behind them. Their memory layouts, however, are different.

Whereas Abomonation serializes each record contiguously, Columnation "serializes" the allocations behind each record separately into distinct statically typed buffers. Abomonation has locality-of-reference advantages when you want to investigate an entire record, and Columnation has memory compactness advantages when you often need only subsets of the allocations backing records.

Safety

Part of the goal of pushing the code is to get some eyes on it. It could be safe, but I am not clear on the pointer aliasing rules required for safety (nor where they are recorded). Also, there could totally be bugs, but I should fix those.
