5 releases (3 breaking)

0.7.0 Aug 20, 2024
0.6.0 May 9, 2024
0.4.0 Mar 21, 2024
0.3.1 Feb 28, 2024
0.3.0 Feb 27, 2024

Used in soa-rs

MIT license

46KB
891 lines

soa-rs

soa-rs makes it simple to work with the structure-of-arrays memory layout. What Vec<T> is to array-of-structures (AoS), Soa<T> is to structure-of-arrays (SoA).

Example

use soa_rs::{Soars, soa, AsSlice};

// Derive soa-rs for your type
#[derive(Soars, PartialEq, Debug)]
#[soa_derive(Debug, PartialEq)]
struct Baz {
    foo: u16,
    bar: u8,
}

// Create the SoA
let mut soa = soa![
    Baz { foo: 1, bar: 2 },
    Baz { foo: 3, bar: 4 },
];

// Each field has a slice
assert_eq!(soa.foo(), [1, 3]);
assert_eq!(soa.bar(), [2, 4]);

// Tuple structs work too
#[derive(Soars, PartialEq, Debug)]
#[soa_derive(Debug, PartialEq)]
struct Tuple(u16, u8);
let tuple = soa![Tuple(1, 2), Tuple(3, 4), Tuple(5, 6), Tuple(7, 8)];

// SoA can be sliced and indexed like normal slices
assert_eq!(tuple.idx(1..3), soa![Tuple(3, 4), Tuple(5, 6)]);
assert_eq!(tuple.idx(3), TupleRef(&7, &8));

// Drop-in for Vec in many cases
soa.insert(0, Baz { foo: 5, bar: 6 });
assert_eq!(soa.pop(), Some(Baz { foo: 3, bar: 4 }));
assert_eq!(soa, soa![Baz { foo: 5, bar: 6 }, Baz { foo: 1, bar: 2 }]);
for mut el in &mut soa {
    *el.foo += 10;
}
assert_eq!(soa, soa![Baz { foo: 15, bar: 6 }, Baz { foo: 11, bar: 2 }]);

What is SoA?

Whereas AoS stores all the fields of a type in each element of the array, SoA splits each field into its own array. For example, consider

struct Example {
    foo: u8,
    bar: u64,
}

To keep bar properly aligned, this struct has the following layout, with 7 bytes of padding after foo. In this extreme example, almost half of the memory (7 of 16 bytes) is wasted on padding.

╭───┬───────────────────────────┬───────────────────────────────╮
│foo│         padding           │              bar              │
╰───┴───────────────────────────┴───────────────────────────────╯

By using SoA, the fields will be stored separately, removing the need for padding:

╭───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬┄
│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│foo│
╰───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴┄
╭───────────────────────────────┬───────────────────────────────┬┄
│             bar               │              bar              │
╰───────────────────────────────┴───────────────────────────────┴┄
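The difference can be checked with std::mem::size_of. The following is a minimal sketch, assuming a typical 64-bit target where u64 has 8-byte alignment; aos_bytes and soa_bytes are hypothetical helpers written for this illustration and are not part of soa-rs.

```rust
use std::mem::size_of;

// Same fields as the `Example` struct above.
#[allow(dead_code)]
struct Example {
    foo: u8,
    bar: u64,
}

/// Bytes needed to store `n` elements as an array of structures.
/// Each element carries its own padding.
fn aos_bytes(n: usize) -> usize {
    n * size_of::<Example>()
}

/// Bytes needed to store `n` elements as one array per field.
/// No per-element padding is required.
fn soa_bytes(n: usize) -> usize {
    n * size_of::<u8>() + n * size_of::<u64>()
}
```

Under that assumption, size_of::<Example>() is 16 even though the fields only need 9 bytes, so 1000 elements occupy 16000 bytes as AoS and 9000 bytes as SoA.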

Performance

In addition to lowering memory usage, there are several reasons why SoA can offer better performance:

  • By removing padding, each cacheline is typically more information-dense.
  • When accessing only a subset of the available fields, only data for those fields will be fetched.
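
The second point can be sketched with plain Vecs. This is a hand-rolled illustration, not the soa-rs API: Particle, Particles, and both functions are hypothetical names. Every field is an f32, so padding plays no role here and only the access pattern differs.

```rust
// Hypothetical AoS element: all four fields sit next to each other in memory.
#[allow(dead_code)]
struct Particle {
    position: f32,
    velocity: f32,
    mass: f32,
    charge: f32,
}

// Hypothetical hand-rolled SoA: one contiguous array per field.
#[allow(dead_code)]
struct Particles {
    position: Vec<f32>,
    velocity: Vec<f32>,
    mass: Vec<f32>,
    charge: Vec<f32>,
}

/// Reads 4 useful bytes per element, but walks over all 16 bytes of each
/// `Particle`, so unused fields are pulled into the cache as well.
fn total_mass_aos(particles: &[Particle]) -> f32 {
    particles.iter().map(|p| p.mass).sum()
}

/// Touches only the `mass` array; the other fields are never loaded.
fn total_mass_soa(particles: &Particles) -> f32 {
    particles.mass.iter().sum()
}
```

Both functions return the same total; the SoA version simply reads a quarter of the memory to get it.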

SoA does not offer performance wins in all cases. In particular, operations such as push and pop are usually slower than for Vec because each field lives in a separate region of memory, so a single operation touches several distant locations. SoA is most appropriate when either

  • Sequential access is the common access pattern
  • You are frequently accessing or modifying only a subset of the fields

SIMD vectorization

SoA makes getting data into and out of SIMD registers trivial. Since the values of each field are stored sequentially, loading data is as simple as reading a range of memory into the register. This bulk data transfer is very amenable to auto-vectorization. In contrast, AoS interleaves the fields, so the values of any one field sit at disjoint locations in memory. Each value must therefore be copied individually to its position within the register and, later, shuffled back out in the same way. This can prevent the compiler from applying vectorization. For this reason, SoA is much more likely to benefit from SIMD optimizations.
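
As a rough illustration, the two loops below apply the same update in each layout. This is a sketch in plain Rust rather than soa-rs code: Body, step_aos, and step_soa are hypothetical names, and whether the compiler actually vectorizes either loop depends on the target and optimization level.

```rust
// Hypothetical AoS element: position and velocity are interleaved in memory.
struct Body {
    position: f32,
    velocity: f32,
}

/// AoS update: consecutive positions are separated by a velocity value,
/// so a SIMD load of positions needs a strided or shuffled access.
fn step_aos(bodies: &mut [Body], dt: f32) {
    for body in bodies.iter_mut() {
        body.position += body.velocity * dt;
    }
}

/// SoA update: both inputs are contiguous slices, so several lanes can be
/// loaded, updated, and stored with plain sequential memory operations.
fn step_soa(positions: &mut [f32], velocities: &[f32], dt: f32) {
    for (position, velocity) in positions.iter_mut().zip(velocities) {
        *position += *velocity * dt;
    }
}
```

The two functions compute the same result; the SoA form just presents the data in the shape that vector loads and stores expect.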

Examples

Zig

SoA is a popular technique in data-oriented design. Andrew Kelley gives a wonderful talk describing how SoA and other data-oriented design patterns earned him a 39% reduction in wall clock time in the Zig compiler.

Benchmark

soa-rs-testing contains a benchmark comparison that sums the dot products of 2¹⁶ 4D vectors. The Vec version runs in 132µs and the Soa version runs in 22µs, a 6x improvement.

Comparison

soa_derive

soa_derive makes each field its own Vec. Because of this, each field's length, capacity, and allocation are managed separately. In contrast, soa-rs manages a single allocation for each Soa. soa_derive also generates a new collection type for every struct, whereas soa-rs generates a minimal, low-level interface that the generic Soa type uses for its implementation. This provides more type system flexibility, less code generation, and better documentation.

soa-vec

Whereas soa-vec only compiles on nightly, soa-rs also compiles on stable. Rather than using derive macros, soa-vec uses macros to generate eight static copies of its SoA type with fixed tuple sizes.

Dependencies

~265–710KB
~17K SLoC