#struct #macro-derive #vec #derive #performance #macro #cache

ortho_vec_derive

Derive orthogonal vector structure for better CPU cache usage

1 unstable release

0.1.0 Jul 4, 2023

#2801 in Rust patterns

MIT license

13KB
145 lines

ortho-vec-derive

A derive macro to create orthogonal vectors based on your structs, to allow for better CPU cache usage.


Motivation

Using a Vec of a large, complex struct can slow down your program at run-time.  This can happen when you don't use all of the fields of the struct while iterating over the vector.

For example, let's imagine we have the following struct:

struct LargeStruct {
    speed: f32,
    location: f32,
    more_data: [u64; 10],
}

A Vec<LargeStruct> will look like this in memory:

   4B        4B         80B       4B        4B         80B 
[[speed] [location] [more_data] [speed] [location] [more_data]]
 ^                              ^
 |                              |
 first object starts here       second object starts here

Now we want to loop over the objects in this Vec and update the location based on the speed:

struct LargeStruct {
    speed: f32,
    location: f32,
    more_data: [u64; 10],
}

let mut large_structs = vec![LargeStruct {speed: -1.2, location: 7.3, more_data: [0; 10]}];

for large_struct in large_structs.iter_mut() {
    large_struct.location += large_struct.speed;
}

The CPU will read the memory locations for .speed and .location, but it will also cache the bytes that follow them.  How many bytes are cached depends on your specific CPU.  This means that a lot of memory we don't need (.more_data) is pulled into the cache.  As a result, reading .speed for the next object has to go all the way to RAM again, which is 10-100 times slower than reading from cache.
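
To make the waste concrete, here is a rough back-of-the-envelope sketch (the 88-byte size follows directly from the field sizes; the 64-byte cache line mentioned in the comments is a typical value, not a guarantee, and is an assumption of this example):

struct LargeStruct {
    speed: f32,
    location: f32,
    more_data: [u64; 10],
}

fn main() {
    // 4 + 4 + 80 bytes of fields with 8-byte alignment, so 88 bytes per object.
    println!("size_of::<LargeStruct>() = {}", std::mem::size_of::<LargeStruct>());

    // The update loop only needs 8 of those 88 bytes (.speed and .location);
    // the rest (.more_data) still fills the cache lines the CPU loads and
    // prefetches around them, assuming a typical 64-byte cache line.
}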

To prevent these cache misses from slowing the program down, it is common to use one Vec per field of the struct, so that the same index in every Vec refers to the same object.

Writing and maintaining several Vec-s by hand this way can get complicated and requires a lot of boilerplate code.
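
As a rough sketch of what that manual approach looks like (the type and method names here are purely illustrative and not part of this crate):

struct LargeStructSoA {
    speed: Vec<f32>,
    location: Vec<f32>,
    more_data: Vec<[u64; 10]>,
}

impl LargeStructSoA {
    fn push(&mut self, speed: f32, location: f32, more_data: [u64; 10]) {
        // Every operation has to keep all three Vec-s in sync by hand.
        self.speed.push(speed);
        self.location.push(location);
        self.more_data.push(more_data);
    }
}

fn main() {
    let mut soa = LargeStructSoA {
        speed: vec![],
        location: vec![],
        more_data: vec![],
    };
    soa.push(-1.2, 7.3, [0; 10]);

    // The hot loop now only walks the two small, densely packed Vec-s.
    for (location, speed) in soa.location.iter_mut().zip(soa.speed.iter()) {
        *location += *speed;
    }
}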

The OrthoVec derive macro tries to solve these issues.

Supported API

Vec

  • into_ortho() - Convert the Vec to its ortho version.

ortho-Vec

All of these behave the same as the corresponding Vec methods:

  • iter()
  • iter_mut()
  • into_iter()
  • len()
  • push()
  • pop()
  • clear()
  • reverse()
  • shrink_to_fit()

The only caveat is that when iterating with iter_mut(), you get a struct containing a &mut reference into each inner Vec.  To use a field you have to dereference it with a * prefix.
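
For example, the motivating loop from the beginning could look roughly like this (a sketch only; the exact shape of the item yielded by iter_mut() is generated by the macro, and LargeStruct is assumed here to derive OrthoVec):

use ortho_vec_derive::prelude::*;

#[derive(OrthoVec)]
struct LargeStruct {
    speed: f32,
    location: f32,
    more_data: [u64; 10],
}

fn main() {
    let mut large_structs = vec![LargeStruct {
        speed: -1.2,
        location: 7.3,
        more_data: [0; 10],
    }]
    .into_ortho();

    for ls in large_structs.iter_mut() {
        // Each field is a &mut into its own inner Vec, so dereference it with *.
        *ls.location += *ls.speed;
    }
}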

Examples

Any struct with named fields (tuple structs are not supported yet, but support may be added in the future) can simply be annotated with #[derive(OrthoVec)]:

use ortho_vec_derive::prelude::*;

#[derive(OrthoVec)]
struct WeirdStruct<'a, T: Send>
where
    T: std::fmt::Debug,
{
    a: i32,
    b: &'a f32,
    c: T,
}

fn main() {
    let mut ortho_vec = vec![
        WeirdStruct {
            a: 3,
            b: &4.2,
            c: "nice",
        },
        WeirdStruct {
            a: 6,
            b: &24.2,
            c: "hello",
        },
    ]
    .into_ortho();

    ortho_vec.push(WeirdStruct {
        a: 7,
        b: &8.123,
        c: "push",
    });

    for ws in ortho_vec.iter_mut() {
        *ws.a += 3;
    }

    for ws in ortho_vec.iter() {
        println!("b = {:?}", ws.b);
    }
}

Results

Results may vary between use-cases and platforms.  Running a small benchmark (included in the crate's repo) shows a maximum speedup of around 6x at sizes of 1M/10M elements, while at sizes like 10K/100K the speedup is around 2-3x.  It is recommended to benchmark the two versions of your code against each other, so you can be sure the approach pays off in your case.
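
A minimal way to run such a comparison yourself is sketched below (using std::time::Instant and a hand-rolled SoA layout standing in for the derived one; the numbers and names are illustrative, and the loop only makes sense when compiled with --release):

use std::hint::black_box;
use std::time::Instant;

struct Item {
    speed: f32,
    location: f32,
    more_data: [u64; 10],
}

fn main() {
    let n: usize = 1_000_000;

    // Plain Vec of structs (array-of-structs).
    let mut aos: Vec<Item> = (0..n)
        .map(|i| Item { speed: i as f32, location: 0.0, more_data: [0; 10] })
        .collect();

    let start = Instant::now();
    for item in aos.iter_mut() {
        item.location += item.speed;
    }
    black_box(&aos);
    println!("AoS: {:?}", start.elapsed());

    // Hand-rolled struct-of-arrays, standing in for the ortho version.
    let speed: Vec<f32> = (0..n).map(|i| i as f32).collect();
    let mut location: Vec<f32> = vec![0.0; n];

    let start = Instant::now();
    for (loc, spd) in location.iter_mut().zip(speed.iter()) {
        *loc += *spd;
    }
    black_box(&location);
    println!("SoA: {:?}", start.elapsed());
}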

Notes

The macro can't be used on a struct defined inside a function for now.

Dependencies

~255–710KB
~17K SLoC