#onnx #machine-learning #model #tanh #operation #gelu

rten-vecmath

SIMD vectorized implementations of various math functions used in ML models

16 releases (breaking)

0.17.0 Apr 9, 2025
0.16.0 Feb 8, 2025
0.15.1 Jan 6, 2025
0.15.0 Dec 28, 2024
0.1.0 Dec 31, 2023

#1283 in Machine learning

Download history: roughly 300–700 downloads per week from late December 2024 through early April 2025.

1,903 downloads per month
Used in 8 crates (via rten)

MIT/Apache

265KB
6K SLoC

SIMD-vectorized implementations of operations used in neural networks.

These implementations are used as kernels for operations in the rten crate.

Constructing and dispatching operations

Operations are implemented as structs which implement the SIMD operation traits from rten-simd. To apply an operation to data, first construct the operation using the struct from this crate, then execute it with a dispatch method from the SimdOp or SimdUnaryOp traits.
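
For example, using only the Erf and Sum operations demonstrated in full in the examples below, the two dispatch paths look like this:

use rten_simd::{SimdOp, SimdUnaryOp};
use rten_vecmath::{Erf, Sum};

let mut data = [0.1f32, 0.2, 0.3];

// Unary operation: construct the struct, then dispatch with `map_mut`.
Erf {}.map_mut(&mut data);

// Other operations: construct with `new`, then execute with `dispatch`.
let total = Sum::new(&data).dispatch();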

In-place and out-of-place operations

Some operations support both updating data in place and reading input from one slice while writing the result to another. For unary operations, this is controlled by dispatching with either map or map_mut. Other operations expose separate constructors for the out-of-place and in-place cases, such as Softmax::new and Softmax::new_mut; the examples below show both.

For operations which use a separate source and destination, the destination is expected to be an uninitialized slice ([MaybeUninit<T>]). This allows the caller to control allocation of the buffer and avoid the overhead of initializing elements which the operation will overwrite. The ExtendInit trait provides a safe API for the common task of filling a new Vec with the result of the operation.
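
As a sketch of what ExtendInit automates, assuming (as the softmax example below shows) that dispatch returns the initialized slice, the manual equivalent using Vec::spare_capacity_mut would look roughly like this:

use rten_simd::SimdOp;
use rten_vecmath::Softmax;

let data = [1.0f32, 0.5, 2.0];
let mut output: Vec<f32> = Vec::with_capacity(data.len());

// `spare_capacity_mut` exposes the vector's unused capacity as
// `&mut [MaybeUninit<f32>]`.
let init_len = Softmax::new(&data, &mut output.spare_capacity_mut()[..data.len()])
    .dispatch()
    .len();

// SAFETY: `dispatch` initialized the first `init_len` elements of the spare
// capacity. This is the bookkeeping that `ExtendInit` performs safely.
unsafe { output.set_len(init_len) };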

Examples

Applying a vectorized unary function

use std::mem::MaybeUninit;

use rten_simd::SimdUnaryOp;
use rten_vecmath::Erf;

// Apply the error function to each element of `data`.
let mut data = [1., 0.5, 2.0];
let erf_op = Erf {};
erf_op.map_mut(&mut data);
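// `data` now holds the erf of each input, approximately
// [0.8427, 0.5205, 0.9953].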

// Apply the error function to each element of `src`, writing to `dest`.
let src = [1., 0.5, 2.0];
let mut dest = [MaybeUninit::uninit(); 3];
erf_op.map(&src, &mut dest);

Applying softmax in place

This example applies the softmax function in-place to a mutable slice.

use rten_simd::SimdOp;
use rten_vecmath::Softmax;

let mut data = [1., 0.5, 2.0];
Softmax::new_mut(&mut data).dispatch();
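
// Softmax rescales the values into a probability distribution, so the
// result sums to 1.
assert!((data.iter().sum::<f32>() - 1.0).abs() < 1e-6);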

Applying softmax with separate input and output buffers

This example reads data from an input slice and writes to an uninitialized output buffer (&mut [MaybeUninit<f32>]) obtained from the unused capacity of a Vec<f32>. The ExtendInit helper trait updates the length of the Vec<f32> once its contents have been initialized.

use rten_simd::SimdOp;
use rten_vecmath::{Softmax, ExtendInit};

let data = [1., 0.5, 2.0];
let mut output = Vec::with_capacity(data.len());
output.extend_init(|output_uninit| {
    // `output_uninit` is the uninitialized part of `output`, as returned by
    // `output.spare_capacity_mut()`.
    //
    // The `dispatch` call initializes it and returns the initialized slice.
    Softmax::new(&data, output_uninit).dispatch()
});
assert_eq!(output.len(), 3);

Computing the sum of a list of floats

use rten_simd::SimdOp;
use rten_vecmath::Sum;

let data = [1., 0.5, 2.0];
let sum = Sum::new(&data).dispatch();
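
// The inputs sum to exactly 3.5, which is representable in f32.
assert_eq!(sum, 3.5);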

rten-vecmath

This crate contains SIMD-vectorized kernels ("vectorized math") for various operations used in machine learning models. This includes:

  • Math functions such as exp, erf and tanh
  • Activation functions such as gelu
  • Normalization functions such as softmax and mean-variance normalization
  • Reduction functions such as sum and sum-of-squares
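
For illustration, here is a minimal sketch applying two of these kernels. It assumes that Gelu and Tanh are exposed as unary operation structs in the same style as Erf above (only Erf, Softmax and Sum appear in this document's examples):

use rten_simd::SimdUnaryOp;
// Assumed: `Gelu` and `Tanh` follow the same pattern as `Erf`.
use rten_vecmath::{Gelu, Tanh};

let mut data = [0.1f32, -0.5, 2.0];

// Apply tanh, then GELU, elementwise and in place.
Tanh {}.map_mut(&mut data);
Gelu {}.map_mut(&mut data);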

SIMD operations are implemented using portable SIMD types from the rten-simd crate.

Dependencies