#machine-learning #simd #vectorized #function #ml #mathml #ml-model

rten-vecmath

SIMD vectorized implementations of various math functions used in ML models

8 releases (breaking)

new 0.9.0 May 16, 2024
0.8.0 Apr 29, 2024
0.7.0 Apr 12, 2024
0.6.0 Mar 31, 2024
0.1.0 Dec 31, 2023

#4 in #ml-model


693 downloads per month
Used in 5 crates (via rten)

MIT/Apache

89KB
2K SLoC

rten-vecmath

This crate provides portable SIMD types that abstract over SIMD intrinsics on different architectures. Unlike std::simd, these types work on stable Rust. The crate can also detect the available instruction sets at runtime and dispatch to the best implementation for the current CPU.
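To illustrate the general idea of runtime detection and dispatch, here is a minimal sketch that uses only the standard library. It is not this crate's API: the function names `scale`, `scale_avx2`, `scale_generic` and `scale_scalar` are hypothetical, and only the x86_64 AVX2/FMA path is shown. The same loop is compiled twice, once with AVX2/FMA enabled, and the wrapper picks a version after checking the CPU.

```rust
/// Shared loop body (hypothetical helper). It is inlined into each wrapper so
/// the compiler can optimize it for that wrapper's enabled CPU features.
#[inline(always)]
fn scale_generic(xs: &mut [f32], k: f32) {
    for x in xs.iter_mut() {
        *x *= k;
    }
}

/// Version compiled with AVX2 and FMA enabled. It must only be called on
/// CPUs that support those features.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn scale_avx2(xs: &mut [f32], k: f32) {
    scale_generic(xs, k);
}

/// Baseline version that runs on every platform.
fn scale_scalar(xs: &mut [f32], k: f32) {
    scale_generic(xs, k);
}

/// Public entry point: checks the CPU at runtime and dispatches.
pub fn scale(xs: &mut [f32], k: f32) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were verified just above.
            unsafe { scale_avx2(xs, k) };
            return;
        }
    }
    scale_scalar(xs, k);
}
```

Callers only ever see `scale`, so the choice of instruction set stays an internal detail. A real implementation would likely cache the detection result rather than repeat the check on every call.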

This crate also contains SIMD-vectorized versions of math functions, such as exp, erf, tanh and softmax, that are performance-critical in machine-learning models.


lib.rs:

rten-vecmath provides portable SIMD types for implementing vectorized functions that work across different architectures.

Portable SIMD types

The simd_vec module contains types and traits to support writing portable SIMD code that works on stable Rust.

Supported architectures

SIMD wrappers are provided for the following architectures:

  • Arm Neon
  • AVX2 / FMA
  • AVX-512 (requires avx512 feature and nightly Rust)
  • WebAssembly SIMD

There is also a scalar fallback that works on all platforms, but provides no performance benefit over non-SIMD code.
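As a rough sketch of how a portable SIMD abstraction with a scalar fallback can be structured, the example below defines a hypothetical `SimdF32` trait and writes one kernel that is generic over it. None of these names come from the simd_vec module. The `[f32; 4]` implementation only emulates a 4-lane vector with plain arrays; a real wrapper would call architecture intrinsics instead.

```rust
/// Hypothetical trait for a vector of f32 lanes (not the crate's actual trait).
pub trait SimdF32: Copy {
    /// Number of f32 lanes in the vector.
    const LEN: usize;
    /// Broadcast one value to all lanes.
    fn splat(x: f32) -> Self;
    /// Load `LEN` values from the start of `xs`. Panics if `xs` is too short.
    fn load(xs: &[f32]) -> Self;
    /// Store all lanes to the start of `out`. Panics if `out` is too short.
    fn store(self, out: &mut [f32]);
    fn add(self, other: Self) -> Self;
    fn mul(self, other: Self) -> Self;
}

/// Scalar fallback: a "vector" with a single lane.
impl SimdF32 for f32 {
    const LEN: usize = 1;
    fn splat(x: f32) -> Self { x }
    fn load(xs: &[f32]) -> Self { xs[0] }
    fn store(self, out: &mut [f32]) { out[0] = self; }
    fn add(self, other: Self) -> Self { self + other }
    fn mul(self, other: Self) -> Self { self * other }
}

/// Emulated 4-lane vector, standing in for a real intrinsic-backed type.
impl SimdF32 for [f32; 4] {
    const LEN: usize = 4;
    fn splat(x: f32) -> Self { [x; 4] }
    fn load(xs: &[f32]) -> Self { [xs[0], xs[1], xs[2], xs[3]] }
    fn store(self, out: &mut [f32]) { out[..4].copy_from_slice(&self); }
    fn add(self, other: Self) -> Self { std::array::from_fn(|i| self[i] + other[i]) }
    fn mul(self, other: Self) -> Self { std::array::from_fn(|i| self[i] * other[i]) }
}

/// Compute `out[i] = a * xs[i] + b`, generic over the vector type `S`.
pub fn scale_shift<S: SimdF32>(xs: &[f32], out: &mut [f32], a: f32, b: f32) {
    assert_eq!(xs.len(), out.len(), "input and output lengths must match");
    let (va, vb) = (S::splat(a), S::splat(b));
    // Largest prefix whose length is a multiple of the lane count.
    let main = xs.len() - xs.len() % S::LEN;
    for i in (0..main).step_by(S::LEN) {
        S::load(&xs[i..]).mul(va).add(vb).store(&mut out[i..]);
    }
    // Handle the remaining elements one at a time.
    for i in main..xs.len() {
        out[i] = xs[i] * a + b;
    }
}
```

Calling `scale_shift::<f32>` and `scale_shift::<[f32; 4]>` on the same input produces the same output here, which is the property that lets a single kernel be instantiated once per architecture.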

Vectorized math functions

This crate contains SIMD-vectorized implementations of various math functions that are commonly used in neural networks.

For each function in this library there are multiple variants, which typically include:

  • A version that operates on scalars
  • A version that reads values from an input slice and writes to the corresponding position in an equal-length output slice. These have a vec_ prefix.
  • A version that reads values from a mutable input slice and writes the computed values back in-place. These have a vec_ prefix and _in_place suffix.

All variants use the same underlying implementation and should have the same accuracy.
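The three variants described above could be laid out as in the following sketch. The names follow the vec_ prefix and _in_place suffix convention, but the signatures are assumptions made for illustration and may differ from the crate's, and the loops here are plain scalar code rather than SIMD. The point is only that every variant routes through one shared kernel.

```rust
/// Scalar variant: logistic sigmoid of a single value.
/// This is the shared kernel that the slice variants call.
pub fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Slice variant: reads from `xs` and writes each result to the
/// corresponding position in `out`. Panics if the lengths differ.
pub fn vec_sigmoid(xs: &[f32], out: &mut [f32]) {
    assert_eq!(xs.len(), out.len(), "input and output lengths must match");
    for (y, &x) in out.iter_mut().zip(xs) {
        *y = sigmoid(x);
    }
}

/// In-place variant: overwrites each element of `xs` with its result.
pub fn vec_sigmoid_in_place(xs: &mut [f32]) {
    for x in xs.iter_mut() {
        *x = sigmoid(*x);
    }
}
```

Because both slice variants call the same `sigmoid` kernel, they return identical values for identical inputs. The in-place form avoids allocating a second buffer when the input is no longer needed.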

See the source code for comments on accuracy.

No runtime deps