1 unstable release: 0.1.0 (Sep 7, 2020)
# generic-simd

`generic-simd` provides safe and idiomatic zero-cost abstractions for writing explicit cross-platform SIMD operations.
## License

`generic-simd` is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.
## Supported architectures

All architectures are supported via scalar fallbacks, but the following instruction sets are also supported:

- SSE4.1 (x86/x86-64)
- AVX (x86/x86-64)
- NEON (aarch64, with the `nightly` cargo feature)
- SIMD128 (wasm32, with the `nightly` cargo feature and the `simd128` target feature)

The various architecture-specific types are available in the `arch` module.
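For instance, the NEON and SIMD128 backends can be enabled via the crate's `nightly` cargo feature. A minimal `Cargo.toml` entry, assuming the version listed above, might look like:

```toml
[dependencies]
generic-simd = { version = "0.1.0", features = ["nightly"] }
```

This requires building with a nightly toolchain, and the wasm32 backend additionally requires the `simd128` target feature.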
## Abstractions

Vector abstractions are provided via the traits in the `vector` module. Generics that use these traits are able to utilize any of the supported instruction sets.

The following example performs a vector-accelerated sum of an input slice:
```rust
use generic_simd::{
    arch::Token,
    dispatch,
    scalar::ScalarExt,
    slice::SliceExt,
    vector::NativeVector,
};

// This function provides a generic implementation for any instruction set.
// Here we use the "native" vector type, i.e. the widest vector directly supported by the
// architecture.
#[inline]
fn sum_impl<T>(token: T, input: &[f32]) -> f32
where
    T: Token,
    f32: ScalarExt<T> + core::iter::Sum<NativeVector<f32, T>>,
{
    // Use aligned loads in this example, which may be better on some architectures.
    let (start, vectors, end) = input.align_native(token);

    // Sum across the vector lanes, plus the unaligned portions
    vectors.iter().copied().sum::<f32>() + start.iter().chain(end).sum::<f32>()
}

// This function selects the best instruction set at runtime.
// The "dispatch" macro compiles this function for each supported architecture.
#[dispatch(token)]
fn sum(input: &[f32]) -> f32 {
    sum_impl(token, input)
}

assert_eq!(sum(&[1f32; 10]), 10.);
```
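The shape of the computation above can be sketched with the standard library alone: split the slice into fixed-width chunks, accumulate lane-wise, and handle the leftover elements separately. The lane count of 4 here is a hypothetical stand-in; `generic-simd` picks the real width per architecture and performs the chunk arithmetic with vector instructions.

```rust
// Scalar sketch of a chunked sum; LANES is a stand-in for the
// architecture's native vector width.
fn chunked_sum(input: &[f32]) -> f32 {
    const LANES: usize = 4;
    let chunks = input.chunks_exact(LANES);
    // Elements that don't fill a whole chunk, analogous to the
    // `start`/`end` portions in the example above.
    let remainder = chunks.remainder();

    // Accumulate lane-wise, as a single vector add would.
    let mut acc = [0f32; LANES];
    for chunk in chunks {
        for (lane, value) in acc.iter_mut().zip(chunk) {
            *lane += value;
        }
    }

    // Sum across the lanes, plus the leftover elements.
    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}

fn main() {
    assert_eq!(chunked_sum(&[1f32; 10]), 10.);
}
```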
## Vector shims

Various instruction sets provide vectors with different widths, so shims are provided to create vectors of particular widths regardless of architecture. These are available in the `shim` module.

For example, the following function performs an Array of Structures of Arrays operation using arrays of 4 `f64`s regardless of instruction set:
```rust
use generic_simd::{
    arch::Token,
    dispatch,
    scalar::Scalar,
    slice::Slice,
    vector::{Signed, Vector, width},
};

// Equivalent to an array of 4 2-dimensional coordinates,
// but with a vectorizable memory layout.
struct Coordinates {
    x: [f64; 4],
    y: [f64; 4],
}

// A generic mean implementation for any instruction set.
fn mean_impl<T>(token: T, input: &[Coordinates]) -> (f64, f64)
where
    T: Token,
    f64: Scalar<T, width::W4>,
    <f64 as Scalar<T, width::W4>>::Vector: Signed,
{
    let mut xsum = f64::zeroed(token);
    let mut ysum = f64::zeroed(token);

    for Coordinates { x, y } in input {
        // read the arrays into vectors
        xsum += x.read(token);
        ysum += y.read(token);
    }

    // sum across the vector lanes
    (
        xsum.iter().sum::<f64>() / (input.len() * 4) as f64,
        ysum.iter().sum::<f64>() / (input.len() * 4) as f64,
    )
}

// Selects the best instruction set at runtime.
#[dispatch(token)]
fn mean(input: &[Coordinates]) -> (f64, f64) {
    mean_impl(token, input)
}
```
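For comparison, here is an equivalent scalar mean written against the standard library only; it is an illustrative sketch, not part of the crate. The per-array additions here correspond to the lane-wise vector adds (`xsum += x.read(token)`) in the generic version.

```rust
// Same memory layout as the example above.
struct Coordinates {
    x: [f64; 4],
    y: [f64; 4],
}

// Scalar equivalent of `mean`: each coordinate array is summed
// element by element instead of with one 4-wide vector add.
fn mean_scalar(input: &[Coordinates]) -> (f64, f64) {
    let mut xsum = 0.0;
    let mut ysum = 0.0;
    for Coordinates { x, y } in input {
        xsum += x.iter().sum::<f64>();
        ysum += y.iter().sum::<f64>();
    }
    let count = (input.len() * 4) as f64;
    (xsum / count, ysum / count)
}

fn main() {
    let points = [Coordinates {
        x: [1.0, 2.0, 3.0, 4.0],
        y: [5.0, 6.0, 7.0, 8.0],
    }];
    // (1+2+3+4)/4 = 2.5, (5+6+7+8)/4 = 6.5
    assert_eq!(mean_scalar(&points), (2.5, 6.5));
}
```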