#vector #abstraction #simd #operations #generic #scalar #explicit

no-std generic-simd

safe and idiomatic zero-cost abstractions for writing explicit cross-platform SIMD operations

1 unstable release

0.1.0 Sep 7, 2020

MIT/Apache

110KB
3K SLoC

generic-simd

Rustc version: 1.42+

generic-simd provides safe and idiomatic zero-cost abstractions for writing explicit cross-platform SIMD operations.

License

generic-simd is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.


lib.rs:

Supported architectures

Every architecture is supported through a scalar fallback; the following instruction sets are additionally supported:

  • SSE4.1 (x86/x86-64)
  • AVX (x86/x86-64)
  • NEON (aarch64, requires the nightly cargo feature)
  • SIMD128 (wasm32, requires the nightly cargo feature and the simd128 target feature)

The various architecture-specific types are available in the arch module.
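
To illustrate what choosing among the instruction sets above involves at runtime, here is a minimal sketch that uses only the standard library's is_x86_feature_detected! macro. It is not how generic-simd implements its dispatch; the function name best_instruction_set is hypothetical, and the sketch requires std, whereas the crate itself is no-std.

```rust
/// Hypothetical helper (not part of generic-simd): names the widest x86
/// instruction set from the list above that the running CPU reports,
/// or "scalar" when none is detected or the target is not x86.
pub fn best_instruction_set() -> &'static str {
    // The detection macro only exists on x86 targets, so gate it with cfg.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return "avx";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "sse4.1";
        }
    }
    // Every other case uses the scalar fallback.
    "scalar"
}
```

The result depends on the machine running the code, so it is always one of "avx", "sse4.1", or "scalar".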

Abstractions

Vector abstractions are provided via the traits in the vector module. Generic code written against these traits can use any of the supported instruction sets.

The following example performs a vector-accelerated sum of an input slice:

use generic_simd::{
    arch::Token,
    dispatch,
    scalar::ScalarExt,
    slice::SliceExt,
    vector::NativeVector,
};

// This function provides a generic implementation for any instruction set.
// Here we use the "native" vector type, i.e. the widest vector directly supported by the
// architecture.
#[inline]
fn sum_impl<T>(token: T, input: &[f32]) -> f32
where
    T: Token,
    f32: ScalarExt<T> + core::iter::Sum<NativeVector<f32, T>>,
{
    // Use aligned loads in this example, which may be better on some architectures.
    let (start, vectors, end) = input.align_native(token);

    // Sum across the vector lanes, plus the unaligned portions
    vectors.iter().copied().sum::<f32>() + start.iter().chain(end).sum::<f32>()
}

// This function selects the best instruction set at runtime.
// The "dispatch" macro compiles this function for each supported architecture.
#[dispatch(token)]
fn sum(input: &[f32]) -> f32 {
    sum_impl(token, input)
}

assert_eq!(sum(&[1f32; 10]), 10.);
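
For readers who want to see the accumulation pattern without the crate, the following is a minimal sketch in plain Rust of a lane-wise sum. It is an illustration only, not generic-simd's implementation: the name sum_lanes and the LANES constant are assumptions, and the input is split into fixed-size chunks from the front rather than by alignment, so there is a trailing remainder but no leading one.

```rust
/// Hypothetical lane count; 4 corresponds to a 128-bit vector of f32.
const LANES: usize = 4;

/// Sums `input` with LANES independent accumulators, then reduces them
/// and adds the elements that did not fill a whole chunk.
pub fn sum_lanes(input: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];

    let chunks = input.chunks_exact(LANES);
    // Elements left over after the last full chunk.
    let tail = chunks.remainder();

    for chunk in chunks {
        // Each accumulator only ever sees one lane position.
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }

    // Sum across the lanes, plus the leftover elements.
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}
```

Because floating-point addition is not associative, a lane-wise sum can differ slightly from a strictly left-to-right sum on inputs whose values do not add exactly.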

Vector shims

Instruction sets provide vectors of different widths, so shims are provided to create vectors of a particular width regardless of architecture. These are available in the shim module.

For example, the following function performs an Array of Structures of Arrays (AoSoA) operation on arrays of four f64s, regardless of instruction set:

use generic_simd::{
    arch::Token,
    dispatch,
    scalar::Scalar,
    slice::Slice,
    vector::{Signed, Vector, width},
};

// Equivalent to an array of 4 2-dimensional coordinates,
// but with a vectorizable memory layout.
struct Coordinates {
    x: [f64; 4],
    y: [f64; 4],
}

// A generic mean implementation for any instruction set.
fn mean_impl<T>(token: T, input: &[Coordinates]) -> (f64, f64)
where
    T: Token,
    f64: Scalar<T, width::W4>,
    <f64 as Scalar<T, width::W4>>::Vector: Signed,
{
    let mut xsum = f64::zeroed(token);
    let mut ysum = f64::zeroed(token);

    for Coordinates { x, y } in input {
        // read the arrays into vectors
        xsum += x.read(token);
        ysum += y.read(token);
    }

    // sum across the vector lanes
    (
        xsum.iter().sum::<f64>() / (input.len() * 4) as f64,
        ysum.iter().sum::<f64>() / (input.len() * 4) as f64,
    )
}

// Selects the best instruction set at runtime.
#[dispatch(token)]
fn mean(input: &[Coordinates]) -> (f64, f64) {
    mean_impl(token, input)
}
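
A scalar reference can be useful for checking the result of the vectorized mean. The sketch below uses only the standard library; the name mean_scalar is hypothetical, and the struct simply repeats the layout of Coordinates above so the block stands on its own.

```rust
/// Same layout as the `Coordinates` struct in the example above.
pub struct Coordinates {
    pub x: [f64; 4],
    pub y: [f64; 4],
}

/// Hypothetical scalar reference: accumulates each lane separately,
/// then sums across lanes and divides by the number of coordinates.
pub fn mean_scalar(input: &[Coordinates]) -> (f64, f64) {
    let mut xsum = [0.0f64; 4];
    let mut ysum = [0.0f64; 4];

    for Coordinates { x, y } in input {
        for lane in 0..4 {
            xsum[lane] += x[lane];
            ysum[lane] += y[lane];
        }
    }

    // Each struct holds 4 coordinates.
    let count = (input.len() * 4) as f64;
    (
        xsum.iter().sum::<f64>() / count,
        ysum.iter().sum::<f64>() / count,
    )
}
```

For an empty input the count is zero, so both means are NaN; a caller that needs a defined result would have to handle that case first.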

Dependencies

~1–1.4MB
~34K SLoC