2 releases

| Version | Date |
|---|---|
| 0.1.1 | Oct 31, 2023 |
| 0.1.0 | Jun 24, 2023 |
A more convenient #[target_feature] replacement
To get good performance out of SIMD, everything on the SIMD codepath must be inlined. With how SIMD is currently implemented in Rust, one of two things has to be true for a function using SIMD to be inlinable (and this includes the SIMD intrinsics themselves):
a) The whole program has to be compiled with the relevant -C target-cpu or -C target-feature flags.
b) SIMD support must be automatically detected at runtime, and every function on the SIMD codepath must be marked with #[target_feature].
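As a rough illustration of option (b), the sketch below uses only the standard library: a function compiled with AVX2 enabled, a scalar fallback, and a safe dispatcher that detects the feature at runtime. The names `sum`, `sum_avx2` and `sum_scalar` are hypothetical and chosen for this example only, and the "SIMD" body is a plain loop that the compiler may or may not vectorize.

```rust
/// Portable fallback used when AVX2 is not available.
fn sum_scalar(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Compiled with AVX2 enabled, so it must only run on CPUs that support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    // A plain loop; with AVX2 enabled the compiler is free to vectorize it.
    let mut acc = 0u32;
    for &x in xs {
        acc = acc.wrapping_add(x);
    }
    acc
}

/// Safe entry point: detects AVX2 at runtime and picks an implementation.
pub fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

Calling `sum(&[1, 2, 3, 4])` returns the same value on every CPU; only the code path taken differs.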
Both have their downsides. Setting the target-cpu or target-feature flags makes the resulting binary incompatible with older CPUs, while using #[target_feature] is incredibly inconvenient.
This crate is meant to make #[target_feature] less painful to use.
Problems with #[target_feature]
When we're not compiling with the relevant target-cpu/target-feature flags, everything on the SIMD codepath must be marked with the #[target_feature] attribute. This is not a problem when all of your SIMD code is neatly encapsulated inside of a single function, but once you start to build out more elaborate abstractions it starts to become painful to use.
- It can only be used on unsafe functions, so everything on your SIMD codepath now has to be unsafe.

  In theory this is nice - these functions require the relevant SIMD instructions to be present at runtime, so calling them without checking is obviously unsafe! But in practice this is rarely what you want. When you build an abstraction over SIMD code you usually want to assume that internally within your module all of the necessary SIMD instructions are available, and you only want to check this at the boundaries when you're first entering your module. You do not want to infect everything inside of the module with unsafe since you've already checked this invariant at the module's API boundary.

- It cannot be used on non-unsafe trait methods.

  If you're implementing a trait, say for example std::ops::Add, then you cannot mark the method unsafe unless the original trait also has it marked as unsafe, and usually it doesn't.

- It makes it impossible to abstract over a given SIMD instruction set using a trait.

  For example, let's assume you want to abstract over which SIMD instructions you use using a trait in the following way:
trait Backend {
    unsafe fn sum(input: &[u32]) -> u32;
}

struct AVX;
# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Backend for AVX {
    #[target_feature(enable = "avx")]
    unsafe fn sum(xs: &[u32]) -> u32 {
        // ...
        todo!();
    }
}

struct AVX2;
# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Backend for AVX2 {
    #[target_feature(enable = "avx2")]
    unsafe fn sum(xs: &[u32]) -> u32 {
        // ...
        todo!();
    }
}

// And now you want to have a function which calls into that trait:
unsafe fn do_calculations<B>(xs: &[u32]) -> u32 where B: Backend {
    let value = B::sum(xs);
    // ...do some more calculations here...
    value
}
We have a problem here. For the call to B::sum to be inlined, do_calculations has to be marked with #[target_feature], and that attribute has to specify the concrete feature flag for a given SIMD instruction set, but this function is generic so we can't do that!
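To make the first problem in the list above concrete, here is a minimal sketch using only the standard library. The availability check happens once at the module boundary, yet every internal helper still has to be declared `unsafe` and called through an `unsafe` block. The module `simd` and the functions `sum_of_squares`, `sum_of_squares_avx2` and `square` are hypothetical names for illustration.

```rust
pub mod simd {
    /// Safe API boundary: the only place where AVX2 support is checked.
    pub fn sum_of_squares(xs: &[u32]) -> u32 {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if std::is_x86_feature_detected!("avx2") {
                // SAFETY: AVX2 support was verified on the line above.
                return unsafe { sum_of_squares_avx2(xs) };
            }
        }
        // Portable fallback for CPUs (or architectures) without AVX2.
        xs.iter()
            .fold(0u32, |acc, &x| acc.wrapping_add(x.wrapping_mul(x)))
    }

    // Internal helper: declared `unsafe` only to accompany #[target_feature].
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    #[target_feature(enable = "avx2")]
    unsafe fn square(x: u32) -> u32 {
        x.wrapping_mul(x)
    }

    // Internal helper: also `unsafe`, and it needs an `unsafe` block to call
    // `square`, even though the invariant was already checked at the boundary.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    #[target_feature(enable = "avx2")]
    unsafe fn sum_of_squares_avx2(xs: &[u32]) -> u32 {
        let mut acc = 0u32;
        for &x in xs {
            // SAFETY: this function is itself only called when AVX2 is present.
            acc = acc.wrapping_add(unsafe { square(x) });
        }
        acc
    }
}
```

With only two helpers this is tolerable; the friction grows with every additional layer of abstraction inside the module.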
How does this crate make it better?
You can now mark safe functions with #[target_feature]
This crate exposes an #[unsafe_target_feature] macro which works just like #[target_feature] except it moves the unsafe from the function prototype into the macro name, and can be used on safe functions.
// ERROR: `#[target_feature(..)]` can only be applied to `unsafe` functions
#[target_feature(enable = "avx2")]
fn func() {}

// It works, but must be `unsafe`
# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn func() {}

use curve25519_dalek_derive::unsafe_target_feature;

// No `unsafe` on the function itself!
# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[unsafe_target_feature("avx2")]
fn func() {}
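For comparison, a similar effect can be written by hand with the standard attribute alone: a safe outer function that forwards to an inner `unsafe fn` carrying `#[target_feature]`. This is a sketch of the general pattern and is not confirmed to be what the macro expands to; `double_all`, `double_all_impl` and `double_all_checked` are hypothetical names.

```rust
/// Safe signature, but like the macro it moves the obligation elsewhere:
/// callers must already know that AVX2 is available.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[inline(always)]
fn double_all(xs: &mut [u32]) {
    #[target_feature(enable = "avx2")]
    unsafe fn double_all_impl(xs: &mut [u32]) {
        for x in xs.iter_mut() {
            *x = x.wrapping_mul(2);
        }
    }
    // SAFETY: NOT checked here. This wrapper relies on its caller having
    // verified AVX2 support before calling it.
    unsafe { double_all_impl(xs) }
}

/// Boundary function that performs the runtime check once.
pub fn double_all_checked(xs: &mut [u32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            return double_all(xs);
        }
    }
    // Portable fallback.
    for x in xs.iter_mut() {
        *x = x.wrapping_mul(2);
    }
}
```

The safe wrapper is only sound when every path to it passes through a check such as the one in `double_all_checked`, which is the responsibility the word "unsafe" in the macro name signals.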
It can also be used to mark functions inside of impls:
struct S;

impl core::ops::Add for S {
    type Output = S;
    // ERROR: method `add` has an incompatible type for trait
    #[target_feature(enable = "avx2")]
    unsafe fn add(self, rhs: S) -> S {
        S
    }
}

use curve25519_dalek_derive::unsafe_target_feature;

struct S;

# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[unsafe_target_feature("avx2")]
impl core::ops::Add for S {
    type Output = S;
    // No `unsafe` on the function itself!
    fn add(self, rhs: S) -> S {
        S
    }
}
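Without the macro, a trait method has to keep its safe signature and delegate to a separate `unsafe fn`. The sketch below shows one conservative way to do that with the standard library only. Unlike the macro, it re-checks the CPU feature on every call so that it stays sound in isolation. The type `Lanes` and the helpers `add_scalar` and `add_avx2` are hypothetical.

```rust
use core::ops::Add;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Lanes(pub [u32; 4]);

/// Portable fallback.
fn add_scalar(a: Lanes, b: Lanes) -> Lanes {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = a.0[i].wrapping_add(b.0[i]);
    }
    Lanes(out)
}

/// Same arithmetic, compiled with AVX2 enabled.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: Lanes, b: Lanes) -> Lanes {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = a.0[i].wrapping_add(b.0[i]);
    }
    Lanes(out)
}

impl Add for Lanes {
    type Output = Lanes;

    // The trait fixes this signature as safe, so #[target_feature] cannot be
    // placed here together with `unsafe`; the method delegates instead.
    fn add(self, rhs: Lanes) -> Lanes {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if std::is_x86_feature_detected!("avx2") {
                // SAFETY: AVX2 support was verified on the line above.
                return unsafe { add_avx2(self, rhs) };
            }
        }
        add_scalar(self, rhs)
    }
}
```

The per-call check is the price of keeping this version self-contained; code that has already verified the feature at a module boundary would want to avoid it.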
You can generate specialized copies of a module for each target feature
use curve25519_dalek_derive::unsafe_target_feature_specialize;

# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[unsafe_target_feature_specialize("sse2", "avx2", conditional("avx512ifma", nightly))]
mod simd {
    #[for_target_feature("sse2")]
    pub const CONSTANT: u32 = 1;

    #[for_target_feature("avx2")]
    pub const CONSTANT: u32 = 2;

    #[for_target_feature("avx512ifma")]
    pub const CONSTANT: u32 = 3;

    pub fn func() { /* ... */ }
}

# #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn entry_point() {
    #[cfg(nightly)]
    if std::is_x86_feature_detected!("avx512ifma") {
        return simd_avx512ifma::func();
    }

    if std::is_x86_feature_detected!("avx2") {
        return simd_avx2::func();
    }

    if std::is_x86_feature_detected!("sse2") {
        return simd_sse2::func();
    }

    unimplemented!();
}
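To show the kind of duplication this is meant to spare you, here is a hand-written approximation with one module per feature, using only the standard library. It is an illustration and not the macro's actual output: the functions are `unsafe` here, `func` returns the per-module constant so the selected copy is observable, and the fallback value `0` stands in for the `unimplemented!()` above.

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod simd_sse2 {
    pub const CONSTANT: u32 = 1;

    #[target_feature(enable = "sse2")]
    pub unsafe fn func() -> u32 {
        CONSTANT
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod simd_avx2 {
    pub const CONSTANT: u32 = 2;

    #[target_feature(enable = "avx2")]
    pub unsafe fn func() -> u32 {
        CONSTANT
    }
}

/// Picks the most capable copy that the running CPU supports.
pub fn entry_point() -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { simd_avx2::func() };
        }
        if std::is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 support was verified on the line above.
            return unsafe { simd_sse2::func() };
        }
    }
    // No supported SIMD feature detected (or not an x86 target).
    0
}
```

Every item shared between the two modules has to be written twice by hand, which is the boilerplate the specialization macro is described as generating for you.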
How to use #[unsafe_target_feature]?
- Can be used on fns, impls and mods.
- When used on a function, it will only apply to that function; it won't apply to any nested functions, traits, mods, etc.
- When used on an impl, it will only apply to the functions directly defined inside of that impl.
- When used on a mod, it will only apply to the fns and impls directly defined inside of that mod.
- Cannot be used on methods which use self or Self; instead use it on the impl in which the method is defined.
License
Licensed under either of
- Apache License, Version 2.0, LICENSE-APACHE
- MIT license (LICENSE-MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.