12 releases

0.6.3	May 7, 2025
0.6.2	Mar 4, 2024
0.6.1	Apr 14, 2023
0.5.0	Mar 23, 2023
0.2.1	Nov 28, 2022

#60 in Algorithms

352,211 downloads per month
Used in 268 crates (2 directly)

MIT license

475KB
9K SLoC

ArgMinMax

Efficient argmin & argmax (in 1 function) with SIMD (SSE, AVX(2), AVX512¹, NEON¹) ⚡

🚀 The functions are generic over the type of the array, so they can be used on &[T] or Vec<T>, where T can be f16², f32³, f64³, i8, i16, i32, i64, u8, u16, u32, u64.

🤝 The trait is implemented for slices, Vec, 1D ndarray::ArrayBase⁴, Apache arrow::PrimitiveArray⁵ and arrow2::PrimitiveArray⁶.

⚡ Runtime CPU feature detection is used to select the most efficient implementation for the current CPU. This means that the same binary can be used on different CPUs without recompilation.

👀 The SIMD implementation contains no if checks, so the runtime of the function is independent of the input data and its order (best-case = worst-case = average-case).

🪄 Efficient support for f16 and uints: through (bijective aka symmetric) bitwise operations, f16 (optional²) and uints are converted to ordered integers, which makes it possible to use integer SIMD instructions.
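
The conversion to ordered integers can be illustrated with a small sketch that uses only the standard library. The function names are hypothetical and the code is not taken from the crate; it shows one well-known way to map u32 values and raw f16 bit patterns to signed integers so that integer comparison gives the same order as the original values (NaNs excluded). Both mappings are bijective, and the f16 mapping is its own inverse.

```rust
/// Hypothetical helper: map a u32 to an i32 with the same ordering
/// by flipping the most significant bit.
fn u32_to_ordered_i32(x: u32) -> i32 {
    (x ^ 0x8000_0000) as i32
}

/// Inverse of `u32_to_ordered_i32`.
fn ordered_i32_to_u32(y: i32) -> u32 {
    (y as u32) ^ 0x8000_0000
}

/// Hypothetical helper: map the raw bits of an IEEE 754 half-precision
/// float to an i16 with the same ordering (NaNs excluded).
/// Negative values get their 15 non-sign bits flipped, positive values
/// are left unchanged. Applying the function twice returns the input.
fn f16_bits_to_ordered_i16(bits: u16) -> i16 {
    let v = bits as i16;
    // (v >> 15) is 0 for positive inputs and -1 (all ones) for negative ones.
    let mask = (((v >> 15) as u16) >> 1) as i16; // 0x0000 or 0x7FFF
    v ^ mask
}

fn main() {
    // Unsigned integers keep their order after the transformation.
    assert!(u32_to_ordered_i32(0) < u32_to_ordered_i32(1));
    assert!(u32_to_ordered_i32(1) < u32_to_ordered_i32(u32::MAX));
    assert_eq!(ordered_i32_to_u32(u32_to_ordered_i32(42)), 42);

    // f16 bit patterns for -2.0, -1.0, 0.0, 1.0, 2.0.
    let halfs: [u16; 5] = [0xC000, 0xBC00, 0x0000, 0x3C00, 0x4000];
    let ordered: Vec<i16> = halfs.iter().map(|&b| f16_bits_to_ordered_i16(b)).collect();
    assert!(ordered.windows(2).all(|w| w[0] < w[1]));
    println!("{:?}", ordered);
}
```

Once the values are ordered integers, a plain signed integer comparison is enough to find the minimum and maximum, which is what makes integer SIMD instructions applicable.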

¹ for AVX512 and most of NEON you should enable the (default) "nightly_simd" feature (requires nightly Rust).
² for f16 you should enable the "half" feature.
³ for f32 and f64 you should enable the (default) "float" feature.
⁴ for ndarray::ArrayBase you should enable the "ndarray" feature.
⁵ for arrow::PrimitiveArray you should enable the "arrow" feature.
⁶ for arrow2::PrimitiveArray you should enable the "arrow2" feature.
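
Runtime CPU feature detection, as described above, can be sketched with the standard library alone. The function below is a hypothetical illustration and not the crate's dispatch code: it names which of a few instruction sets would be picked on the current machine and falls back to a scalar path on other architectures.

```rust
/// Hypothetical helper: name the widest instruction set this sketch
/// would pick on the current CPU. The checks happen at run time, so one
/// binary can take different paths on different machines.
fn select_implementation() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse4.1") {
            return "sse4.1";
        }
    }
    // Non-x86 targets, or x86 CPUs without the features checked above.
    "scalar"
}

fn main() {
    println!("selected implementation: {}", select_implementation());
}
```

The result depends on the machine the binary runs on, not on the machine it was compiled on.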

Installing

Add the following to your Cargo.toml:

[dependencies]
argminmax = "0.6.3"

Example usage

use argminmax::ArgMinMax;  // import trait

fn main() {
    let arr: Vec<i32> = (0..200_000).collect();  // create a vector

    let (min, max) = arr.argminmax();  // apply extension

    println!("min: {}, max: {}", min, max);
    println!("arr[min]: {}, arr[max]: {}", arr[min], arr[max]);
}

Traits

`ArgMinMax`

Implemented for ints, uints, and floats (if "float" feature enabled).

Provides the following functions:

argminmax: returns the indices of the minimum and maximum elements in the array.

When dealing with NaNs, the ArgMinMax functions ignore them. For more info see Limitations.

`NaNArgMinMax`

Implemented for floats (if "float" feature enabled).

Provides the following functions:

nanargminmax: returns the indices of the minimum and maximum elements in the array.

When dealing with NaNs, the NaNArgMinMax functions return the index of the first NaN. For more info see Limitations.

Tip 💡: if you know that there are no NaNs in your array, we advise you to use ArgMinMax, as this should be 5-30% faster than NaNArgMinMax.
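
The difference between the two NaN policies can be shown with a scalar sketch that uses only the standard library. The two functions below are hypothetical reference models of the documented behaviour, not the crate's SIMD implementation. In particular, returning the NaN index for both positions of the tuple is an assumption made for this illustration.

```rust
/// Hypothetical scalar model of the "ignore NaNs" policy.
/// Returns (index of minimum, index of maximum).
fn argminmax_ignore_nan(data: &[f32]) -> (usize, usize) {
    let (mut min_i, mut max_i) = (0usize, 0usize);
    let (mut min_v, mut max_v) = (f32::INFINITY, f32::NEG_INFINITY);
    for (i, &v) in data.iter().enumerate() {
        // Comparisons with NaN are always false, so NaNs never win.
        if v < min_v {
            min_v = v;
            min_i = i;
        }
        if v > max_v {
            max_v = v;
            max_i = i;
        }
    }
    (min_i, max_i)
}

/// Hypothetical scalar model of the "return NaN" policy: if any NaN is
/// present, the index of the first one is returned for both positions
/// (an assumption of this sketch).
fn argminmax_return_nan(data: &[f32]) -> (usize, usize) {
    if let Some(i) = data.iter().position(|v| v.is_nan()) {
        return (i, i);
    }
    argminmax_ignore_nan(data)
}

fn main() {
    let data = [3.0, f32::NAN, -1.0, 7.0, f32::NAN];
    assert_eq!(argminmax_ignore_nan(&data), (2, 3));
    assert_eq!(argminmax_return_nan(&data), (1, 1));

    // Without NaNs both policies agree.
    let clean = [3.0, -1.0, 7.0];
    assert_eq!(argminmax_ignore_nan(&clean), argminmax_return_nan(&clean));
}
```

In this sketch an input that holds only NaNs never updates the running minimum or maximum, so both indices stay at 0.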

Features

[default] "nightly_simd": enables the use of non-stable SIMD intrinsics (AVX512 and most of NEON), which are only available on nightly Rust.
[default] "float": support f32 and f64 argminmax (uses NaN-handling - see below).
"half": support f16 argminmax (using the half crate).
"ndarray": add the ArgMinMax trait to ndarray's Array1 & ArrayView1.
"arrow": add the ArgMinMax trait to arrow's PrimitiveArray.
"arrow2": add the ArgMinMax trait to arrow2's PrimitiveArray.

Benchmarks

Benchmarks on my laptop (AMD Ryzen 7 4800U, 1.8 GHz, 16GB RAM) using criterion show that the function is 3-20x faster than the scalar implementation (depending on the data type).

See /benches/results.

Run the benchmarks yourself with the following command:

cargo bench --quiet --message-format=short --features half | grep "time:"

Tests

To run the tests use the following command:

cargo test --message-format=short --all-features

Limitations

The library handles NaNs! 🚀

Some (minor) limitations:

The ArgMinMax functions ignore NaN values.
- ❗ When the array contains exclusively NaNs and/or infinities, unexpected behaviour can occur (index 0 is returned).
The NaNArgMinMax functions return the index of the first NaN (if any is present).
- ❗ When multiple bit representations of NaN are used, no guarantee is made that the first NaN is returned.
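
To make the last point concrete: IEEE 754 allows many different bit patterns that all decode to NaN. The sketch below uses only the standard library. The helper name and the chosen patterns are illustrative assumptions, not values used by the crate.

```rust
/// Hypothetical helper: a few distinct 32-bit patterns that all decode to NaN.
/// Each has all exponent bits set and a non-zero mantissa.
fn nan_bit_patterns() -> [u32; 3] {
    [0x7FC0_0000, 0x7FC0_0001, 0xFFC0_0000]
}

fn main() {
    for &bits in nan_bit_patterns().iter() {
        let x = f32::from_bits(bits);
        // Every pattern is a NaN, although the raw bits differ.
        assert!(x.is_nan());
        println!("{:#010x} is NaN: {}", bits, x.is_nan());
    }
}
```

Code that compares raw bits rather than calling is_nan() may therefore treat these values differently from one another.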

Acknowledgements

Some parts of this library are inspired by the great work of minimalrust's argmm project.

Dependencies

~0–4MB
~75K SLoC