
erydanos

Optimized routines for ARM NEON and SSE

17 releases

0.2.14 Aug 19, 2024
0.2.13 Aug 19, 2024
0.2.11 Jul 31, 2024
0.1.1 Jul 10, 2024

#211 in Math


416 downloads per month
Used in 5 crates (2 directly)

Apache-2.0 OR BSD-3-Clause

460KB
12K SLoC

Math utilities for NEON, SSE, AVX and scalar implementation

Provides basic math routines as scalar implementations and as SIMD routines for NEON, SSE, and AVX. Everything is implemented in both single and double precision. Almost all routines have an error under 1.5 ULP, which is sufficient for most media-processing applications, although it can be too high for some of them. All methods are reasonably fast for general-purpose use. Performance is comparable to libm, sometimes faster and sometimes slower, but may be worse than CPU-integrated solutions. The crate also provides a complementary NEON (double, double) type and a uint128 type, and adds 64-bit integer arithmetic for SSE.
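To make the "under 1.5 ULP" figure concrete, here is a minimal sketch of one way to measure error in ULPs for a single-precision routine. It assumes a double-precision result is an adequate reference, and it uses the standard library's `f32::sin` as a stand-in for the routine under test. The helper names `ulp_f32`, `ulp_error` and `worst_sin_error` are hypothetical and are not part of erydanos.

```rust
/// Size of one unit in the last place (ULP) at `x`, returned as f64.
/// Intended for finite inputs; at f32::MAX the next value is infinity.
fn ulp_f32(x: f32) -> f64 {
    let magnitude = x.abs();
    // Adjacent positive floats have adjacent bit patterns.
    let next = f32::from_bits(magnitude.to_bits() + 1);
    (next as f64) - (magnitude as f64)
}

/// Distance between `approx` and a higher-precision `reference`,
/// expressed in f32 ULPs measured at the rounded reference value.
fn ulp_error(approx: f32, reference: f64) -> f64 {
    ((approx as f64) - reference).abs() / ulp_f32(reference as f32)
}

/// Scans [-3, 3] and returns the worst error seen, in ULPs.
/// Assumption: std's f32 sin stands in for the routine being checked.
fn worst_sin_error() -> f64 {
    let mut worst = 0.0f64;
    let mut x = -3.0f32;
    while x <= 3.0 {
        let approx = x.sin();
        let reference = (x as f64).sin();
        worst = worst.max(ulp_error(approx, reference));
        x += 0.001;
    }
    worst
}
```

Replacing `x.sin()` inside `worst_sin_error` with the routine of interest gives a rough check; a sparse scan like this can only report the worst error it happens to sample, not a guaranteed bound.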

Implemented routines:

  • abs
  • acos
  • asin
  • atan
  • atan2
  • cbrt
  • floor
  • exp
  • fmod
  • ln
  • hypot
  • pow
  • sin
  • cos
  • tan
  • sqrt
  • ceil
  • hypot3
  • hypot4

Example

let value = 0.1f32.esin();

// For NEON simd
let value = vsinq_f32(vdupq_n_f32(0.1f32));
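The two calls above follow two patterns: a method available directly on `f32`, and a vector routine applied to a value broadcast into every lane. The sketch below illustrates both patterns using only the standard library. The trait `SineExt`, its method `sine_ext` and the function `sine_broadcast4` are hypothetical names chosen for illustration, and the body delegates to `f32::sin` rather than to any erydanos routine.

```rust
/// Hypothetical extension trait; NOT the trait erydanos exports.
/// It only shows how a method such as `esin()` can be called on f32.
trait SineExt {
    fn sine_ext(self) -> Self;
}

impl SineExt for f32 {
    fn sine_ext(self) -> Self {
        // Placeholder: delegates to std instead of a custom approximation.
        self.sin()
    }
}

/// Portable stand-in for a 4-lane SIMD call: broadcast one value into
/// four lanes (analogous to vdupq_n_f32), then apply the routine per lane.
fn sine_broadcast4(x: f32) -> [f32; 4] {
    let lanes = [x; 4];
    lanes.map(|lane| lane.sine_ext())
}
```

With these definitions, `0.1f32.sine_ext()` mirrors the scalar call and `sine_broadcast4(0.1)` mirrors the NEON call, with every lane holding the same result. A real SIMD routine computes all lanes in one instruction sequence instead of mapping over an array.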

Performance against libm

Sine Erydanos time: [17.785 ns 17.884 ns 18.095 ns]
Sine libm time: [27.928 ns 28.595 ns 29.398 ns]

Tan Erydanos time: [27.593 ns 27.607 ns 27.621 ns]
Tan libm time: [28.854 ns 29.165 ns 29.467 ns]

Cbrt Erydanos time: [23.260 ns 23.452 ns 23.650 ns]

Pow Erydanos time: [66.930 ns 67.465 ns 68.025 ns]
Pow libm time: [170.74 ns 172.71 ns 174.67 ns]

Asin Erydanos time: [23.730 ns 23.953 ns 24.156 ns]
Asin libm time: [349.02 ns 350.39 ns 352.24 ns]

Atan Erydanos time: [20.882 ns 21.115 ns 21.347 ns]
Atan libm time: [20.128 ns 20.309 ns 20.494 ns]
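The bracketed triples above look like the lower bound, estimate and upper bound printed by a statistical benchmark harness such as Criterion, although the source does not say which tool produced them. For a rough comparison on your own hardware, a minimal timing sketch using only the standard library could look like the following. The function name `mean_ns_per_call` is hypothetical, and a plain mean is much cruder than the confidence intervals shown above.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Mean wall-clock time per call of `f`, in nanoseconds.
/// `iterations` must be non-zero, otherwise the result is NaN.
fn mean_ns_per_call(f: impl Fn(f32) -> f32, iterations: u32) -> f64 {
    let start = Instant::now();
    let mut acc = 0.0f32;
    for i in 0..iterations {
        // Vary the input so the call cannot be hoisted out of the loop.
        let x = (i % 1000) as f32 * 0.001;
        acc += f(black_box(x));
    }
    // Keep the accumulated result alive so the loop is not optimized away.
    black_box(acc);
    start.elapsed().as_nanos() as f64 / iterations as f64
}
```

For example, `mean_ns_per_call(|x| x.sin(), 1_000_000)` times the standard library sine; passing a closure that calls another routine gives a figure to compare against. Results depend heavily on the CPU, compiler flags and build profile, so measurements should come from a release build.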
