133 releases

new 0.21.5 May 11, 2024
0.21.2 Mar 29, 2024
0.20.22 Nov 28, 2023
0.20.7 Jun 14, 2023
0.2.9 Mar 28, 2019

#492 in Machine learning

Download history 5063/week @ 2024-01-22 6405/week @ 2024-01-29 6660/week @ 2024-02-05 6723/week @ 2024-02-12 7877/week @ 2024-02-19 15370/week @ 2024-02-26 6732/week @ 2024-03-04 5825/week @ 2024-03-11 4500/week @ 2024-03-18 7891/week @ 2024-03-25 5322/week @ 2024-04-01 8880/week @ 2024-04-08 7418/week @ 2024-04-15 9646/week @ 2024-04-22 7199/week @ 2024-04-29 2913/week @ 2024-05-06

28,172 downloads per month
Used in 39 crates (2 directly)

MIT/Apache

565KB
13K SLoC

tract-linalg

linalg stands for "linear algebra". This is a misnomer. This crate contains low-level, architecture-dependent optimisations used by tract-core.

Functions

  • MatMatMul: Extended matrix*matrix product:
    • inspired by the GotoBLAS and BLIS micro-kernel approach
    • extended for convolution-friendly addressing (fused im2col)
    • fused output pipeline (min, max, and a few more simple, fast ops)
    • f32*f32 -> f32 (à la sgemm)
    • i8*i8 -> i32 accumulator -> i32 storage
    • i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
  • f32 sigmoid and f32 tanh: at f32 precision, by a rational function (no exponentiation)
  • byte-to-byte lookup table
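
The micro-kernel idea with a fused output pipeline can be illustrated with a minimal sketch. It is not tract-linalg's API: `MR`, `NR`, `FusedOp`, `micro_kernel` and `mat_mat_mul` are hypothetical names, the 4x4 tile size is chosen only for illustration, and the operand packing that GotoBLAS and BLIS perform before calling the kernel is left out. The point it shows is that each output tile is accumulated in a small local array and that the min/max operations are applied to the accumulators before the single store to memory.

```rust
/// Tile height and width of the illustrative micro-kernel (hypothetical values).
const MR: usize = 4;
const NR: usize = 4;

/// A fused output operation, applied to each accumulator before it is stored.
#[derive(Clone, Copy)]
enum FusedOp {
    Min(f32),
    Max(f32),
}

/// Computes one tile (at most MR x NR) of C = A * B.
/// `a` is row-major with `k` columns, `b` and `c` are row-major with `n` columns.
/// `rows` and `cols` are the actual tile extents, smaller than MR/NR at the edges.
fn micro_kernel(
    a: &[f32], b: &[f32], c: &mut [f32],
    k: usize, n: usize,
    row0: usize, col0: usize, rows: usize, cols: usize,
    ops: &[FusedOp],
) {
    // Accumulators stay local for the whole reduction over k.
    let mut acc = [[0.0f32; NR]; MR];
    for p in 0..k {
        for i in 0..rows {
            let a_ip = a[(row0 + i) * k + p];
            for j in 0..cols {
                acc[i][j] += a_ip * b[p * n + col0 + j];
            }
        }
    }
    // Fused output pipeline: transform each accumulator, then store it once.
    for i in 0..rows {
        for j in 0..cols {
            let mut v = acc[i][j];
            for op in ops {
                v = match *op {
                    FusedOp::Min(x) => v.min(x),
                    FusedOp::Max(x) => v.max(x),
                };
            }
            c[(row0 + i) * n + col0 + j] = v;
        }
    }
}

/// Multiplies an m x k matrix by a k x n matrix, one tile at a time.
fn mat_mat_mul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, ops: &[FusedOp]) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for row0 in (0..m).step_by(MR) {
        for col0 in (0..n).step_by(NR) {
            let rows = MR.min(m - row0);
            let cols = NR.min(n - col0);
            micro_kernel(a, b, &mut c, k, n, row0, col0, rows, cols, ops);
        }
    }
    c
}
```

Calling `mat_mat_mul(&a, &b, m, k, n, &[FusedOp::Max(0.0)])` in this sketch yields the product with a ReLU-like clamp already applied, without a second pass over the output.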

Implementations

                    generic fallback   armv6, vfp   armv7 neon   armv8 simd   x64 FMA
MatMatMul f32       -                  4x4          8x4          8x8          16x6
MatMatMul i8->i8    -                  -            8x4          8x8          -
MatMatMul i8->i32   -                  -            -            8x8          -
sigmoid f32         -                  -            4n           4n           -
tanh f32            -                  -            4n           4n           -
byte lookup         -                  -            -            -            -
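
The sigmoid f32 and tanh f32 entries rely on a rational function rather than on exponentiation. A minimal sketch of that idea follows. The coefficients are those of a classic continued-fraction (Padé-style) approximant of tanh and are an assumption made for illustration, not the coefficients used by tract-linalg; the names `tanh_rational` and `sigmoid_rational` are hypothetical. This low-order version is accurate to roughly 1e-4, short of the full f32 precision the crate targets.

```rust
/// tanh approximated by a ratio of two polynomials, with no call to exp.
/// Illustrative coefficients from the continued fraction of tanh,
/// not the ones used by tract-linalg.
fn tanh_rational(x: f32) -> f32 {
    // Outside [-5, 5] tanh is within 1e-4 of +/-1, so the input is clamped.
    let x = x.clamp(-5.0, 5.0);
    let x2 = x * x;
    // Both polynomials are evaluated with Horner's scheme in x^2.
    let num = x * (135135.0 + x2 * (17325.0 + x2 * (378.0 + x2)));
    let den = 135135.0 + x2 * (62370.0 + x2 * (3150.0 + x2 * 28.0));
    // The ratio slightly overshoots near the clamp bounds, so clamp the result too.
    (num / den).clamp(-1.0, 1.0)
}

/// sigmoid derived from tanh through the identity
/// sigmoid(x) = 0.5 * (1 + tanh(x / 2)).
fn sigmoid_rational(x: f32) -> f32 {
    0.5 * (1.0 + tanh_rational(0.5 * x))
}
```

Because the body is only multiplications, additions and one division, the same computation can be applied to several lanes at once with SIMD instructions, which is the appeal of this formulation over one based on `exp`.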

Dependencies

~8–11MB
~201K SLoC