#tensor-flow #neural-networks

tract-linalg

Tiny, no-nonsense, self-contained TensorFlow and ONNX inference

97 releases

0.19.2 Jan 27, 2023
0.19.0-alpha.19 Dec 19, 2022
0.18.4 Nov 23, 2022
0.17.3 Jul 25, 2022
0.2.9 Mar 28, 2019

#155 in Machine learning

Download history: 792–3,190 downloads/week (Oct 2022 – Jan 2023)

9,072 downloads per month
Used in 27 crates (via tract-core)

MIT/Apache

460KB
10K SLoC

tract-linalg

linalg stands for "linear algebra", which is something of a misnomer: this crate actually contains the low-level, architecture-dependent optimisations used by tract-core.

Functions

  • MatMatMul: Extended matrix*matrix product:
    • inspired by the GotoBLAS and BLIS micro-kernel approach
    • extended for convolution-friendly addressing (fused im2col)
    • fused output pipeline (min, max, and a few more simple, fast ops)
    • f32*f32 -> f32 (à la sgemm)
    • i8*i8 -> i32 accumulator -> i32 storage
    • i8*i8 -> i32 accumulator -> i8 (with channel zeropoint and scale, and re-quantization pipeline)
  • f32 sigmoid and f32 tanh: computed at f32 precision by a rational function (no exponentiation)
  • byte-to-byte lookup table
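The MatMatMul design above can be sketched in plain Rust. In the Goto/BLIS approach, a small fixed-size output tile is accumulated in registers over packed panels of A and B, and the fused output pipeline (here just min/max clamping) is applied before the tile is stored. The names and packing layout below are illustrative assumptions, not tract's actual API:

```rust
// Illustrative 4x4 f32 micro-kernel with a fused min/max output stage.
// `a` is a packed panel of 4 rows (k steps of 4 column values),
// `b` is a packed panel of 4 columns (k steps of 4 row values),
// `c` is the 4x4 output tile, stored row-major.
fn kernel_4x4(k: usize, a: &[f32], b: &[f32], c: &mut [f32; 16], min: f32, max: f32) {
    // Accumulator tile; in a real kernel this lives in SIMD registers.
    let mut acc = [0.0f32; 16];
    for p in 0..k {
        let ap = &a[4 * p..4 * p + 4];
        let bp = &b[4 * p..4 * p + 4];
        for i in 0..4 {
            for j in 0..4 {
                acc[4 * i + j] += ap[i] * bp[j];
            }
        }
    }
    // Fused output pipeline: clamp each value, then store the tile.
    for (dst, v) in c.iter_mut().zip(acc.iter()) {
        *dst = v.max(min).min(max);
    }
}
```

Fusing the clamp into the kernel means the tile is still hot in registers when the element-wise ops run, instead of requiring a second pass over the output matrix.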
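The exponential-free sigmoid and tanh mentioned above work by evaluating a ratio of polynomials, which costs only multiplies, adds, and one division. tract's actual coefficients are internal to the crate; the classic [3/2] Padé approximant of tanh below is only a sketch of the technique:

```rust
// Rational approximation of tanh with no call to exp().
// This is the textbook [3/2] Pade approximant, not tract's own
// (higher-order) rational function; it is accurate near zero and
// is clamped to [-1, 1] so large inputs saturate correctly.
fn tanh_rational(x: f32) -> f32 {
    let x2 = x * x;
    (x * (15.0 + x2) / (15.0 + 6.0 * x2)).clamp(-1.0, 1.0)
}

// Sigmoid expressed through tanh: sigma(x) = 0.5 + 0.5 * tanh(x / 2).
fn sigmoid_rational(x: f32) -> f32 {
    0.5 + 0.5 * tanh_rational(0.5 * x)
}
```

Because the whole evaluation is straight-line arithmetic over independent lanes, it vectorises trivially, which is what makes the 4n SIMD implementations in the table below possible.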

Implementations

|                   | generic fallback | armv6, vfp | armv7 neon | armv8 simd | x64 FMA |
|-------------------|------------------|------------|------------|------------|---------|
| MatMatMul f32     | 4x4              |            | 8x4        | 8x8        | 16x6    |
| MatMatMul i8->i8  |                  |            | 8x4        | 8x8        |         |
| MatMatMul i8->i32 |                  |            |            | 8x8        |         |
| sigmoid f32       |                  |            | 4n         | 4n         |         |
| tanh f32          |                  |            | 4n         | 4n         |         |
| byte lookup       |                  |            |            |            |         |
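The byte-to-byte lookup table is simple enough to sketch directly: precompute the output of an arbitrary u8 -> u8 function for all 256 possible inputs once, then remap whole buffers with plain indexing. The type and method names below are illustrative, not tract's API:

```rust
// Illustrative byte-to-byte lookup table: tabulate f over every
// possible input byte, then apply it to a buffer by indexing.
struct ByteLut {
    table: [u8; 256],
}

impl ByteLut {
    fn new(f: impl Fn(u8) -> u8) -> Self {
        let mut table = [0u8; 256];
        for (i, t) in table.iter_mut().enumerate() {
            *t = f(i as u8);
        }
        ByteLut { table }
    }

    // Remap every byte in place; one load per element, no branching.
    fn apply(&self, buf: &mut [u8]) {
        for b in buf.iter_mut() {
            *b = self.table[*b as usize];
        }
    }
}
```

This turns any per-byte function, however expensive to evaluate, into a single table load per element, which is why it is useful for quantized activation functions.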

Dependencies

~5.5–7MB
~155K SLoC