
tract-linalg

Tiny, no-nonsense, self-contained TensorFlow and ONNX inference

44 releases (9 breaking)

new 0.11.0 Sep 17, 2020
0.10.0 Jul 28, 2020
0.6.1 Mar 24, 2020
0.5.8 Dec 23, 2019
0.2.9 Mar 28, 2019



1,810 downloads per month
Used in 12 crates (via tract-core)

MIT/Apache

210KB
5.5K SLoC

tract-linalg

linalg stands for "linear algebra". This is a misnomer. This crate contains low-level, architecture-dependent optimisations used by tract-core.

Functions

  • MatMatMul: extended matrix*matrix product:
    • inspired by the GotoBLAS and BLIS micro-kernel approach
    • extended with convolution-friendly addressing (fused im2col)
    • fused output pipeline (min, max, and a few more simple, fast ops)
    • f32*f32 -> f32 (à la sgemm)
    • i8*i8 -> i32 accumulator -> i32 storage
    • i8*i8 -> i32 accumulator -> i8 (with per-channel zero point and scale, and a re-quantization pipeline)
  • f32 sigmoid and f32 tanh: computed at f32 precision with a rational function (no exponentiation)
  • byte-to-byte lookup table
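The rational-function trick behind the sigmoid and tanh kernels can be illustrated with a scalar sketch. The coefficients below are a textbook Padé approximant, not tract's actual polynomial; the point is that only multiplications, additions and one division are needed, so the computation vectorizes well:

```rust
/// Padé (3,2) approximation of tanh: x*(27 + x^2) / (27 + 9*x^2),
/// clamped to [-1, 1]. No exponentiation involved.
/// Illustrative only -- NOT tract's actual kernel or coefficients.
fn tanh_approx(x: f32) -> f32 {
    let x2 = x * x;
    let y = x * (27.0 + x2) / (27.0 + 9.0 * x2);
    y.clamp(-1.0, 1.0)
}

/// Sigmoid derived from the same approximation via
/// sigmoid(x) = (1 + tanh(x/2)) / 2.
fn sigmoid_approx(x: f32) -> f32 {
    0.5 + 0.5 * tanh_approx(0.5 * x)
}

fn main() {
    for &x in &[-3.0f32, -1.0, 0.0, 1.0, 3.0] {
        println!(
            "x = {x:5.1}  tanh ~ {:.4}  sigmoid ~ {:.4}",
            tanh_approx(x),
            sigmoid_approx(x)
        );
    }
}
```

Keeping everything in f32 arithmetic with a fixed instruction sequence is what lets these functions run as straight-line SIMD code on neon and simd targets.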

Implementations

                    generic fallback   armv6, vfp   armv7 neon   armv8 simd   x64 FMA
MatMatMul f32       4x4                             8x4          8x8          16x6
MatMatMul i8->i8                                    8x4          8x8
MatMatMul i8->i32                                                8x8
sigmoid f32                                         4n           4n
tanh f32                                            4n           4n
byte lookup
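The kernel sizes in the table are the register tiles of GotoBLAS/BLIS-style micro-kernels: each kernel streams packed panels of A and B through a small accumulator tile held in registers. A portable sketch of the 4x4 f32 shape (the real kernels are hand-written SIMD or assembly, and the packing layout here is an assumption for illustration):

```rust
/// Multiply one packed micro-panel of A (k columns of 4 values) by one packed
/// micro-panel of B (k rows of 4 values), accumulating into a 4x4 tile of C.
/// Packing makes both streams contiguous, so the inner loop is pure FMA work.
/// Illustrative sketch only -- not tract's actual kernel code.
fn kernel_4x4(k: usize, a: &[f32], b: &[f32], c: &mut [[f32; 4]; 4]) {
    assert!(a.len() >= 4 * k && b.len() >= 4 * k);
    let mut acc = [[0.0f32; 4]; 4]; // lives in registers on a real target
    for p in 0..k {
        let (ap, bp) = (&a[4 * p..4 * p + 4], &b[4 * p..4 * p + 4]);
        for i in 0..4 {
            for j in 0..4 {
                acc[i][j] += ap[i] * bp[j]; // one fused multiply-add each
            }
        }
    }
    // Output stage: this is the natural place to fuse cheap element-wise
    // ops (min, max, ...) before the tile is written back to memory.
    for i in 0..4 {
        for j in 0..4 {
            c[i][j] += acc[i][j];
        }
    }
}

fn main() {
    // k = 2: A's packed columns are e0 and e1, B's packed rows are
    // [1,2,3,4] and [5,6,7,8], so C's first two rows copy B's rows.
    let a = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0];
    let b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let mut c = [[0.0f32; 4]; 4];
    kernel_4x4(2, &a, &b, &mut c);
    println!("{c:?}");
}
```

The wider tiles (8x8 on armv8 simd, 16x6 on x64 FMA) follow the same pattern but size the accumulator to fill the target's vector register file.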

Dependencies

~1–2.1MB
~48K SLoC