#tensor-flow #neural-networks

tract-linalg

Tiny, no-nonsense, self-contained TensorFlow and ONNX inference

61 releases

new 0.15.4 Oct 21, 2021
0.15.3 Jul 29, 2021
0.15.0 Jun 24, 2021
0.13.1 Mar 29, 2021
0.2.9 Mar 28, 2019

#12 in #neural-networks

[Download history chart: roughly 400–1,300 downloads/week, Jul–Oct 2021]

3,302 downloads per month
Used in 14 crates (via tract-core)

MIT/Apache

350KB
9K SLoC

tract-linalg

linalg stands for "linear algebra". This is a misnomer: this crate contains low-level, architecture-dependent optimisations used by tract-core.

Functions

  • MatMatMul: Extended matrix*matrix product (see the micro-kernel sketch after this list):
    • inspired by the GotoBLAS and BLIS micro-kernel approach
    • extended for convolution-friendly addressing (fused im2col)
    • fused output pipeline (min, max, and a few more simple, fast ops)
    • f32*f32 -> f32 (à la sgemm)
    • i8*i8 -> i32 accumulator -> i32 storage
    • i8*i8 -> i32 accumulator -> i8 (with per-channel zero-point and scale, and a re-quantization pipeline)
  • f32 sigmoid and f32 tanh: at f32 precision, computed with a rational function instead of exponentiation (see the second sketch after this list)
  • byte-to-byte lookup table
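
The micro-kernel sketch: a minimal, illustrative Rust version of a BLIS-style 4x4 f32 kernel with a fused min/max output step. The function name, packing layout, and clamp parameters are assumptions made for exposition; tract-linalg's real kernels are hand-optimised per architecture and sit behind an internal API.

```rust
// Illustrative sketch only: a BLIS-style 4x4 f32 micro-kernel with a fused
// min/max output pipeline. Names and panel layouts are hypothetical, not
// tract-linalg's actual API.

/// Computes a 4x4 tile of C, where `a` is a packed 4xK panel (4 values of A
/// per depth step) and `b` is a packed Kx4 panel (4 values of B per depth
/// step), then clamps the result in the same pass over C.
fn micro_kernel_4x4(k: usize, a: &[f32], b: &[f32], c: &mut [f32; 16], min: f32, max: f32) {
    let mut acc = [0.0f32; 16];
    for p in 0..k {
        let a_col = &a[4 * p..4 * p + 4]; // 4 values of A at depth index p
        let b_row = &b[4 * p..4 * p + 4]; // 4 values of B at depth index p
        for i in 0..4 {
            for j in 0..4 {
                acc[4 * i + j] += a_col[i] * b_row[j];
            }
        }
    }
    // Fused output pipeline: apply the element-wise ops while storing,
    // saving a second pass over C. This is where min/max fusion pays off.
    for (c_ij, acc_ij) in c.iter_mut().zip(acc.iter()) {
        *c_ij = (*c_ij + *acc_ij).max(min).min(max);
    }
}

fn main() {
    // 4x2 * 2x4 example: A packed as 4-wide columns, B packed as 4-wide rows.
    let a = [1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 0.5, 0.5];
    let b = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0];
    let mut c = [0.0f32; 16];
    micro_kernel_4x4(2, &a, &b, &mut c, 0.0, 6.0);
    println!("{:?}", c);
}
```

Packing both operands into fixed-width panels makes the inner loop stride-1 on both sides; the im2col fusion mentioned above changes only how one of the panels is gathered, not the kernel itself.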
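And the rational-function sketch for sigmoid/tanh. The coefficients below are just the [3/2] Padé approximant of tanh around zero, chosen because it is easy to verify; tract-linalg fits its own higher-order rational functions and evaluates them in SIMD kernels, so this only demonstrates the "no exponentiation" idea.

```rust
// Sketch of computing sigmoid/tanh with a rational function instead of exp.
// Coefficients are the [3/2] Padé approximant of tanh at 0; illustrative only.

/// tanh(x) ~= x * (15 + x^2) / (15 + 6 * x^2), clamped to [-1, 1].
/// Accurate near 0, degrades past |x| ~ 2; a production kernel would use a
/// higher-order numerator/denominator fitted over the whole useful range.
fn tanh_rational(x: f32) -> f32 {
    let x2 = x * x;
    let t = x * (15.0 + x2) / (15.0 + 6.0 * x2);
    t.clamp(-1.0, 1.0)
}

/// sigmoid(x) = 0.5 * (1 + tanh(x / 2)): one rational evaluation, no exp.
fn sigmoid_rational(x: f32) -> f32 {
    0.5 + 0.5 * tanh_rational(0.5 * x)
}

fn main() {
    for &x in &[-4.0f32, -1.0, 0.0, 1.0, 4.0] {
        let exact = 1.0 / (1.0 + (-x).exp());
        println!("x = {:5.1}  rational = {:.4}  exp-based = {:.4}", x, sigmoid_rational(x), exact);
    }
}
```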

Implementations

                     generic fallback   armv6, vfp   armv7 neon   armv8 simd   x64 FMA
MatMatMul f32        4x4                             8x4          8x8          16x6
MatMatMul i8->i8                                     8x4          8x8
MatMatMul i8->i32                                                 8x8
sigmoid f32                                          4n           4n
tanh f32                                             4n           4n
byte lookup
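
Each row has a generic fallback plus per-architecture kernels, which implies runtime dispatch: probe the CPU once, then hand out a function pointer. Below is a minimal sketch of that pattern with hypothetical kernel names; tract-linalg's actual selection is wired through its internal ops structures.

```rust
// Sketch of runtime kernel dispatch with a generic fallback, the pattern the
// table above implies. Kernel names are hypothetical.

fn matmul_generic_4x4() { /* portable Rust kernel, always available */ }

#[cfg(target_arch = "x86_64")]
fn matmul_fma_16x6() { /* would use FMA intrinsics behind `unsafe` */ }

fn select_kernel() -> fn() {
    #[cfg(target_arch = "x86_64")]
    {
        // Checked at runtime, so one binary serves both old and new CPUs.
        if is_x86_feature_detected!("fma") {
            return matmul_fma_16x6;
        }
    }
    matmul_generic_4x4
}

fn main() {
    let kernel = select_kernel();
    kernel();
}
```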

Dependencies

~3–4MB
~88K SLoC
