13 releases

0.1.12	Jul 29, 2021
0.1.11	Jul 23, 2021

#962 in Math

65 downloads per month
Used in 2 crates

GPL-2.0-or-later OR LGPL-2.0-or-later

150KB
2.5K SLoC

guff-matrix

This crate uses SIMD code to achieve very fast Galois Field matrix multiplication.

Previous Implementation

I have previously implemented a version of this algorithm on a PlayStation 3. It is available here

SIMD Support

I will implement three different SIMD engines for field multiplication across vectors:

x86 implementation of parallel long (bitwise) multiplication
Arm/Aarch64 NEON implementation using hardware polynomial multiply and table-based modular reduction (vmull/tvbl)
Arm NEON implementation of parallel long (bitwise) multiplication
4-way armv6 (Thumb) implementation of the long multiplication routine

Support for Arm targets requires nightly Rust build.

Infinite Tape (Simulation)

Before I start writing arch-specific implementations, I'm focusing on clearly documenting how the algorithm works. I'm going to implement a non-SIMD version that uses the same basic ideas, but using a more rusty style (infinite iterators). That's in src/arch.rs and can be enabled as a feature:

cargo test --features simulator --tests simulator

I'll also use this to prove that the algorithm works as intended.

Write and test simulation of non SIMD algorithm
Write and test simulation of SIMD algorithm

Matrix multiplication

Using the simd version of the field multiplication routine, I now have:

SIMD version of x86 matrix multiply

It needs a bit more work, but it's tested and runs around 3x faster than the reference version. See benches/vector_mul.rs for details. To run that with all relevant optimisations, you might need to turn on some compile flags:

RUSTFLAGS="-O -C target-cpu=native -C target-feature=+ssse3,+sse4.1,+sse4.2,+avx" cargo bench -q "matrix"

Dependencies

~545KB
~11K SLoC