|0.1.12||Jul 29, 2021|
|0.1.11||Jul 23, 2021|
#681 in Math
79 downloads per month
Used in 2 crates
This crate uses SIMD code to achieve very fast Galois Field matrix multiplication.
I have previously implemented a version of this algorithm on a PlayStation 3. It is available here
I will implement three different SIMD engines for field multiplication across vectors:
x86 implementation of parallel long (bitwise) multiplication
Arm/Aarch64 NEON implementation using hardware polynomial multiply and table-based modular reduction (vmull/tvbl)
Arm NEON implementation of parallel long (bitwise) multiplication
4-way armv6 (Thumb) implementation of the long multiplication routine
Support for Arm targets requires nightly Rust build.
Infinite Tape (Simulation)
Before I start writing arch-specific implementations, I'm focusing on
clearly documenting how the algorithm works. I'm going to implement a
non-SIMD version that uses the same basic ideas, but using a more
rusty style (infinite iterators). That's in
src/arch.rs and can be
enabled as a feature:
cargo test --features simulator --tests simulator
I'll also use this to prove that the algorithm works as intended.
Write and test simulation of non SIMD algorithm
Write and test simulation of SIMD algorithm
Using the simd version of the field multiplication routine, I now have:
- SIMD version of x86 matrix multiply
It needs a bit more work, but it's tested and runs around 3x faster
than the reference version. See
details. To run that with all relevant optimisations, you might need
to turn on some compile flags:
RUSTFLAGS="-O -C target-cpu=native -C target-feature=+ssse3,+sse4.1,+sse4.2,+avx" cargo bench -q "matrix"