3 releases (breaking)

Uses new Rust 2024

new 0.5.0 Apr 23, 2025
0.4.0 Jan 14, 2025
0.3.0 Oct 28, 2024

#158 in Science

12,119 downloads per month
Used in 24 crates (via cubecl)

MIT/Apache

1MB
30K SLoC

ROCm HIP runtime

A runtime for AMD GPUs supported by ROCm HIP.

By default, matrix multiplication acceleration relies on rocWMMA. Note that kernel compilation with rocWMMA can be slow.

For RDNA3 GPUs, a dedicated compiler that uses WMMA intrinsics is available through the wmma-intrinsics feature. It offers much faster kernel compilation and better performance on some kernels. Feel free to benchmark it with your own use cases.
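
One way to compare the default rocWMMA path with the wmma-intrinsics build is to time the same workload under each and report the first call separately, since that call is where kernel compilation cost would show up. The following is a minimal sketch using only the Rust standard library; `Timing` and `time_workload` are hypothetical helper names, not part of this crate, and the closure is assumed to launch your kernel and block until the result is available (for example by reading the output back), since otherwise only the launch overhead would be measured.

```rust
use std::time::{Duration, Instant};

/// Timing summary for one workload (hypothetical helper, not a crate API).
/// The first call is kept separate because it is expected to include
/// kernel compilation.
#[derive(Debug, Clone, Copy)]
pub struct Timing {
    /// Wall-clock time of the first (cold) call.
    pub first_call: Duration,
    /// Mean wall-clock time of the subsequent (warm) calls.
    pub steady_state_mean: Duration,
}

/// Runs `workload` once to capture the cold time, then `iters` more times
/// to estimate the warm per-call mean.
///
/// Assumption: `workload` synchronizes with the GPU before returning.
pub fn time_workload<F: FnMut()>(mut workload: F, iters: u32) -> Timing {
    assert!(iters > 0, "iters must be at least 1");

    // Cold call: includes any one-time setup such as kernel compilation.
    let start = Instant::now();
    workload();
    let first_call = start.elapsed();

    // Warm calls: averaged over `iters` runs.
    let start = Instant::now();
    for _ in 0..iters {
        workload();
    }
    let steady_state_mean = start.elapsed() / iters;

    Timing {
        first_call,
        steady_state_mean,
    }
}
```

Building the benchmark binary once without the feature and once with it, then calling `time_workload` around the same kernel launch in both builds, gives two `Timing` values to compare: `first_call` reflects the compilation-time difference and `steady_state_mean` the kernel performance difference.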

Dependencies

~9–23MB
~264K SLoC