0.1.0 (1 unstable release), Oct 12, 2024
Used in 3 crates (via ssimulacra2-cuda); 38KB, 743 lines
ssimulacra2-cuda-kernel
ssimulacra2 routines implemented as CUDA kernels in Rust. Building with cargo requires a recent nightly toolchain (2024-04-24).
Thanks to recent work by @kjetilkjeka in https://github.com/rust-lang/rust/pull/117458, rustc can now link crates as LLVM bitcode before emitting PTX. This needs the llvm-bitcode-linker component:
rustup +nightly component add llvm-bitcode-linker
# Also requires llvm-tools if you don't have a full LLVM toolchain available
rustup +nightly component add llvm-tools
The full rustc command:
rustc +nightly --edition 2021 --crate-name ssimulacra2 --crate-type cdylib --target nvptx64-nvidia-cuda --extern nvptx_panic_handler=../nvptx-panic-handler/libnvptx_panic_handler.rlib src/lib.rs -Z unstable-options -Clinker-flavor=llbc -C opt-level=3 -C target-cpu="sm_60" -C link-arg="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\nvvm\libdevice\libdevice.10.bc"
This project already has a cargo config set up, so there is no need to invoke rustc directly:
cargo build --package ssimulacra2-cuda-kernel --release --target nvptx64-nvidia-cuda
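If a host crate ever needs to drive this build itself (for example from an xtask binary), the cargo command above can be assembled with std::process::Command. This is a minimal sketch: kernel_build_command is a hypothetical helper, not something this crate provides, and it assumes the nightly toolchain and llvm-bitcode-linker are already installed. It only builds the command; nothing is executed until you call status() or output().

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Hypothetical helper (not part of this crate): mirrors the cargo command
/// used to build the kernel crate for the nvptx64-nvidia-cuda target.
fn kernel_build_command(release: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args([
        "build",
        "--package",
        "ssimulacra2-cuda-kernel",
        "--target",
        "nvptx64-nvidia-cuda",
    ]);
    if release {
        cmd.arg("--release");
    }
    cmd
}

fn main() {
    let cmd = kernel_build_command(true);
    // Only inspect the command here; call `cmd.status()` to actually run the build.
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?} {:?}", cmd.get_program(), args);
}
```

Keeping the argument list in one place means the host side and the cargo config cannot silently drift apart on the target triple or package name.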
Safety
The kernels are unsafe by definition and use unsafe everywhere. All calculations and checks are done manually, which means we are essentially writing plain C++ code with a fancier syntax.
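To make "manual calculation and checks" concrete, here is a host-side emulation of a CUDA-style element-wise kernel. It is a sketch, not this crate's code: mul_planes_thread and launch_mul_planes are hypothetical names, and on the GPU the block and thread indices would come from the hardware rather than from parameters. It shows the two things every kernel has to do by hand: compute its global index from block and thread indices, and bounds-check it, because the grid size is rounded up and the last block usually has idle threads.

```rust
/// Hypothetical emulation of one GPU thread of an element-wise multiply.
/// On the device, block_idx/block_dim/thread_idx would be read from hardware
/// registers; here they are passed in so the logic can run on the CPU.
unsafe fn mul_planes_thread(
    a: *const f32,
    b: *const f32,
    out: *mut f32,
    len: usize,
    block_idx: usize,
    block_dim: usize,
    thread_idx: usize,
) {
    // Manual global index calculation, exactly like CUDA C++.
    let i = block_idx * block_dim + thread_idx;
    // Manual bounds check: threads past the end of the data must do nothing.
    if i < len {
        // SAFETY: the caller guarantees all pointers are valid for `len` elements.
        unsafe {
            *out.add(i) = *a.add(i) * *b.add(i);
        }
    }
}

/// Emulates a kernel launch by running every (block, thread) pair sequentially.
fn launch_mul_planes(a: &[f32], b: &[f32], out: &mut [f32], block_dim: usize) {
    assert!(block_dim > 0);
    assert!(a.len() == out.len() && b.len() == out.len());
    let len = out.len();
    // Round the grid up so every element gets a thread.
    let grid_dim = len.div_ceil(block_dim);
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            // SAFETY: all three slices have `len` elements and the kernel
            // only dereferences indices below `len`.
            unsafe {
                mul_planes_thread(
                    a.as_ptr(),
                    b.as_ptr(),
                    out.as_mut_ptr(),
                    len,
                    block_idx,
                    block_dim,
                    thread_idx,
                );
            }
        }
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    let b = [2.0f32; 5];
    let mut out = [0.0f32; 5];
    // 5 elements with 4 threads per block: 2 blocks, 3 threads fail the bounds check.
    launch_mul_planes(&a, &b, &mut out, 4);
    println!("{:?}", out); // prints [2.0, 4.0, 6.0, 8.0, 10.0]
}
```

Dropping the i < len check would make the idle threads write past the end of out, which is exactly the kind of error compute-sanitizer reports on the real GPU.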
I recommend the compute-sanitizer tool from the CUDA Toolkit, as it does not require recompilation. Run the host program under it and check whether its output reports any errors:
compute-sanitizer.bat target\debug\ssimulacra2-cuda.exe
How it used to be
rustc could not link LLVM bitcode itself, so the bitcode had to be linked manually with the LLVM tools, which meant the build could not be integrated with cargo.
rustc +nightly --edition 2021 --emit llvm-bc --crate-type rlib --crate-name ssimulacra2 --target nvptx64-nvidia-cuda --extern nvptx_panic_handler=../nvptx-panic-handler/libnvptx_panic_handler.rlib src/lib.rs -C opt-level=3
C:/apps/LLVM-18/bin/llvm-link ssimulacra2.bc "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\nvvm\libdevice\libdevice.10.bc" -o ssimulacra2.linked.bc
C:/apps/LLVM-18/bin/opt -p "default<O3>,internalize,globaldce" -internalize-public-api-list=plane_srgb_to_linear,linear_to_xyb_packed,downscale_by_2,mul_planes,ssim_map,edge_diff_map ssimulacra2.linked.bc -o ssimulacra2.opt.bc
C:/apps/LLVM-18/bin/llc -O3 -mcpu=sm_30 ssimulacra2.opt.bc -o ssimulacra2.ptx
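The opt step above is what keeps the output small: after llvm-link pulls in all of libdevice, internalize makes every function except those in -internalize-public-api-list internal, and globaldce then deletes internal functions that nothing uses. The sketch below is a simplified conceptual model of that combined effect as a reachability walk over a call graph, not LLVM's actual implementation; the call edges in main are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Conceptual model (not LLVM's code) of `internalize,globaldce` with a
/// public API list: the listed kernels are the only roots, and everything
/// not reachable from them through calls is removed.
fn surviving_functions<'a>(
    call_graph: &HashMap<&'a str, Vec<&'a str>>,
    public_api: &[&'a str],
) -> HashSet<&'a str> {
    let mut live = HashSet::new();
    let mut stack: Vec<&'a str> = public_api.to_vec();
    while let Some(f) = stack.pop() {
        // Visit each function once, then queue everything it calls.
        if live.insert(f) {
            if let Some(callees) = call_graph.get(f) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    live
}

fn main() {
    // Hypothetical call graph after llvm-link: two kernels plus some libdevice routines.
    let mut graph: HashMap<&str, Vec<&str>> = HashMap::new();
    graph.insert("plane_srgb_to_linear", vec!["__nv_powf"]);
    graph.insert("mul_planes", vec![]);
    graph.insert("__nv_powf", vec![]);
    graph.insert("__nv_sinf", vec![]);
    graph.insert("__nv_expf", vec![]);

    let mut kept: Vec<&str> = surviving_functions(&graph, &["plane_srgb_to_linear", "mul_planes"])
        .into_iter()
        .collect();
    kept.sort();
    println!("{kept:?}"); // prints ["__nv_powf", "mul_planes", "plane_srgb_to_linear"]
}
```

This is also why every kernel entry point had to be listed by hand: a kernel missing from the list would be internalized and, having no callers, deleted.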
CUDA device code to LLVM bitcode with clang
clang -S -emit-llvm --cuda-device-only --cuda-gpu-arch=sm_86 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5" shared.cu -o shared.ll
llvm-as shared.ll