1 unstable release

0.1.0	Oct 12, 2024

Used in 3 crates (via ssimulacra2-cuda)

MIT license

ssimulacra2-cuda-kernel

SSIMULACRA2 routines implemented as CUDA kernels in Rust. Building with cargo requires a recent nightly toolchain (2024-04-24).

Thanks to recent work by @kjetilkjeka in https://github.com/rust-lang/rust/pull/117458, rustc can now link crates as LLVM bitcode before emitting PTX. This needs the following rustup components:

rustup +nightly component add llvm-bitcode-linker
# Also requires llvm-tools if you don't have a full llvm toolchain available
rustup +nightly component add llvm-tools

The full rustc command:

rustc +nightly --edition 2021 --crate-name ssimulacra2 --crate-type cdylib --target nvptx64-nvidia-cuda --extern nvptx_panic_handler=../nvptx-panic-handler/libnvptx_panic_handler.rlib src/lib.rs -Z unstable-options -Clinker-flavor=llbc -C opt-level=3 -C target-cpu="sm_60" -C link-arg="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\nvvm\libdevice\libdevice.10.bc"

This project already has a cargo config set up, so there is no need to invoke rustc directly:

cargo build --package ssimulacra2-cuda-kernel --release --target nvptx64-nvidia-cuda
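
If the kernel build needs to be driven from Rust (for example from a host-side helper or build script), the cargo invocation above can be wrapped with `std::process::Command`. This is only a sketch: the function names `kernel_build_command` and `build_kernel` are hypothetical and not part of this crate, and it assumes the nightly toolchain is already selected for the project by other means.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not run) the cargo invocation
/// shown in this README. Assumes `cargo` is on PATH and that the nightly
/// toolchain is already selected for the project.
pub fn kernel_build_command() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args([
        "build",
        "--package",
        "ssimulacra2-cuda-kernel",
        "--release",
        "--target",
        "nvptx64-nvidia-cuda",
    ]);
    cmd
}

/// Hypothetical helper: runs the build and reports whether cargo
/// exited successfully.
pub fn build_kernel() -> std::io::Result<bool> {
    Ok(kernel_build_command().status()?.success())
}
```

Keeping command construction separate from execution makes it possible to inspect or log the arguments without starting a build.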

Safety

The kernels are unsafe by definition and use unsafe code throughout. Calculations and checks are done by hand everywhere, so in practice this is plain C++-style code with a fancier syntax.

I recommend the compute sanitizer tool from the CUDA SDK, since it does not require recompiling anything. Run the host binary under it and check its output for complaints.
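
To illustrate the style described above, here is a minimal sketch that runs on the host rather than on the GPU. It is not the crate's actual kernel code: `mul_planes_thread` and `launch_mul_planes` are hypothetical names, and the nested loops only stand in for the block and thread indices a real launch would provide. The point is the raw-pointer access guarded by a hand-written bounds check.

```rust
/// Hypothetical per-thread body: multiplies one element of two planes.
///
/// # Safety
/// `a`, `b` and `out` must each point to at least `len` valid `f32` values.
pub unsafe fn mul_planes_thread(
    idx: usize,
    a: *const f32,
    b: *const f32,
    out: *mut f32,
    len: usize,
) {
    // Manual bounds check: the launch grid is rounded up to a whole number
    // of blocks, so some threads fall past the end of the data.
    if idx >= len {
        return;
    }
    // Raw pointer arithmetic; nothing here is checked by the compiler.
    unsafe {
        *out.add(idx) = *a.add(idx) * *b.add(idx);
    }
}

/// Host-side stand-in for a kernel launch: iterates over every
/// (block, thread) pair a grid of `block_dim`-sized blocks would cover.
pub fn launch_mul_planes(a: &[f32], b: &[f32], out: &mut [f32], block_dim: usize) {
    assert!(block_dim > 0);
    assert!(a.len() == b.len() && b.len() == out.len());
    let len = out.len();
    // Round up so that the grid covers every element.
    let grid_dim = (len + block_dim - 1) / block_dim;
    for block in 0..grid_dim {
        for thread in 0..block_dim {
            // Manual global index calculation, as a kernel would do.
            let idx = block * block_dim + thread;
            unsafe {
                mul_planes_thread(idx, a.as_ptr(), b.as_ptr(), out.as_mut_ptr(), len);
            }
        }
    }
}
```

With five elements and a block size of four, the simulated grid covers eight indices. The last three are rejected by the bounds check, which is exactly the kind of mistake the compiler will not catch if the check is forgotten.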

compute-sanitizer.bat target\debug\ssimulacra2-cuda.exe

How it used to be

Previously, rustc could not link LLVM bitcode directly, so the bitcode had to be linked manually, which meant the build could not be integrated with cargo.

rustc +nightly --edition 2021 --emit llvm-bc --crate-type rlib --crate-name ssimulacra2 --target nvptx64-nvidia-cuda --extern nvptx_panic_handler=../nvptx-panic-handler/libnvptx_panic_handler.rlib src/lib.rs -C opt-level=3

C:/apps/LLVM-18/bin/llvm-link ssimulacra2.bc "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\nvvm\libdevice\libdevice.10.bc" -o ssimulacra2.linked.bc
C:/apps/LLVM-18/bin/opt -p "default<O3>,internalize,globaldce" -internalize-public-api-list=plane_srgb_to_linear,linear_to_xyb_packed,downscale_by_2,mul_planes,ssim_map,edge_diff_map ssimulacra2.linked.bc -o ssimulacra2.opt.bc

C:/apps/LLVM-18/bin/llc -O3 -mcpu=sm_30 ssimulacra2.opt.bc -o ssimulacra2.ptx

Compiling CUDA device code to LLVM bitcode with clang

clang -S -emit-llvm --cuda-device-only --cuda-gpu-arch=sm_86 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.5" shared.cu -o shared.ll
llvm-as shared.ll
