
cudarc: minimal and safe api over the cuda toolkit

Check out cudarc on crates.io and docs.rs.

Safe abstractions over the CUDA driver, NVRTC, cuBLAS, cuBLASLt, cuRAND, and NCCL APIs.

cudarc is in a pre-alpha state: expect breaking changes, and not every CUDA function has a safe wrapper yet. Contributions are welcome for any that are missing!

Design

Goals are:

As safe as possible (there will still be a lot of unsafe due to ffi & async)
As ergonomic as possible
Allow mixing of high level safe apis, with low level sys apis

To that end there are three levels to each wrapper (by default the safe api is exported):

use cudarc::driver::{safe, result, sys};
use cudarc::nvrtc::{safe, result, sys};
use cudarc::cublas::{safe, result, sys};
use cudarc::cublaslt::{safe, result, sys};
use cudarc::curand::{safe, result, sys};
use cudarc::nccl::{safe, result, sys};

where:

sys is the raw ffi apis generated with bindgen
result is a very small wrapper around sys to return Result from each function
safe is a wrapper around result/sys to provide safe abstractions

We strongly recommend sticking with the safe APIs.
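To make the layering concrete, here is a minimal, self-contained sketch of the same sys / result / safe split. Nothing in it is cudarc's actual code: `fake_alloc`, `fake_free`, `DriverError`, and `Buffer` are hypothetical names, and the "driver" is a stand-in written in plain Rust so the example runs without a GPU.

```rust
// Hypothetical three-layer wrapper. The "driver" is simulated on the host.

mod sys {
    // Stand-in for bindgen output: raw pointers in, integer status codes out.
    pub const STATUS_SUCCESS: i32 = 0;
    pub const STATUS_INVALID_VALUE: i32 = 1;

    /// Safety: `out` must be valid for writes.
    pub unsafe fn fake_alloc(out: *mut *mut u8, len: usize) -> i32 {
        if len == 0 {
            return STATUS_INVALID_VALUE;
        }
        let boxed = vec![0u8; len].into_boxed_slice();
        unsafe { *out = Box::into_raw(boxed) as *mut u8 };
        STATUS_SUCCESS
    }

    /// Safety: `ptr` and `len` must come from one earlier `fake_alloc` call.
    pub unsafe fn fake_free(ptr: *mut u8, len: usize) -> i32 {
        unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))) };
        STATUS_SUCCESS
    }
}

mod result {
    // Thin layer: same calls, but status codes become `Result`.
    use super::sys;

    #[derive(Debug, PartialEq, Eq)]
    pub struct DriverError(pub i32);

    fn check(status: i32) -> Result<(), DriverError> {
        if status == sys::STATUS_SUCCESS { Ok(()) } else { Err(DriverError(status)) }
    }

    /// Safety: the caller must release the pointer with `free` and the same `len`.
    pub unsafe fn alloc(len: usize) -> Result<*mut u8, DriverError> {
        let mut ptr: *mut u8 = std::ptr::null_mut();
        check(unsafe { sys::fake_alloc(&mut ptr, len) })?;
        Ok(ptr)
    }

    /// Safety: `ptr` and `len` must come from one earlier `alloc` call.
    pub unsafe fn free(ptr: *mut u8, len: usize) -> Result<(), DriverError> {
        check(unsafe { sys::fake_free(ptr, len) })
    }
}

mod safe {
    // Safe layer: ownership and `Drop` uphold the invariants the lower layers require.
    use super::result::{self, DriverError};

    pub struct Buffer {
        ptr: *mut u8,
        len: usize,
    }

    impl Buffer {
        pub fn zeros(len: usize) -> Result<Self, DriverError> {
            let ptr = unsafe { result::alloc(len)? };
            Ok(Self { ptr, len })
        }

        pub fn as_slice(&self) -> &[u8] {
            unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
        }
    }

    impl Drop for Buffer {
        fn drop(&mut self) {
            // Errors cannot be propagated out of `drop`, so they are ignored here.
            let _ = unsafe { result::free(self.ptr, self.len) };
        }
    }
}

fn main() {
    let buf = safe::Buffer::zeros(4).expect("allocation failed");
    println!("{:?}", buf.as_slice());
    assert!(safe::Buffer::zeros(0).is_err());
}
```

In this sketch only the `safe` module can be used without writing `unsafe`, which is the practical reason to prefer that layer.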

API Preview

It's easy to create a context for a device and transfer data to the GPU:

// Get a stream for GPU 0
let ctx = cudarc::driver::CudaContext::new(0)?;
let stream = ctx.default_stream();

// copy a rust slice to the device
let inp = stream.memcpy_stod(&[1.0f32; 100])?;

// or allocate directly
let mut out = stream.alloc_zeros::<f32>(100)?;

You can also use the nvrtc api to compile kernels at runtime:

let ptx = cudarc::nvrtc::compile_ptx("
extern \"C\" __global__ void sin_kernel(float *out, const float *inp, const size_t numel) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numel) {
        out[i] = sin(inp[i]);
    }
}")?;

// Dynamically load it into the device
let module = ctx.load_module(ptx)?;
let sin_kernel = module.load_function("sin_kernel")?;

cudarc provides a very simple interface to launch kernels using a builder pattern to specify kernel arguments:

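// Assumed imports: `arg` is expected to come from the PushKernelArg trait
use cudarc::driver::{LaunchConfig, PushKernelArg};

let mut builder = stream.launch_builder(&sin_kernel);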
builder.arg(&mut out);
builder.arg(&inp);
builder.arg(&100usize);
unsafe { builder.launch(LaunchConfig::for_num_elems(100)) }?;
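As a rough illustration of what a one-dimensional launch configuration has to compute, and why `sin_kernel` checks `i < numel`, the sketch below sizes a grid by ceiling division and emulates the kernel's indexing on the host. `LaunchShape`, `shape_for_num_elems`, and `emulate_sin_kernel` are hypothetical names, not cudarc APIs, and the block size is a free parameter here rather than whatever `LaunchConfig::for_num_elems` chooses.

```rust
/// Hypothetical 1-D launch shape (not cudarc's `LaunchConfig`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LaunchShape {
    grid_dim: u32,
    block_dim: u32,
}

/// Pick enough blocks of `block_dim` threads to cover `numel` elements.
fn shape_for_num_elems(numel: u32, block_dim: u32) -> LaunchShape {
    LaunchShape {
        grid_dim: numel.div_ceil(block_dim),
        block_dim,
    }
}

/// Host-side emulation of `sin_kernel`: visits every (block, thread) pair,
/// applies the same bounds check, and returns how many threads had no work.
fn emulate_sin_kernel(shape: LaunchShape, out: &mut [f32], inp: &[f32], numel: usize) -> usize {
    let mut idle_threads = 0;
    for block_idx in 0..shape.grid_dim {
        for thread_idx in 0..shape.block_dim {
            // Same index computation as blockIdx.x * blockDim.x + threadIdx.x.
            let i = (block_idx * shape.block_dim + thread_idx) as usize;
            if i < numel {
                out[i] = inp[i].sin();
            } else {
                idle_threads += 1;
            }
        }
    }
    idle_threads
}

fn main() {
    let shape = shape_for_num_elems(100, 32);
    let inp = [1.0f32; 100];
    let mut out = [0.0f32; 100];
    let idle = emulate_sin_kernel(shape, &mut out, &inp, 100);
    println!("{shape:?}, idle threads: {idle}");
}
```

Because the grid is rounded up to whole blocks, some threads usually fall past the end of the data. Without the bounds check they would read and write out of range.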

And of course it's easy to copy things back to host after you're done:

let out_host: Vec<f32> = stream.memcpy_dtov(&out)?;
assert_eq!(out_host, [1.0; 100].map(f32::sin));
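Device math functions are not guaranteed to match the host's `f32::sin` bit for bit, so an exact `assert_eq!` may be stricter than you want in your own tests. A tolerance-based comparison is one alternative. This is a minimal sketch in plain Rust. `max_abs_diff` and `all_close` are hypothetical helpers, not part of cudarc, and the tolerance is an assumption you would tune for your kernel.

```rust
/// Largest element-wise absolute difference, or `None` if the lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0f32, f32::max))
}

/// True when both slices have the same length and every pair is within `tol`.
fn all_close(a: &[f32], b: &[f32], tol: f32) -> bool {
    matches!(max_abs_diff(a, b), Some(d) if d <= tol)
}

fn main() {
    // Stand-in for data copied back from the device.
    let out_host: Vec<f32> = vec![1.0f32.sin(); 100];
    let expected = [1.0f32; 100].map(f32::sin);
    assert!(all_close(&out_host, &expected, 1e-6));
    println!("outputs match within tolerance");
}
```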

License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.
