#opencl #gpu #gpgpu


A Rust implementation of the Khronos OpenCL 3.0 API and extensions

17 releases (4 breaking)

new 0.5.2 Sep 19, 2021
0.4.1 Aug 21, 2021
0.3.0 Jul 10, 2021
0.1.4 Mar 26, 2021
0.1.0 Dec 31, 2020

#32 in Asynchronous

Download history 12/week @ 2021-06-01 26/week @ 2021-06-08 22/week @ 2021-06-15 22/week @ 2021-06-22 57/week @ 2021-06-29 57/week @ 2021-07-06 9/week @ 2021-07-13 24/week @ 2021-07-20 83/week @ 2021-07-27 110/week @ 2021-08-03 300/week @ 2021-08-10 309/week @ 2021-08-17 170/week @ 2021-08-24 180/week @ 2021-08-31 482/week @ 2021-09-07 482/week @ 2021-09-14

743 downloads per month
Used in 11 crates (via rust-gpu-tools)


5.5K SLoC


crates.io docs.io OpenCL 3.0 License Rust

A Rust implementation of the Khronos OpenCL API.


This crate provides a relatively simple, object based model of the OpenCL 3.0 API.
It is built upon the cl3 crate, which provides a functional interface to the OpenCL API.

OpenCL (Open Computing Language) is framework for general purpose parallel programming across heterogeneous devices including: CPUs, GPUs, DSPs, FPGAs and other processors or hardware accelerators.

It is often considered as an open-source alternative to Nvidia's proprietary Compute Unified Device Architecture CUDA for performing General-purpose computing on GPUs, see GPGPU.

The OpenCL Specification has evolved over time and not all device vendors support all OpenCL features.

OpenCL 3.0 is a unified specification that adds little new functionality to previous OpenCL versions.
It specifies that all OpenCL 1.2 features are mandatory, while all OpenCL 2.x and 3.0 features are now optional.

OpenCL 2.x and 3.0 optional features include:

  • Shared Virtual Memory (SVM),
  • nested parallelism,
  • pipes
  • atomics
  • and a generic address space,


The tests provide examples of how the crate may be used, e.g. see: platform, device, context, integration_test and opencl2_kernel_test.

The library is designed to support events and OpenCL 2 features such as Shared Virtual Memory (SVM) and kernel built-in work-group functions, e.g.:

const PROGRAM_SOURCE: &str = r#"
kernel void inclusive_scan_int (global int* output,
                                global int const* values)
    int sum = 0;
    size_t lid = get_local_id(0);
    size_t lsize = get_local_size(0);

    size_t num_groups = get_num_groups(0);
    for (size_t i = 0u; i < num_groups; ++i)
        size_t lidx = i * lsize + lid;
        int value = work_group_scan_inclusive_add(values[lidx]);
        output[lidx] = sum + value;

        sum += work_group_broadcast(value, lsize - 1);

const KERNEL_NAME: &str = "inclusive_scan_int";

// Create a Context on an OpenCL device
let context = Context::from_device(&device).expect("Context::from_device failed");

// Build the OpenCL program source and create the kernel.
let program = Program::create_and_build_from_source(&context, PROGRAM_SOURCE, CL_STD_2_0)
    .expect("Program::create_and_build_from_source failed");
let kernel = Kernel::create(&program, KERNEL_NAME).expect("Kernel::create failed");

// Create a command_queue on the Context's device
let queue = CommandQueue::create_with_properties(
.expect("CommandQueue::create_with_properties failed");

// The input data
const ARRAY_SIZE: usize = 8;
let value_array: [cl_int; ARRAY_SIZE] = [3, 2, 5, 9, 7, 1, 4, 2];

// Create an OpenCL SVM vector
let mut test_values =SvmVec::<cl_int>::allocate(&context, ARRAY_SIZE)
    .expect("SVM allocation failed");

// Map test_values if not a CL_MEM_SVM_FINE_GRAIN_BUFFER
if !test_values.is_fine_grained() {
    queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_WRITE, &mut test_values, &[])?;

// Copy input data into the OpenCL SVM vector

// Make test_values immutable
let test_values = test_values;

// Unmap test_values if not a CL_MEM_SVM_FINE_GRAIN_BUFFER
if !test_values.is_fine_grained() {
    let unmap_test_values_event = queue.enqueue_svm_unmap(&test_values, &[])?;

// The output data, an OpenCL SVM vector
let mut results = SvmVec::<cl_int>::allocate(&context, ARRAY_SIZE)
    .expect("SVM allocation failed");

// Run the kernel on the input data
let kernel_event = ExecuteKernel::new(&kernel)

// Wait for the kernel to complete execution on the device

// Map results if not a CL_MEM_SVM_FINE_GRAIN_BUFFER
if !results.is_fine_grained() {
    queue.enqueue_svm_map(CL_BLOCKING, CL_MAP_READ, &mut results, &[])?;

// Can access OpenCL SVM directly, no need to map or read the results
println!("sum results: {:?}", results);

// Unmap results if not a CL_MEM_SVM_FINE_GRAIN_BUFFER
if !results.is_fine_grained() {
    let unmap_results_event = queue.enqueue_svm_unmap(&results, &[])?;


Ensure that an OpenCL Installable Client Driver (ICD) and the appropriate OpenCL hardware driver(s) are installed, see OpenCL Installation.

opencl3 supports OpenCL 1.2 and 2.0 ICD loaders by default. If you have an OpenCL 2.0 ICD loader then just add the following to your project's Cargo.toml:

opencl3 = "0.5"

If your OpenCL ICD loader supports higher versions of OpenCL then add the appropriate features to opencl3, e.g. for an OpenCL 3.0 ICD loader add the following to your project's Cargo.toml instead:

version = "0.5"
features = ["CL_VERSION_2_1", "CL_VERSION_2_2", "CL_VERSION_3_0"]

For examples on how to use the library see the integration tests in integration_test.rs

See OpenCL Description for background on using OpenCL.

Recent changes

The API has changed considerably since version 0.1 of the library, with the aim of making the library more consistent and easier to use.

The most recent change is to SvmVec to provide better support for coarse grain buffer Shared Virtual Memory now that Nvidia has started supporting it, see Nvidia OpenCL.

Context no longer contains: Programs, Kernels and Command Queues. They must now be built separately, as shown in the example above.

It is now recommended to call the Program::create_and_build_from_* methods to build programs since they will return the build log if there is a build failure.

The OpenCL function calls now return an error type with a Display trait that shows the name of the OpenCL error, not just its number.

The Event API now returns CommandExecutionStatus and EventCommandType which also use the Display trait to display their names.

The API for memory structs: Buffer, Image and Pipe have been unified using the ClMem trait object.


Nearly all the structs implement the Drop trait to release their corresponding OpenCL objects, see the crate documentation.


The crate contains unit, documentation and integration tests.
The tests run the platform and device info functions (among others) so they can provide useful information about OpenCL capabilities of the system.

It is recommended to run the tests in single-threaded mode, since some of them can interfere with each other when run multi-threaded, e.g.:

cargo test -- --test-threads=1 --show-output

The integration tests are marked ignore so use the following command to run them:

cargo test -- --test-threads=1 --show-output --ignored


If you want to contribute through code or documentation, the Contributing guide is the best place to start. If you have any questions, please feel free to ask. Just please abide by our Code of Conduct.


Licensed under the Apache License, Version 2.0, as per Khronos Group OpenCL.
You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Any contribution intentionally submitted for inclusion in the work by you shall be licensed as defined in the Apache-2.0 license above, without any additional terms or conditions, unless you explicitly state otherwise.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.