2 unstable releases

0.2.0 Aug 9, 2022
0.1.0 Sep 30, 2021




ec-gpu & ec-gpu-gen

Minimum supported Rust version: 1.51.


CUDA/OpenCL code generator for finite-field arithmetic over prime fields and elliptic curve arithmetic constructed with Rust.


  • Limbs are either 32 or 64 bits long, at your choice (on CUDA, only 32-bit limbs are supported).
  • The library assumes that the most significant bit of your prime field's modulus is unset. This allows for cheap reductions.
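To illustrate why an unset top bit makes reductions cheap, here is a minimal CPU-side sketch, not the library's generated code: with 4 little-endian 64-bit limbs and a modulus p < 2^255, the sum of two reduced elements is below 2p < 2^256, so it never carries out of the top limb and one conditional subtraction of p fully reduces it. The helper names and the toy modulus used in testing are assumptions for illustration only. The last helper shows the same value split into 32-bit limbs, the width CUDA requires.

```rust
// Illustrative sketch only; names are hypothetical, not part of ec-gpu-gen.
const LIMBS: usize = 4;

/// Limb-wise addition with carry propagation. Returns (a + b, carry-out).
fn add_limbs(a: &[u64; LIMBS], b: &[u64; LIMBS]) -> ([u64; LIMBS], bool) {
    let mut out = [0u64; LIMBS];
    let mut carry = false;
    for i in 0..LIMBS {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out[i] = s2;
        carry = c1 || c2;
    }
    (out, carry)
}

/// Limb-wise subtraction with borrow propagation. Returns (a - b, borrow-out).
fn sub_limbs(a: &[u64; LIMBS], b: &[u64; LIMBS]) -> ([u64; LIMBS], bool) {
    let mut out = [0u64; LIMBS];
    let mut borrow = false;
    for i in 0..LIMBS {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        out[i] = d2;
        borrow = b1 || b2;
    }
    (out, borrow)
}

/// (a + b) mod p for a, b < p, assuming the most significant bit of p is unset.
fn add_mod(a: &[u64; LIMBS], b: &[u64; LIMBS], p: &[u64; LIMBS]) -> [u64; LIMBS] {
    let (sum, carry) = add_limbs(a, b);
    // a + b < 2p < 2^256, so the sum always fits in the limbs.
    debug_assert!(!carry, "cannot happen when the modulus' top bit is unset");
    // A single conditional subtraction: keep `sum` if it is already below p.
    let (diff, borrow) = sub_limbs(&sum, p);
    if borrow { sum } else { diff }
}

/// The same value viewed as little-endian 32-bit limbs.
fn to_u32_limbs(limbs: &[u64; LIMBS]) -> [u32; 2 * LIMBS] {
    let mut out = [0u32; 2 * LIMBS];
    for (i, &l) in limbs.iter().enumerate() {
        out[2 * i] = l as u32; // low half
        out[2 * i + 1] = (l >> 32) as u32; // high half
    }
    out
}
```

With 64-bit limbs the same element needs half as many limbs, which is why the limb width is a configurable choice where the backend allows it.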



Generating CUDA/OpenCL code for blstrs Scalar elements:

use blstrs::Scalar;
use ec_gpu_gen::SourceBuilder;

let source = SourceBuilder::new()

Integration into your library

This crate usually creates GPU kernels at compile time. For CUDA it generates a fatbin, whereas for OpenCL it only generates the source code, which is then compiled at run time.

To make this easier, helper functions are available. You put some code into build.rs that generates the kernels, and some code into your library that consumes those generated kernels. The kernels are embedded directly into your program/library. If something goes wrong, you get an error at compile time.
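The hand-off from build.rs to the library can rely on Cargo's standard build-script mechanism: a build script prints a `cargo:rustc-env=NAME=VALUE` line, and the crate reads that value back at compile time. The sketch below shows both sides under that assumption; the helper functions and the `_MY_CRATE_OPENCL_SOURCE` variable are hypothetical and are not ec-gpu-gen's API.

```rust
// Hypothetical helpers illustrating Cargo's build-script mechanism; not ec-gpu-gen's API.

/// Build-script side: the line a build.rs prints to stdout so that Cargo sets
/// `name` as a compile-time environment variable for the crate being built.
fn rustc_env_directive(name: &str, value: &str) -> String {
    format!("cargo:rustc-env={}={}", name, value)
}

/// Library side: `option_env!` is resolved at compile time, so the recorded
/// path is baked into the binary. The variable name is made up for this sketch.
fn generated_opencl_source_path() -> Option<&'static str> {
    option_env!("_MY_CRATE_OPENCL_SOURCE")
}
```

In a real crate, the library side would more typically use `include_str!(env!("..."))` or `include_bytes!(env!("..."))` to embed the generated file itself; because `env!` fails when the variable is missing, a broken build step surfaces as a compile-time error rather than a run-time one.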

In this example, we will make use of the FFT functionality. Add the following to your build.rs:

use blstrs::Scalar;
use ec_gpu_gen::SourceBuilder;

fn main() {
    // Generate the FFT kernel for blstrs' Scalar field.
    let source_builder = SourceBuilder::new().add_fft::<Scalar>();
    ec_gpu_gen::generate(&source_builder);
}

ec_gpu_gen::generate() takes care of the actual code generation/compilation. It automatically creates a CUDA and/or OpenCL kernel and defines two environment variables that are meant for internal use: _EC_GPU_CUDA_KERNEL_FATBIN, which points to the compiled CUDA kernel, and _EC_GPU_OPENCL_KERNEL_SOURCE, which points to the generated OpenCL source.

These variables are then picked up by the ec_gpu_gen::program!() macro, which generates a program for a given GPU device. Using FFT within your library would then look like this:

use ec_gpu_gen::{

let devices = Device::all();
// Build one GPU program per available device.
let programs = devices
    .iter()
    .map(|device| ec_gpu_gen::program!(device))
    .collect::<Result<_, _>>()
    .expect("Cannot create programs!");

let mut kern = FftKernel::<Fr>::create(programs).expect("Cannot initialize kernel!");
kern.radix_fft_many(&mut [&mut coeffs], &[omega], &[log_d]).expect("GPU FFT failed!");
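To check GPU results against a CPU reference, it helps to recall what a radix-2 FFT over a prime field computes: the evaluations of the coefficient vector at the powers of a primitive n-th root of unity omega, with n = 2^log_n. The sketch below is a textbook iterative Cooley-Tukey transform over a toy prime p = 17, checked against the naive O(n^2) evaluation; the modulus and function names are illustrative assumptions and this is not the kernel's implementation.

```rust
// CPU reference sketch of a radix-2 FFT over a toy prime field Z_17.
// Not ec-gpu-gen's kernel; names and the modulus are illustrative.
const P: u64 = 17;

/// base^exp mod P by square-and-multiply.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// In-place iterative Cooley-Tukey FFT of `a` (length 2^log_n) using the
/// primitive n-th root of unity `omega`. Output is in natural order.
fn radix2_fft(a: &mut [u64], omega: u64, log_n: u32) {
    let n = 1usize << log_n;
    assert_eq!(a.len(), n);
    if log_n == 0 {
        return;
    }
    // Bit-reversal permutation of the input.
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - log_n);
        if i < j {
            a.swap(i, j);
        }
    }
    // Butterfly passes over blocks of doubling size.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(omega, (n / len) as u64); // primitive len-th root
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = a[start + k + len / 2] * w % P;
                a[start + k] = (u + v) % P;
                a[start + k + len / 2] = (u + P - v) % P;
                w = w * w_len % P;
            }
        }
        len <<= 1;
    }
}

/// Naive O(n^2) evaluation: X[i] = sum_j a[j] * omega^(i*j).
fn naive_dft(a: &[u64], omega: u64) -> Vec<u64> {
    let n = a.len();
    (0..n)
        .map(|i| (0..n).fold(0, |acc, j| (acc + a[j] * pow_mod(omega, (i * j) as u64)) % P))
        .collect()
}
```

For example, 4 is a primitive 4th root of unity mod 17 (4^2 = 16, 4^4 = 1), so `radix2_fft(&mut a, 4, 2)` on a 4-element vector yields the same values as `naive_dft(&a, 4)`.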

Feature flags

This crate supports CUDA and OpenCL, which can be enabled with the cuda and opencl feature flags.
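Cargo features are resolved at compile time, so downstream code can branch on them with `cfg!` or gate whole items with `#[cfg(feature = "...")]`. A minimal sketch, assuming a crate that declares its own `cuda` and `opencl` features (for example, forwarding them to this crate's features); the function name is hypothetical.

```rust
// Sketch: reporting which backends were compiled in via Cargo features.
// Only the feature names come from the README; the function is hypothetical.
fn compiled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    // `cfg!` evaluates to a compile-time constant `true`/`false`.
    if cfg!(feature = "cuda") {
        backends.push("cuda");
    }
    if cfg!(feature = "opencl") {
        backends.push("opencl");
    }
    backends
}
```

Enabling both features compiles both backends, which is what makes the run-time backend selection described under the environment variables possible.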

Environment variables


    By default, the CUDA kernel is compiled for several architectures, which may take a long time. EC_GPU_CUDA_NVCC_ARGS can be used to override the arguments passed to nvcc. The input and output files are still set automatically.

    // Example for compiling the kernel for only the Turing architecture.
    EC_GPU_CUDA_NVCC_ARGS="--fatbin --gpu-architecture=sm_75 --generate-code=arch=compute_75,code=sm_75"

    When the library is built with both CUDA and OpenCL support, you can choose which one to use at run time. The default is cuda, which is also used when the variable is unset or set to any other (invalid) value. The only other valid value is opencl.

    // Example for setting it to OpenCL.

    Restricts the number of threads used by the library. The default is the number of logical cores reported by the machine.

    // Example for setting the maximum number of threads to 6.
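The resolution rules above can be summarized in code. This is a hypothetical sketch, not the library's implementation: each function takes the raw variable value (or `None` when unset) so it can be tested without touching the process environment, and the variable names for the framework and thread settings are deliberately not reproduced. Two behaviors are assumptions not stated above: an unparsable or zero thread count falls back to the default, and the output file is passed to nvcc with its `--output-file` option.

```rust
// Hypothetical sketch of the environment-variable rules; not ec-gpu-gen's code.

#[derive(Debug, PartialEq, Eq)]
enum Framework {
    Cuda,
    OpenCl,
}

/// Only "opencl" selects OpenCL; unset or any other value falls back to CUDA.
fn select_framework(value: Option<&str>) -> Framework {
    match value {
        Some("opencl") => Framework::OpenCl,
        _ => Framework::Cuda,
    }
}

/// Thread limit, defaulting to the logical core count.
/// Assumption: unparsable or zero values also use the default.
fn num_threads(value: Option<&str>, logical_cores: usize) -> usize {
    value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(logical_cores)
}

/// Logical core count as reported by the standard library (1 if unknown).
fn logical_cores() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// nvcc arguments: an override replaces the defaults (split on whitespace);
/// the input and output files are appended in either case.
/// Assumption: the output is passed via nvcc's `--output-file` option.
fn nvcc_args(override_args: Option<&str>, defaults: &[&str], input: &str, output: &str) -> Vec<String> {
    let mut args: Vec<String> = match override_args {
        Some(s) => s.split_whitespace().map(String::from).collect(),
        None => defaults.iter().map(|s| s.to_string()).collect(),
    };
    args.push("--output-file".to_string());
    args.push(output.to_string());
    args.push(input.to_string());
    args
}
```

Keeping the parsing separate from `std::env::var` lookups makes the fallback behavior easy to unit-test.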


Licensed under either of

at your option.


Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
