5 unstable releases

Uses new Rust 2024

new 0.2.0-pre.1 Feb 9, 2026
0.1.1 Jan 23, 2026
0.1.0 Jan 15, 2026
0.1.0-pre.1 Dec 18, 2025
0.0.1 Dec 5, 2025

#2647 in Algorithms

13,230 downloads per month
Used in 33 crates (via cubek)

MIT/Apache

165KB
4K SLoC

CubeK Reduce

Implements a wide variety of reduction algorithms across multiple instruction sets and hardware targets for efficient tensor reduction.
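As a point of reference for what "tensor reduction" means here, the following is a minimal CPU-only sketch in plain Rust. It is not CubeK's API: the function name `reduce_sum` and its signature are hypothetical, and the real crate runs such reductions on CubeCL runtimes rather than with simple loops like these.

```rust
/// Hypothetical illustration, not part of CubeK: sum a row-major
/// `rows x cols` matrix along one axis.
/// Axis 0 collapses the rows (one result per column);
/// axis 1 collapses the columns (one result per row).
fn reduce_sum(data: &[f32], rows: usize, cols: usize, axis: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols, "shape does not match data length");
    match axis {
        0 => (0..cols)
            .map(|c| (0..rows).map(|r| data[r * cols + c]).sum::<f32>())
            .collect(),
        1 => (0..rows)
            .map(|r| data[r * cols..(r + 1) * cols].iter().sum::<f32>())
            .collect(),
        _ => panic!("axis must be 0 or 1"),
    }
}

fn main() {
    // A 2 x 3 matrix stored row by row.
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    println!("{:?}", reduce_sum(&data, 2, 3, 0)); // [5.0, 7.0, 9.0]
    println!("{:?}", reduce_sum(&data, 2, 3, 1)); // [6.0, 15.0]
}
```

The output shape depends on the reduced axis: reducing axis 0 of a 2 x 3 input yields 3 values, and reducing axis 1 yields 2.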

Running Tests

Important Environment Variables

Two environment variables control test execution behavior:

  • CUBEK_TEST_MODE
    Controls handling of tests that cannot run on the current hardware (e.g., due to missing support for certain algorithms).

    • skip (default): Skipped tests are silently ignored and reported as passed by the Rust test runner.
    • verbose: Skipped tests are reported with an explanation of why they were skipped, but are still marked as passed.
    • panic: Skipped tests cause a failure, printing the reason. The test run will show failures.
      Useful for discovering which tests are being skipped on your hardware.
  • CUBEK_TEST_FULL
    Controls whether time-consuming tests are executed.

    • 0 (default): Long-running tests are skipped with an explanatory message.
    • 1: All tests are run, including the longer ones.
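The behavior described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the crate's actual implementation: `TestMode`, `parse_test_mode`, `parse_test_full`, and `handle_unsupported` are hypothetical names, and treating unset or unrecognized values as the defaults is an assumption as well.

```rust
use std::env;

/// Hypothetical model of the three CUBEK_TEST_MODE values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TestMode {
    Skip,
    Verbose,
    Panic,
}

/// Map the raw variable value to a mode.
/// Assumption: unset or unrecognized values fall back to `skip`.
fn parse_test_mode(value: Option<&str>) -> TestMode {
    match value {
        Some("verbose") => TestMode::Verbose,
        Some("panic") => TestMode::Panic,
        _ => TestMode::Skip,
    }
}

/// Assumption: long-running tests run only when CUBEK_TEST_FULL is exactly "1".
fn parse_test_full(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Hypothetical helper a test would call when it cannot run on this hardware.
/// Returning normally lets the Rust test runner count the test as passed.
fn handle_unsupported(mode: TestMode, reason: &str) {
    match mode {
        TestMode::Skip => {}
        TestMode::Verbose => eprintln!("test skipped: {reason}"),
        TestMode::Panic => panic!("test skipped: {reason}"),
    }
}

fn main() {
    let mode = parse_test_mode(env::var("CUBEK_TEST_MODE").ok().as_deref());
    let full = parse_test_full(env::var("CUBEK_TEST_FULL").ok().as_deref());
    println!("mode = {mode:?}, run long tests = {full}");

    // In `skip` and `verbose` modes this returns; in `panic` mode it fails.
    handle_unsupported(mode, "algorithm not supported on this device");
}
```

In this model only the `panic` mode turns a skipped test into a visible failure, which matches its stated purpose of revealing what is skipped on a given machine.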

Important Feature Flags

The test suite can be run on different CubeCL runtimes by enabling the corresponding feature flag, such as cubecl/cuda or cubecl/wgsl in the examples below.

Examples

# Run all tests (including long ones) on the CUDA runtime, skipping unsupported tests silently
CUBEK_TEST_FULL=1 cargo test --features cubecl/cuda

# Run all tests on CUDA, failing on any unsupported tests (to see what is skipped)
CUBEK_TEST_MODE=panic CUBEK_TEST_FULL=1 cargo test --features cubecl/cuda

# Run tests on the WGSL (WebGPU) runtime, reporting each skipped test with its reason
CUBEK_TEST_MODE=verbose cargo test --features cubecl/wgsl

Dependencies

~60–100MB
~2M SLoC