18 unstable releases (3 breaking)

new 0.3.9 Jul 7, 2025
0.3.8 Jul 4, 2025
0.3.7 Jun 25, 2025
0.3.5 May 22, 2025
0.0.1 Feb 8, 2025

#1533 in Machine learning

440 downloads per month
Used in rstsr

Apache-2.0

1.5MB
38K SLoC

RSTSR OpenBLAS device

This crate provides the OpenBLAS device (backend) for RSTSR.

Usage

use rstsr_core::prelude::*;
use rstsr_openblas::DeviceOpenBLAS;

// create the device with 16 threads
let device = DeviceOpenBLAS::new(16);
// if you want to use the default number of threads, use the following line
// let device = DeviceOpenBLAS::default();

let a = rt::linspace((0.0, 1.0, 1048576, &device)).into_shape([16, 256, 256]);
let b = rt::linspace((1.0, 2.0, 1048576, &device)).into_shape([16, 256, 256]);

// matrix multiplication (`%`) is dispatched to optimized BLAS, so it is very fast
let c = &a % &b;

// mean of all elements is also performed in parallel
let c_mean = c.mean_all();

println!("{:?}", c_mean);
// compare the absolute difference; without abs, any value below the reference would pass
assert!(f64::abs(c_mean - 213.2503660477036) < 1e-6);

Important Notes

  • We do not provide automatic linkage:

    • Please add -l openblas to RUSTFLAGS, or cargo:rustc-link-lib=openblas to build.rs, or something similar, in your project. We do not use the external FFI crates blas or blas-sys, and do not automatically search for the OpenBLAS library when linking.
    • If the openmp feature is activated, please add -l gomp or -l omp to RUSTFLAGS, or cargo:rustc-link-lib=gomp or cargo:rustc-link-lib=omp to build.rs, or something similar, in your project. We do not use the external FFI crate openmp-sys, and do not automatically search for the OpenMP library when linking.
  • If your OpenBLAS is compiled with OpenMP, please enable the openmp feature of either this crate or rstsr-openblas-ffi.

    • In our testing, OpenBLAS with OpenMP is probably more efficient than OpenBLAS with pthreads. However, we have currently decided not to make openmp a default feature.
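
The linkage notes above can be sketched as a build.rs in your own project. This is only a minimal illustration, not something shipped by this crate: the helper link_directives and the OPENMP_RUNTIME environment variable are hypothetical names chosen for this example, and the sketch assumes the libraries are already on the linker search path.

```rust
// build.rs (sketch): emit the link directives that this crate does not emit for you.
use std::env;

/// Build the `cargo:` directives for linking OpenBLAS and, optionally, an
/// OpenMP runtime. Pass `Some("gomp")` or `Some("omp")` when the `openmp`
/// feature is enabled; pass `None` for a pthreads build of OpenBLAS.
/// (Hypothetical helper, introduced only for this example.)
fn link_directives(openmp_runtime: Option<&str>) -> Vec<String> {
    let mut lines = vec!["cargo:rustc-link-lib=openblas".to_string()];
    if let Some(runtime) = openmp_runtime {
        lines.push(format!("cargo:rustc-link-lib={runtime}"));
    }
    lines
}

fn main() {
    // Assumed convention for this sketch: OPENMP_RUNTIME=gomp (or omp) is set
    // in the environment when OpenBLAS was compiled with OpenMP.
    println!("cargo:rerun-if-env-changed=OPENMP_RUNTIME");
    let runtime = env::var("OPENMP_RUNTIME").ok();
    for line in link_directives(runtime.as_deref()) {
        println!("{line}");
    }
}
```

If OpenBLAS lives in a non-standard directory, a cargo:rustc-link-search=native=<dir> directive would also be needed; the directory is specific to your system, so it is left out here.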

Dependencies

~5–35MB
~509K SLoC