2 unstable releases

| Version | Released |
|---|---|
| 0.1.0 (new) | Feb 10, 2025 |
| 0.0.1 | Feb 8, 2025 |
#554 in Science
102 downloads per month
Used in rstsr
2MB, 57K SLoC
RSTSR OpenBLAS device
This crate provides an OpenBLAS-backed device for RSTSR.
Usage
use rstsr_core::prelude::*;
use rstsr_openblas::device::DeviceOpenBLAS;

fn main() {
    // use 16 threads
    let device = DeviceOpenBLAS::new(16);
    // to use the default number of threads instead:
    // let device = DeviceOpenBLAS::default();

    // 16 batches of 256 x 256 matrices filled with evenly spaced values
    let a = rt::linspace((0.0, 1.0, 1048576, &device)).into_shape([16, 256, 256]);
    let b = rt::linspace((1.0, 2.0, 1048576, &device)).into_shape([16, 256, 256]);

    // `%` is (batched) matrix multiplication, dispatched to optimized BLAS
    let c = &a % &b;

    // the mean of all elements is also computed in parallel
    let c_mean = c.mean_all();
    println!("{:?}", c_mean);
    // compare the absolute difference, so a too-small result cannot pass
    assert!((c_mean - 213.2503660477036).abs() < 1e-6);
}
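The expected value in the assertion can be cross-checked without BLAS. Below is a minimal plain-Rust sketch, assuming `rt::linspace` produces evenly spaced values with both endpoints included, `into_shape` uses row-major (C) order, and `%` is batched matrix multiplication. Because summing `C[b, i, j]` over `i` and `j` equals the sum over `k` of (column sum `k` of `A[b]`) times (row sum `k` of `B[b]`), the mean needs only O(batch * n^2) work instead of O(batch * n^3). The helpers `linspace` and `batched_matmul_mean` are illustrative names, not part of rstsr.

```rust
/// Evenly spaced values from `start` to `end` (both included), `n` points.
fn linspace(start: f64, end: f64, n: usize) -> Vec<f64> {
    let step = (end - start) / (n - 1) as f64;
    (0..n).map(|m| start + step * m as f64).collect()
}

/// Mean of all elements of C = A @ B, where A and B are row-major tensors of
/// shape [batch, n, n]. Uses the identity
///   sum_{i,j} C[b,i,j] = sum_k (sum_i A[b,i,k]) * (sum_j B[b,k,j]),
/// so no full matrix product is formed.
fn batched_matmul_mean(a: &[f64], b: &[f64], batch: usize, n: usize) -> f64 {
    let mut total = 0.0;
    for bt in 0..batch {
        let off = bt * n * n; // start of matrix `bt` in the flat buffer
        for k in 0..n {
            // column k of A[bt]: elements at off + i*n + k
            let col_sum: f64 = (0..n).map(|i| a[off + i * n + k]).sum();
            // row k of B[bt]: a contiguous slice of length n
            let row_sum: f64 = b[off + k * n..off + (k + 1) * n].iter().sum();
            total += col_sum * row_sum;
        }
    }
    total / (batch * n * n) as f64
}

fn main() {
    let (batch, n) = (16, 256);
    let a = linspace(0.0, 1.0, batch * n * n);
    let b = linspace(1.0, 2.0, batch * n * n);
    let mean = batched_matmul_mean(&a, &b, batch, n);
    println!("{mean}");
}
```

Under these assumptions the printed mean should agree with the value asserted in the usage example to well within the `1e-6` tolerance.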
Important Notes
- We do not provide automatic linkage:
  - Please add `-l openblas` to `RUSTFLAGS`, or `cargo:rustc-link-lib=openblas` in build.rs, or something similar, to your project. We do not use the external FFI crates `blas` or `blas-sys`, and we do not automatically search for the OpenBLAS library for linking.
  - If the `openmp` feature is activated, please add `-l gomp` or `-l omp` to `RUSTFLAGS`, or `cargo:rustc-link-lib=gomp` or `cargo:rustc-link-lib=omp` in build.rs, or something similar, to your project. We do not use the external FFI crate `openmp-sys`, and we do not automatically search for the OpenMP library for linking.
- If your OpenBLAS is compiled with OpenMP, please enable the `openmp` feature on either this crate or `rstsr-openblas-ffi`.
  - In our testing, OpenBLAS with OpenMP is probably more efficient than OpenBLAS with pthreads. However, we have currently decided not to make `openmp` a default feature.
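As a minimal sketch of the build.rs route described in these notes, a downstream crate could emit the link directives itself. The `OPENBLAS_LIB_DIR` environment variable and the `link_directives` helper are hypothetical names chosen for this illustration. The sketch also assumes your crate defines its own `openmp` feature that forwards to this crate (for example `openmp = ["rstsr-openblas/openmp"]` in Cargo.toml), because Cargo sets `CARGO_FEATURE_OPENMP` only for features of the crate whose build script is running. It links GNU `gomp`; substitute `omp` if your OpenBLAS was built against LLVM's OpenMP runtime.

```rust
// build.rs of a downstream crate (illustrative sketch)
use std::env;

/// Returns the `cargo:` directives that link OpenBLAS and, optionally,
/// an OpenMP runtime ("gomp" for GNU libgomp, "omp" for LLVM libomp).
/// `lib_dir` is an optional extra native library search path.
fn link_directives(lib_dir: Option<&str>, openmp_runtime: Option<&str>) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(dir) = lib_dir {
        out.push(format!("cargo:rustc-link-search=native={dir}"));
    }
    out.push("cargo:rustc-link-lib=openblas".to_string());
    if let Some(rt) = openmp_runtime {
        out.push(format!("cargo:rustc-link-lib={rt}"));
    }
    out
}

fn main() {
    // Hypothetical variable pointing at the directory containing libopenblas.
    let lib_dir = env::var("OPENBLAS_LIB_DIR").ok();
    // Set by Cargo when this crate's own `openmp` feature is enabled.
    let openmp = env::var_os("CARGO_FEATURE_OPENMP").is_some();
    let runtime = if openmp { Some("gomp") } else { None };

    for line in link_directives(lib_dir.as_deref(), runtime) {
        println!("{line}");
    }
    // Re-run this script if the library location changes.
    println!("cargo:rerun-if-env-changed=OPENBLAS_LIB_DIR");
}
```

Keeping the directive list in a plain function makes the linking logic easy to check separately from the environment lookups in `main`.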
Dependencies
~2.5–3.5MB, ~78K SLoC