infa
Rust + CUDA = a fast and simple inference library, written from scratch
requirements
A Linux machine with CUDA 12 or later, cuBLAS, and Rust installed.
You need a GPU of at least the sm_80 microarchitecture (compute capability 8.0, i.e. Ampere or newer). This is hardcoded for now; see the check below.
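Since sm_80 is hardcoded, it is worth confirming your GPU qualifies before building. The standalone check below is not part of infa (the file name is ours); it uses only the standard CUDA runtime API:

```cuda
// check_sm.cu -- verify the GPU meets the sm_80 requirement.
// Build with: nvcc check_sm.cu -o check_sm
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: sm_%d%d\n", prop.name, prop.major, prop.minor);
    // sm_80 corresponds to compute capability 8.0, so major must be >= 8.
    return prop.major >= 8 ? 0 : 1;
}
```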
compared to PyTorch and llama.cpp
WIP
roadmap
Our first goal is to support bfloat16 Llama 3.2 1B inference.
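For a sense of what this goal implies at the kernel level: bfloat16 Llama inference leans heavily on bf16 matrix multiplies, which sm_80 supports natively and cuBLAS exposes through cublasGemmEx. The sketch below is ours, not infa's code; it only illustrates a bf16-input, fp32-accumulate GEMM with the cuBLAS API the requirements already name:

```cuda
// bf16_gemm.cu -- illustration only, not infa's implementation.
// Build with: nvcc bf16_gemm.cu -lcublas -o bf16_gemm
#include <cstdio>
#include <vector>
#include <cuda_bf16.h>
#include <cublas_v2.h>

int main() {
    const int m = 4, n = 4, k = 4;
    std::vector<__nv_bfloat16> hA(m * k), hB(k * n);
    std::vector<float> hC(m * n, 0.0f);
    for (int i = 0; i < m * k; ++i) hA[i] = __float2bfloat16(1.0f);
    for (int i = 0; i < k * n; ++i) hB[i] = __float2bfloat16(2.0f);

    __nv_bfloat16 *dA, *dB; float *dC;
    cudaMalloc(&dA, m * k * sizeof(__nv_bfloat16));
    cudaMalloc(&dB, k * n * sizeof(__nv_bfloat16));
    cudaMalloc(&dC, m * n * sizeof(float));
    cudaMemcpy(dA, hA.data(), m * k * sizeof(__nv_bfloat16), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), k * n * sizeof(__nv_bfloat16), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = A * B: bf16 inputs, fp32 accumulation and output (column-major).
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                 &alpha, dA, CUDA_R_16BF, m, dB, CUDA_R_16BF, k,
                 &beta, dC, CUDA_R_32F, m,
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC.data(), dC, m * n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0] = %f (expect %d)\n", hC[0], 2 * k);  // each entry: sum of 1*2 over k

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```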