1 unstable release: 0.0.1 (Oct 10, 2024)
infa
Rust + CUDA: a fast and simple inference library, written from scratch.
requirements
A Linux machine with CUDA 12.x, cuBLAS, and Rust installed.
Your GPU must be at least the sm_80 microarchitecture, i.e. Ampere or newer. (This is hardcoded for now.)
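To check whether your GPU meets the sm_80 minimum, recent NVIDIA drivers can print the compute capability with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`. A minimal sketch of the version check on that output (`meets_sm80` is a hypothetical helper, not part of this crate):

```rust
/// Parse a compute capability string such as "8.6" (as printed by
/// `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`) and
/// check it meets the sm_80 minimum required here.
fn meets_sm80(compute_cap: &str) -> bool {
    // Take the major version before the dot; treat parse failures as "too old".
    let major: u32 = compute_cap
        .trim()
        .split('.')
        .next()
        .and_then(|m| m.parse().ok())
        .unwrap_or(0);
    major >= 8
}

fn main() {
    assert!(meets_sm80("8.0"));  // A100 (Ampere)
    assert!(meets_sm80("8.6"));  // RTX 30-series
    assert!(!meets_sm80("7.5")); // Turing: below sm_80
    println!("ok");
}
```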
compared to PyTorch and llama.cpp
WIP
roadmap
Our first goal is to support bfloat16 Llama 3.2 1B inference.
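For context on that goal: bfloat16 keeps float32's 8-bit exponent (so the same dynamic range) but only 8 bits of mantissa, which is why it is popular for inference. A minimal sketch of the standard f32↔bf16 bit conversion, not this crate's implementation:

```rust
/// Convert an f32 to bfloat16 bits: keep the top 16 bits
/// (sign, 8-bit exponent, 7-bit mantissa), rounding to nearest even.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    // Round-to-nearest-even on the 16 bits being discarded.
    let rounding = 0x7FFF + ((bits >> 16) & 1);
    ((bits.wrapping_add(rounding)) >> 16) as u16
}

/// Widen bfloat16 bits back to f32 by zero-filling the low mantissa bits.
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

fn main() {
    // 1.0f32 has bit pattern 0x3F80_0000, so its bf16 bits are 0x3F80.
    assert_eq!(f32_to_bf16_bits(1.0), 0x3F80);
    assert_eq!(bf16_bits_to_f32(0x3F80), 1.0);
    // Powers of two round-trip exactly; other values lose low mantissa bits.
    assert_eq!(bf16_bits_to_f32(f32_to_bf16_bits(-2.0)), -2.0);
    println!("ok");
}
```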