infa
Rust + CUDA = Fast and simple inference library from scratch
requirements
A Linux machine with CUDA 12.x, cuBLAS, and Rust installed.
Your GPU must be at least the sm_80
microarchitecture. (This is hardcoded for now.) A quick way to check is sketched below.
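If you're unsure whether your GPU qualifies, here is a minimal standalone CUDA sketch (not part of infa; the file name and structure are illustrative) that prints the device's compute capability and fails if it is below sm_80:

```cpp
// check_sm.cu — standalone check that device 0 is at least sm_80.
// Build with: nvcc check_sm.cu -o check_sm
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query properties of the first CUDA device.
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("%s: sm_%d%d\n", prop.name, prop.major, prop.minor);
    // infa hardcodes sm_80, so the major compute capability must be >= 8.
    return (prop.major >= 8) ? 0 : 1;
}
```

On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv` reports the same value without compiling anything.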
compared to PyTorch and llama.cpp
WIP
roadmap
Our first goal is to support bfloat16 Llama 3.2 1B inference.