| Version | Release date |
|---|---|
| 0.1.0 | Aug 1, 2025 |
llama.rust
LLM inference in Rust
Inference
cargo run --release -p llama-rust -- --model "meta-llama/Llama-2-7b-hf" \
--prompt "What is the capital of France?" --max-tokens 20 --temperature 0.7 --cpu
Parameters
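--model: The Hugging Face model ID to load, for example meta-llama/Llama-2-7b-hf.
--prompt: The prompt text to run inference on.
--max-tokens: The maximum number of tokens to generate.
--temperature: The sampling temperature. Lower values make the output more deterministic; higher values make it more varied.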
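To illustrate what a sampling temperature usually does, here is a minimal sketch in plain Rust of temperature-scaled softmax over a vector of logits. It is not taken from this crate's source: the function name `softmax_with_temperature` is hypothetical, and treating a temperature of 0 as greedy decoding is an assumption about one common convention, not a statement of how llama.rust handles it.

```rust
/// Hypothetical helper: convert raw logits into sampling probabilities.
/// Assumption: a temperature of 0 (or below) means greedy decoding,
/// i.e. all probability mass goes to the largest logit.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }

    // Find the largest logit and its index in a single pass.
    let (argmax, max_logit) = logits
        .iter()
        .copied()
        .enumerate()
        .fold((0, f32::NEG_INFINITY), |best, (i, x)| {
            if x > best.1 { (i, x) } else { best }
        });

    if temperature <= 0.0 {
        let mut probs = vec![0.0; logits.len()];
        probs[argmax] = 1.0;
        return probs;
    }

    // Subtract the max before exponentiating so exp() cannot overflow,
    // then divide by the temperature to sharpen or flatten the distribution.
    let exps: Vec<f32> = logits
        .iter()
        .map(|&x| ((x - max_logit) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let logits = [2.0_f32, 1.0, 0.1];
    for t in [0.7_f32, 1.0, 1.5] {
        let probs = softmax_with_temperature(&logits, t);
        println!("temperature {t}: {probs:?}");
    }
}
```

Running the sketch shows the probability of the top logit shrinking as the temperature rises, which is why a value such as 0.7 gives more focused output than 1.5. A real sampler would then draw a token index from these probabilities and stop after --max-tokens tokens.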
References
This project draws inspiration from: