2 releases (Rust 2024 edition)

| Version | Date |
|---|---|
| 0.1.1 | Sep 23, 2025 |
| 0.1.0 | Sep 23, 2025 |
# howfast

A small CLI tool that measures token metrics (prompt tokens, completion tokens, total tokens, and tokens per second) for an Ollama model response. It queries an Ollama server, then prints a nicely formatted, colored summary. The model's text response is hidden by default; pass `--with-response` to show it.
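As a rough illustration of the tokens-per-second metric, here is a minimal sketch of how such a figure can be derived from the fields the Ollama API reports for a completed generation (`eval_count`, the number of completion tokens, and `eval_duration`, in nanoseconds). This is not howfast's actual implementation, just the arithmetic behind the metric:

```rust
/// Tokens per second for the completion phase, given the token count
/// and the elapsed generation time in nanoseconds.
fn tokens_per_second(eval_count: u64, eval_duration_ns: u64) -> f64 {
    if eval_duration_ns == 0 {
        // Avoid division by zero for degenerate/empty responses.
        return 0.0;
    }
    eval_count as f64 / (eval_duration_ns as f64 / 1_000_000_000.0)
}

fn main() {
    // Example: 120 completion tokens generated over 2.4 seconds.
    println!("{:.1} tok/s", tokens_per_second(120, 2_400_000_000));
}
```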

## Installation

```sh
cargo install howfast
```
## Features

- Calls an Ollama model and collects token metrics.
- Reports tokens per second for the completion.
- Nicely formatted, colored terminal output (no boxes or borders).
- Response hidden by default; opt in with `--with-response`.
- Uses the Tokio runtime (required by `ollama-rs`).
## Requirements

- Rust (1.70+ recommended)
- Cargo
- An Ollama server reachable from your machine (default `localhost:11434`)
## Environment

`OLLAMA_HOST` (optional): host address of the Ollama server (defaults to `localhost`). Do not include the port; the program always uses port `11434`.

Example:

```sh
# set before running, if your Ollama server isn't on localhost
export OLLAMA_HOST=192.168.1.100
```

or

```sh
OLLAMA_HOST=192.168.1.100 howfast gemma3:4b "Tell me a joke"
```
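The host resolution described above can be sketched in a few lines. This is an illustrative snippet, not the crate's actual code; it assumes only that the `OLLAMA_HOST` variable is read and falls back to `localhost`, with the port fixed at `11434`:

```rust
use std::env;

/// Read OLLAMA_HOST, falling back to "localhost" when unset.
fn ollama_host() -> String {
    env::var("OLLAMA_HOST").unwrap_or_else(|_| "localhost".to_string())
}

fn main() {
    // The port is always 11434; only the host is configurable.
    println!("connecting to http://{}:11434", ollama_host());
}
```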