transcribe-rs
Multi-engine speech-to-text library for Rust. Supports Parakeet, Moonshine, SenseVoice, GigaAM, Whisper, Whisperfile, and OpenAI.
Breaking Changes in 0.3.0
Version 0.3.0 changes the `SpeechModel` trait. If you need the old API, pin to `version = "=0.2.9"`.

- `transcribe()` and `transcribe_file()` now take `&TranscribeOptions` instead of `Option<&str>` for language
- `SpeechModel` requires `Send`, enabling `Box<dyn SpeechModel + Send>` across threads
- `TranscribeOptions` includes a `translate` field for Whisper/Whisperfile translation support
- `WhisperEngine::capabilities()` now returns actual model language support (English-only vs multilingual) instead of always reporting 99 languages
Note: 0.3.0 is a large migration. We believe correctness is preserved for all engines, but expect potential issues as this stabilizes. Please report any problems on GitHub.
Installation
```toml
[dependencies]
transcribe-rs = { version = "0.3", features = ["onnx"] }
```
No features are enabled by default. Pick the engines you need:
| Feature | Engines |
|---|---|
| `onnx` | Parakeet, Moonshine, SenseVoice, GigaAM (via ONNX Runtime) |
| `whisper-cpp` | Whisper (local, GGML via whisper.cpp with Metal/Vulkan) |
| `whisperfile` | Whisperfile (local server wrapper) |
| `openai` | OpenAI API (remote, async) |
| `all` | Everything above |
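Features combine if you need more than one engine family; for example, local ONNX engines plus the remote OpenAI API:

```toml
[dependencies]
transcribe-rs = { version = "0.3", features = ["onnx", "openai"] }
```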
Quick Start
```rust
use transcribe_rs::onnx::parakeet::{ParakeetModel, ParakeetParams, TimestampGranularity};
use transcribe_rs::onnx::Quantization;
use std::path::PathBuf;

let mut model = ParakeetModel::load(
    &PathBuf::from("models/parakeet-tdt-0.6b-v3-int8"),
    &Quantization::Int8,
)?;

let samples = transcribe_rs::audio::read_wav_samples(&PathBuf::from("audio.wav"))?;

let result = model.transcribe_with(
    &samples,
    &ParakeetParams {
        timestamp_granularity: Some(TimestampGranularity::Segment),
        ..Default::default()
    },
)?;

println!("{}", result.text);
```
All local engines implement the `SpeechModel` trait. Remote engines (OpenAI) implement `RemoteTranscriptionEngine` separately because they are async and file-based.
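Since 0.3.0 the trait requires `Send`, so a boxed model can be handed to a worker thread. A minimal self-contained sketch of that pattern, using a mock trait and model (the real `SpeechModel` trait has more methods than shown here):

```rust
use std::thread;

// Mock stand-in illustrating only the `Send` bound; not the crate's real trait.
trait SpeechModel: Send {
    fn transcribe(&mut self, samples: &[f32]) -> String;
}

struct DummyModel;

impl SpeechModel for DummyModel {
    fn transcribe(&mut self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

fn main() {
    // Because the trait requires `Send`, the boxed model can move into a thread.
    let mut model: Box<dyn SpeechModel + Send> = Box::new(DummyModel);
    let handle = thread::spawn(move || model.transcribe(&[0.0; 16_000]));
    println!("{}", handle.join().unwrap()); // prints "16000 samples"
}
```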
Usage by Engine
SenseVoice
```rust
use transcribe_rs::onnx::sense_voice::{SenseVoiceModel, SenseVoiceParams};
use transcribe_rs::onnx::Quantization;
use std::path::PathBuf;

let mut model = SenseVoiceModel::load(
    &PathBuf::from("models/sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17"),
    &Quantization::Int8,
)?;

let samples = transcribe_rs::audio::read_wav_samples(&PathBuf::from("audio.wav"))?;

let result = model.transcribe_with(
    &samples,
    &SenseVoiceParams {
        language: Some("en".to_string()),
        ..Default::default()
    },
)?;
```
Moonshine
```rust
use transcribe_rs::onnx::moonshine::{MoonshineModel, MoonshineVariant};
use transcribe_rs::onnx::Quantization;
use transcribe_rs::SpeechModel;
use std::path::PathBuf;

let mut model = MoonshineModel::load(
    &PathBuf::from("models/moonshine-base"),
    MoonshineVariant::Base,
    &Quantization::default(),
)?;

let result = model.transcribe_file(&PathBuf::from("audio.wav"), &transcribe_rs::TranscribeOptions::default())?;
```
Streaming variant:
```rust
use transcribe_rs::onnx::moonshine::StreamingModel;
use transcribe_rs::onnx::Quantization;
use transcribe_rs::SpeechModel;
use std::path::PathBuf;

let mut model = StreamingModel::load(
    &PathBuf::from("models/moonshine-streaming/moonshine-tiny-streaming-en"),
    4, // threads
    &Quantization::default(),
)?;

let result = model.transcribe_file(&PathBuf::from("audio.wav"), &transcribe_rs::TranscribeOptions::default())?;
```
GigaAM
```rust
use transcribe_rs::onnx::gigaam::GigaAMModel;
use transcribe_rs::onnx::Quantization;
use transcribe_rs::SpeechModel;
use std::path::PathBuf;

let mut model = GigaAMModel::load(
    &PathBuf::from("models/giga-am-v3"),
    &Quantization::default(),
)?;

let result = model.transcribe_file(&PathBuf::from("audio.wav"), &transcribe_rs::TranscribeOptions::default())?;
```
Whisper (whisper.cpp)
```rust
use transcribe_rs::whisper_cpp::{WhisperEngine, WhisperInferenceParams};
use std::path::PathBuf;

let mut engine = WhisperEngine::load(&PathBuf::from("models/whisper-medium-q4_1.bin"))?;

let samples = transcribe_rs::audio::read_wav_samples(&PathBuf::from("audio.wav"))?;

let result = engine.transcribe_with(
    &samples,
    &WhisperInferenceParams {
        initial_prompt: Some("Context prompt here.".to_string()),
        ..Default::default()
    },
)?;
```
Whisperfile
```rust
use transcribe_rs::whisperfile::{
    WhisperfileEngine, WhisperfileInferenceParams, WhisperfileLoadParams,
};
use std::path::PathBuf;

let mut engine = WhisperfileEngine::load_with_params(
    &PathBuf::from("models/whisperfile-0.9.3"),
    &PathBuf::from("models/ggml-small.bin"),
    WhisperfileLoadParams {
        port: 8080,
        startup_timeout_secs: 60,
        ..Default::default()
    },
)?;

let samples = transcribe_rs::audio::read_wav_samples(&PathBuf::from("audio.wav"))?;

let result = engine.transcribe_with(
    &samples,
    &WhisperfileInferenceParams {
        language: Some("en".to_string()),
        ..Default::default()
    },
)?;

// Server shuts down automatically when the engine is dropped.
```
OpenAI (Remote)
```rust
use transcribe_rs::remote::openai::{self, OpenAIModel, OpenAIRequestParams};
use transcribe_rs::{remote, RemoteTranscriptionEngine};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = openai::default_engine();
    let result = engine
        .transcribe_file(
            &PathBuf::from("audio.wav"),
            OpenAIRequestParams::builder()
                .model(OpenAIModel::Gpt4oMiniTranscribe)
                .timestamp_granularity(remote::openai::OpenAITimestampGranularity::Segment)
                .build()?,
        )
        .await?;
    println!("{}", result.text);
    Ok(())
}
```
Models
All audio input must be 16 kHz, mono, 16-bit PCM WAV.
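If you want to verify a file meets this requirement before loading it, the three relevant fields live in the RIFF `fmt` chunk. A self-contained sketch that assumes a canonical 44-byte PCM WAV header (a real loader should rely on the library's own reader or a dedicated WAV crate):

```rust
// Checks the "16 kHz, mono, 16-bit PCM" requirement by reading the fmt
// fields of a canonical 44-byte RIFF/WAVE header.
fn check_wav_header(h: &[u8]) -> Result<(), String> {
    if h.len() < 36 || &h[0..4] != b"RIFF" || &h[8..12] != b"WAVE" {
        return Err("not a RIFF/WAVE file".to_string());
    }
    let channels = u16::from_le_bytes([h[22], h[23]]);
    let sample_rate = u32::from_le_bytes([h[24], h[25], h[26], h[27]]);
    let bits = u16::from_le_bytes([h[34], h[35]]);
    if channels != 1 || sample_rate != 16_000 || bits != 16 {
        return Err(format!(
            "need 16 kHz mono 16-bit, got {sample_rate} Hz, {channels} ch, {bits}-bit"
        ));
    }
    Ok(())
}

// Builds a minimal canonical header, for demonstration only.
fn fake_header(channels: u16, sample_rate: u32, bits: u16) -> Vec<u8> {
    let mut h = vec![0u8; 44];
    h[0..4].copy_from_slice(b"RIFF");
    h[8..12].copy_from_slice(b"WAVE");
    h[22..24].copy_from_slice(&channels.to_le_bytes());
    h[24..28].copy_from_slice(&sample_rate.to_le_bytes());
    h[34..36].copy_from_slice(&bits.to_le_bytes());
    h
}

fn main() {
    assert!(check_wav_header(&fake_header(1, 16_000, 16)).is_ok());
    assert!(check_wav_header(&fake_header(2, 44_100, 16)).is_err());
    println!("header checks passed");
}
```

A typical conversion to this format with ffmpeg is `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`.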
Model Downloads
| Engine | Download |
|---|---|
| Parakeet (int8) | blob.handy.computer / HuggingFace |
| SenseVoice (int8) | blob.handy.computer / sherpa-onnx |
| Moonshine | HuggingFace |
| GigaAM | HuggingFace |
| Whisper (GGML) | HuggingFace |
| Whisperfile binary | GitHub |
Directory Layouts
Parakeet (directory):
```text
models/parakeet-tdt-0.6b-v3-int8/
├── encoder-model.int8.onnx
├── decoder_joint-model.int8.onnx
├── nemo128.onnx
└── vocab.txt
```
SenseVoice (directory):
```text
models/sense-voice/
├── model.int8.onnx
└── tokens.txt
```
Moonshine (directory):
```text
models/moonshine-base/
├── encoder_model.onnx
├── decoder_model_merged.onnx
└── tokenizer.json
```
Moonshine Streaming (directory):
```text
models/moonshine-streaming/moonshine-tiny-streaming-en/
├── encoder.onnx
├── decoder.onnx
├── streaming_config.json
└── tokenizer.json
```
GigaAM (directory):
```text
models/giga-am-v3/
├── model.onnx (or model.int8.onnx)
└── vocab.txt
```
Whisper: single file (e.g. `whisper-medium-q4_1.bin`).
Moonshine Variants
| Variant | Language |
|---|---|
| Tiny | English |
| TinyAr | Arabic |
| TinyZh | Chinese |
| TinyJa | Japanese |
| TinyKo | Korean |
| TinyUk | Ukrainian |
| TinyVi | Vietnamese |
| Base | English |
| BaseEs | Spanish |
Examples and Tests
Each engine has an example in `examples/`. Run with the appropriate feature flag:
```shell
cargo run --example parakeet --features onnx
cargo run --example sense_voice --features onnx
cargo run --example moonshine --features onnx
cargo run --example moonshine_streaming --features onnx
cargo run --example gigaam --features onnx
cargo run --example whisper --features whisper-cpp
cargo run --example whisperfile --features whisperfile
cargo run --example openai --features openai
```
Tests are also feature-gated. Models must be present locally; tests skip gracefully if not found.
```shell
cargo test --features onnx
cargo test --features whisper-cpp
cargo test --features whisperfile
cargo test --all-features
```
Whisperfile tests look for the binary at `models/whisperfile-0.9.3` (override with `WHISPERFILE_BIN`) and the model at `models/ggml-small.bin` (override with `WHISPERFILE_MODEL`). GigaAM tests require `samples/russian.wav`.
Development aliases from `.cargo/config.toml`:

```shell
cargo check-all   # cargo check --all-features
cargo build-all   # cargo build --all-features
cargo test-all    # cargo test --all-features
```
Performance
Parakeet int8 benchmarks:
| Platform | Speed |
|---|---|
| MBP M4 Max | ~30x real-time |
| Zen 3 (5700X) | ~20x real-time |
| Skylake (i5-6500) | ~5x real-time |
| Jetson Nano CPU | ~5x real-time |
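To read the table concretely: "~30x real-time" means one second of wall-clock compute transcribes roughly thirty seconds of audio, so a one-minute file finishes in about two seconds on an M4 Max. A trivial helper (illustrative only, not part of the crate):

```rust
// Real-time speed factor: seconds of audio processed per second of wall time.
fn speed_factor(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}

fn main() {
    // 60 s of audio transcribed in 2 s of compute → 30x real-time.
    println!("{}x real-time", speed_factor(60.0, 2.0)); // prints "30x real-time"
}
```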
Acknowledgments
- istupakov for the ONNX Parakeet and GigaAM exports
- NVIDIA for Parakeet
- whisper.cpp
- jart / Mozilla AI for llamafile and Whisperfile
- UsefulSensors for Moonshine
- FunASR / sherpa-onnx for SenseVoice
- SberDevices for GigaAM