#onnx #text-to-speech #asr #nvidia

parakeet-rs

Fast ASR & Speaker Diarization with NVIDIA Parakeet via ONNX

20 releases

new 0.2.9 Jan 11, 2026
0.2.8 Jan 8, 2026
0.2.7 Dec 26, 2025
0.2.5 Nov 25, 2025
0.1.6 Oct 22, 2025

#107 in Audio


434 downloads per month
Used in 2 crates

MIT/Apache

165KB
3.5K SLoC


Fast speech recognition with NVIDIA's Parakeet models via ONNX Runtime. Note: CoreML is not stable with this model, so stick with the CPU (or another GPU EP). Even on CPU it is incredibly fast on my Mac M3 (16 GB) compared to Whisper on Metal! :-)

Models

CTC (English-only):

use parakeet_rs::Parakeet;

let mut parakeet = Parakeet::from_pretrained(".", None)?;
let result = parakeet.transcribe_file("audio.wav")?;
println!("{}", result.text);

// Or transcribe in-memory audio
// let result = parakeet.transcribe_samples(audio, 16000, 1)?;

// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}

TDT (Multilingual): 25 languages with auto-detection

use parakeet_rs::ParakeetTDT;

let mut parakeet = ParakeetTDT::from_pretrained("./tdt", None)?;
let result = parakeet.transcribe_file("audio.wav")?;
println!("{}", result.text);

// Or transcribe in-memory audio
// let result = parakeet.transcribe_samples(audio, 16000, 1)?;

// Token-level timestamps
for token in result.tokens {
    println!("[{:.3}s - {:.3}s] {}", token.start, token.end, token.text);
}
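
The token timestamps above are printed as plain seconds. If you want a subtitle-style `HH:MM:SS,mmm` form instead, a small helper along these lines could work. This is a standalone sketch, not part of parakeet-rs: `format_timestamp` is a hypothetical name, and it assumes `token.start` / `token.end` are seconds stored as `f32`.

```rust
/// Formats a time offset in seconds as `HH:MM:SS,mmm` (SRT-style).
/// Hypothetical helper; assumes timestamps are seconds as f32.
fn format_timestamp(seconds: f32) -> String {
    // Work in whole milliseconds; negative inputs are clamped to zero.
    let total_ms = (seconds.max(0.0) as f64 * 1000.0).round() as u64;
    let ms = total_ms % 1000;
    let total_s = total_ms / 1000;
    let (h, m, s) = (total_s / 3600, (total_s / 60) % 60, total_s % 60);
    format!("{:02}:{:02}:{:02},{:03}", h, m, s, ms)
}
```

Inside the token loop you would call it as `format_timestamp(token.start)`.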

EOU (Streaming): Real-time ASR with end-of-utterance detection

use parakeet_rs::ParakeetEOU;

let mut parakeet = ParakeetEOU::from_pretrained("./eou", None)?;

// Prepare your audio (Vec<f32>, 16kHz mono, normalized)
let audio: Vec<f32> = vec![0.0; 16000]; // placeholder (1 s of silence); replace with your samples

// Process in 160ms chunks for streaming
const CHUNK_SIZE: usize = 2560; // 160ms at 16kHz
for chunk in audio.chunks(CHUNK_SIZE) {
    let text = parakeet.transcribe(chunk, false)?;
    print!("{}", text);
}
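
`audio.chunks(CHUNK_SIZE)` yields a shorter final chunk whenever the audio length is not a multiple of the chunk size. Whether the model accepts a short chunk is not stated here. If you would rather always feed full-size chunks, one option is to zero-pad the tail. The sketch below is plain Rust with no parakeet-rs calls, and `fixed_chunks` is a hypothetical helper name.

```rust
/// Splits `audio` into chunks of exactly `chunk_size` samples,
/// zero-padding the final chunk if it comes up short.
/// Hypothetical helper, not part of parakeet-rs.
fn fixed_chunks(audio: &[f32], chunk_size: usize) -> Vec<Vec<f32>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    audio
        .chunks(chunk_size)
        .map(|chunk| {
            let mut padded = chunk.to_vec();
            padded.resize(chunk_size, 0.0); // no-op for full-size chunks
            padded
        })
        .collect()
}
```

This copies every chunk, which is fine for offline files. For a live stream you would more likely keep a single reusable buffer.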

Nemotron (Streaming): Cache-aware streaming ASR with punctuation

use parakeet_rs::Nemotron;

let mut model = Nemotron::from_pretrained("./nemotron", None)?;

// Process in 560ms chunks for streaming
const CHUNK_SIZE: usize = 8960; // 560ms at 16kHz
for chunk in audio.chunks(CHUNK_SIZE) {
    let text = model.transcribe_chunk(chunk)?;
    print!("{}", text);
}
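
Both streaming examples above hard-code their chunk sizes. The numbers are simply the chunk duration times the sample rate, so they can be derived rather than memorized. `chunk_samples` below is a hypothetical helper, not a crate function.

```rust
/// Number of samples in a chunk lasting `ms` milliseconds at `sample_rate` Hz.
/// Hypothetical helper, not part of parakeet-rs.
const fn chunk_samples(ms: usize, sample_rate: usize) -> usize {
    ms * sample_rate / 1000
}

// The two chunk sizes used in the streaming examples:
const EOU_CHUNK: usize = chunk_samples(160, 16_000); // 2560 samples
const NEMOTRON_CHUNK: usize = chunk_samples(560, 16_000); // 8960 samples
```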

Sortformer v2 & v2.1 (Speaker Diarization): Streaming 4-speaker diarization

# Cargo.toml
parakeet-rs = { version = "0.2", features = ["sortformer"] }

use parakeet_rs::sortformer::{Sortformer, DiarizationConfig};

let mut sortformer = Sortformer::with_config(
    "diar_streaming_sortformer_4spk-v2.onnx", // or v2.1.onnx
    None,
    DiarizationConfig::callhome(), // or dihard3(), custom()
)?;
let segments = sortformer.diarize(audio, 16000, 1)?;
for seg in segments {
    println!("Speaker {} [{:.2}s - {:.2}s]", seg.speaker_id, seg.start, seg.end);
}

See examples/diarization.rs for combining with TDT transcription.
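
For a rough idea of how diarization segments and token timestamps can be joined, one simple approach is to give each token the speaker whose segment overlaps it the most. The sketch below is not the code from examples/diarization.rs. `Token` and `Segment` are local stand-in structs whose field names mirror the snippets above, with assumed types (`f32` seconds, `usize` speaker id).

```rust
/// Stand-in for a transcription token (assumed shape, not the crate's type).
struct Token { start: f32, end: f32, text: String }

/// Stand-in for a diarization segment (assumed shape, not the crate's type).
struct Segment { speaker_id: usize, start: f32, end: f32 }

/// Returns the speaker whose segment overlaps the token the longest, if any.
fn speaker_for(token: &Token, segments: &[Segment]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for seg in segments {
        // Length of the time intersection; <= 0 means no overlap.
        let overlap = token.end.min(seg.end) - token.start.max(seg.start);
        if overlap > 0.0 && best.map_or(true, |(_, b)| overlap > b) {
            best = Some((seg.speaker_id, overlap));
        }
    }
    best.map(|(id, _)| id)
}

/// Prefixes each token's text with its speaker label.
fn label_tokens(tokens: &[Token], segments: &[Segment]) -> Vec<String> {
    tokens
        .iter()
        .map(|t| match speaker_for(t, segments) {
            Some(id) => format!("[Speaker {}] {}", id, t.text),
            None => format!("[Unknown] {}", t.text),
        })
        .collect()
}
```

The scan is linear per token, which is fine for short recordings. For long ones, sorting the segments and walking both lists in step would avoid the repeated passes.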

Setup

CTC: Download from HuggingFace: model.onnx, model.onnx_data, tokenizer.json

TDT: Download from HuggingFace: encoder-model.onnx, encoder-model.onnx.data, decoder_joint-model.onnx, vocab.txt

EOU: Download from HuggingFace: encoder.onnx, decoder_joint.onnx, tokenizer.json

Nemotron: Download from HuggingFace: encoder.onnx, encoder.onnx.data, decoder_joint.onnx, tokenizer.model

Diarization (Sortformer v2 & v2.1): Download from HuggingFace: diar_streaming_sortformer_4spk-v2.onnx or v2.1.onnx.

Quantized versions available (int8). All files must be in the same directory.
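
Since all files must sit in one directory, it can help to check for them before loading a model, so a missing download shows up as a clear message. This is a minimal std-only sketch. `missing_files` and `TDT_FILES` are hypothetical names, and the file list is the TDT one from above (quantized variants may use different file names).

```rust
use std::path::{Path, PathBuf};

/// File names for the TDT model, as listed in the Setup section.
const TDT_FILES: [&str; 4] = [
    "encoder-model.onnx",
    "encoder-model.onnx.data",
    "decoder_joint-model.onnx",
    "vocab.txt",
];

/// Returns the paths of required files that are not present in `dir`.
/// Hypothetical helper, not part of parakeet-rs.
fn missing_files(dir: &Path, required: &[&str]) -> Vec<PathBuf> {
    required
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```

Call `missing_files(Path::new("./tdt"), &TDT_FILES)` before `from_pretrained` and report any returned paths.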

GPU support (automatically falls back to CPU if the GPU provider fails):

parakeet-rs = { version = "0.2", features = ["cuda"] }  # or tensorrt, webgpu, directml, rocm, or other ort-supported EPs (check cargo features)

use parakeet_rs::{Parakeet, ExecutionConfig, ExecutionProvider};

let config = ExecutionConfig::new().with_execution_provider(ExecutionProvider::Cuda);
let mut parakeet = Parakeet::from_pretrained(".", Some(config))?;

Features

Notes

  • Audio: 16kHz mono WAV (16-bit PCM or 32-bit float)
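
If you already have decoded audio in memory as 16-bit or multi-channel samples, the conversions below show one way to get normalized mono `f32`. This is a std-only sketch with hypothetical helper names. It assumes "normalized" means samples in [-1.0, 1.0], and it does not resample, so the input must already be 16kHz.

```rust
/// Converts 16-bit PCM samples to f32 in the range [-1.0, 1.0).
/// Hypothetical helper, not part of parakeet-rs.
fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Averages interleaved multi-channel audio down to mono.
/// Any trailing partial frame is dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channels must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```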

License

Code: MIT OR Apache-2.0

FYI: The Parakeet ONNX models (downloaded separately from HuggingFace) are licensed under CC-BY-4.0 by NVIDIA. This library does not distribute the models.

Dependencies

~30MB
~507K SLoC