pyannote-rs

Speaker diarization using pyannote in Rust

21 releases

0.3.0-beta.0 Nov 30, 2024
0.2.9 Nov 29, 2024
0.2.7 Aug 18, 2024
0.1.9 Aug 7, 2024

MIT license

16KB
232 lines

Pyannote audio diarization in Rust

Features

  • Process 1 hour of audio in less than a minute on CPU.
  • Faster performance with DirectML on Windows and CoreML on macOS.
  • Accurate timestamps with Pyannote segmentation.
  • Identify speakers with wespeaker embeddings.

Install

cargo add pyannote-rs

Usage

See the Building guide in the project repository.

Examples

See the examples in the project repository.

How it works

pyannote-rs uses 2 models for speaker diarization:

  1. Segmentation: segmentation-3.0 identifies when speech occurs.
  2. Speaker Identification: wespeaker-voxceleb-resnet34-LM identifies who is speaking.

Inference is powered by onnxruntime.

  • The segmentation model processes up to 10 s of audio at a time; longer recordings are handled with a sliding window (iterating in chunks).
  • The embedding model processes filter banks (audio features) extracted with knf-rs.
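
The chunked iteration described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the function name `sliding_windows`, the use of non-overlapping windows, and the zero-padding of the final window are all assumptions made for the example.

```rust
/// Split `samples` into consecutive windows of exactly `window_len` samples.
/// The last window is zero-padded if the audio does not fill it.
/// Returns (start offset in samples, window) pairs.
/// Hypothetical helper: not part of the pyannote-rs API.
fn sliding_windows(samples: &[f32], window_len: usize) -> Vec<(usize, Vec<f32>)> {
    assert!(window_len > 0, "window_len must be positive");
    samples
        .chunks(window_len)
        .enumerate()
        .map(|(i, chunk)| {
            let mut window = chunk.to_vec();
            // Zero-pad a short final chunk so every window has the same length.
            window.resize(window_len, 0.0);
            (i * window_len, window)
        })
        .collect()
}
```

Assuming 16 kHz mono audio (an assumption, not stated above), a 10 s window would be `sliding_windows(&samples, 10 * 16_000)`. The start offset of each window lets per-window timestamps be converted back to positions in the full recording.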

Speaker comparison (e.g., determining if Alice spoke again) is done using cosine similarity.
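
As a rough sketch of that comparison step, the snippet below computes cosine similarity between two embedding vectors and picks the best-matching known speaker. The helper names `cosine_similarity` and `match_speaker` and the threshold parameter are hypothetical, chosen for illustration; the crate's own API may differ.

```rust
/// Cosine similarity between two embeddings of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same length");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Index of the known speaker most similar to `embedding`,
/// or `None` if no similarity reaches `threshold` (an assumed tuning knob).
fn match_speaker(embedding: &[f32], known: &[Vec<f32>], threshold: f32) -> Option<usize> {
    known
        .iter()
        .enumerate()
        .map(|(i, k)| (i, cosine_similarity(embedding, k)))
        .filter(|(_, score)| *score >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

When `match_speaker` returns `None`, a caller would typically register the embedding as a new speaker. A higher threshold makes it less likely that two different voices are merged, at the cost of splitting one speaker into several.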

Credits

Big thanks to pyannote-onnx and kaldi-native-fbank.

Dependencies

~3–10MB
~108K SLoC