0.1.0 (Sep 29, 2024), 1 unstable release
Earshot
Ridiculously fast, only slightly bad voice activity detection in pure Rust. Port of the famous WebRTC VAD.
Features
- `#![no_std]`, doesn't even require `alloc`
  - Internal buffers can get pretty big when stored on the stack, so the `alloc` feature is enabled by default, which allocates them on the heap instead.
- Stupidly fast; uses only fixed-point arithmetic
  - Achieves a real-time factor (RTF) of ~3e-4 with 30 ms 48 kHz frames, ~3e-5 with 30 ms 8 kHz frames.
  - Comparatively, Silero VAD v4 w/ `ort` achieves an RTF of ~3e-3 with 60 ms 16 kHz frames.
- Okay accuracy
  - Great at distinguishing between silence and noise, but not between noise and speech.
  - Earshot provides alternative models with slight accuracy gains compared to the base WebRTC model.
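
To put the RTF figures above in perspective, here is a minimal sketch of the arithmetic, assuming RTF means processing time divided by the duration of the audio processed. The helper names `samples_per_frame` and `real_time_factor` are made up for illustration and are not part of Earshot.

```rust
use std::time::Duration;

/// Number of samples in one frame of `frame_ms` milliseconds at `sample_rate_hz`.
/// (Hypothetical helper, not part of Earshot.)
fn samples_per_frame(sample_rate_hz: u32, frame_ms: u32) -> u32 {
    sample_rate_hz * frame_ms / 1000
}

/// Real-time factor: time spent processing divided by the duration of the
/// audio that was processed. Values below 1.0 are faster than real time.
/// (Hypothetical helper, not part of Earshot.)
fn real_time_factor(processing: Duration, audio: Duration) -> f64 {
    processing.as_secs_f64() / audio.as_secs_f64()
}
```

Under that definition, a 30 ms frame at 48 kHz holds `samples_per_frame(48_000, 30)` = 1440 samples, and an RTF of ~3e-4 corresponds to roughly 9 µs of processing per 30 ms frame.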
lib.rs:

Earshot is a fast voice activity detection library. For more details, see `VoiceActivityDetector`.
```rust
use earshot::{VoiceActivityDetector, VoiceActivityProfile};

// `stream` stands for any iterator yielding frames of 16 kHz PCM samples.
let mut vad = VoiceActivityDetector::new(VoiceActivityProfile::VERY_AGGRESSIVE);
while let Some(frame) = stream.next() {
    // `unwrap()` panics if prediction fails; handle the error in real code.
    let is_speech_detected = vad.predict_16khz(&frame).unwrap();
}
```
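
The snippet above leaves `stream` undefined. One way to produce frames from a buffer of samples could look like the sketch below. It assumes 16 kHz mono audio stored as `i16` samples and fixed-length frames such as the 30 ms used in the benchmarks. `frames_16khz` and `count_speech_frames` are hypothetical helpers, not part of Earshot, and the `is_speech` closure is a stand-in for a detector call such as `|frame| vad.predict_16khz(frame).unwrap()`.

```rust
use std::slice::ChunksExact;

/// Sample rate this sketch assumes (matches the `predict_16khz` call above).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Splits 16 kHz mono PCM into frames of `frame_ms` milliseconds.
/// Trailing samples that do not fill a whole frame are skipped.
/// Panics if `frame_ms` is 0, because `chunks_exact` rejects a zero chunk size.
/// (Hypothetical helper, not part of Earshot.)
fn frames_16khz(samples: &[i16], frame_ms: usize) -> ChunksExact<'_, i16> {
    let frame_len = SAMPLE_RATE_HZ * frame_ms / 1000;
    samples.chunks_exact(frame_len)
}

/// Counts the frames for which `is_speech` returns true.
/// `is_speech` stands in for the real detector so the sketch has no dependencies.
/// (Hypothetical helper, not part of Earshot.)
fn count_speech_frames(
    samples: &[i16],
    frame_ms: usize,
    mut is_speech: impl FnMut(&[i16]) -> bool,
) -> usize {
    frames_16khz(samples, frame_ms)
        .filter(|frame| is_speech(*frame))
        .count()
}
```

Using `chunks_exact` keeps every frame the same length, which matters if the detector rejects frames of unexpected size. Any incomplete final frame is still available through the iterator's `remainder()` method if it needs padding rather than dropping.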