2 unstable releases
0.1.0 | May 30, 2019 |
---|---|
0.0.1 | May 24, 2019 |
Mel Frequency Cepstral Coefficients
A common pre-processing step in machine learning with audio signals is a Mel Frequency Cepstral Coefficients (MFCC) transformation. It compresses the signal to a very small number of coefficients (around 16 for every 10 ms) and decorrelates it so that only the transfer function is expressed (e.g. only the formants of an utterance, not its pitch). This makes MFCCs very popular in Automatic Speech Recognition (ASR), room classification, speaker recognition, etc.
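To give a feel for the "Mel" part of the name, here is a minimal sketch of the widely used Hz-to-mel mapping and of filter centre frequencies spaced evenly on that scale. The function names (`hz_to_mel`, `mel_to_hz`, `mel_centres`) and the 2595/700 constants are assumptions for illustration; this crate may use a different variant of the mel scale internally.

```rust
/// Convert a frequency in Hz to mels using the common
/// 2595 * log10(1 + f / 700) formula (an assumption, not
/// necessarily the variant used inside this crate).
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Centre frequencies (in Hz) of `n` filters spaced evenly on the
/// mel scale strictly between `lo_hz` and `hi_hz`.
fn mel_centres(lo_hz: f64, hi_hz: f64, n: usize) -> Vec<f64> {
    let (lo, hi) = (hz_to_mel(lo_hz), hz_to_mel(hi_hz));
    (1..=n)
        .map(|i| mel_to_hz(lo + (hi - lo) * i as f64 / (n + 1) as f64))
        .collect()
}

fn main() {
    // 16 hypothetical filters covering 0 Hz up to the Nyquist
    // frequency of a 48 kHz signal.
    for f in mel_centres(0.0, 24_000.0, 16) {
        println!("{:.1} Hz", f);
    }
}
```

The centres crowd together at low frequencies and spread out at high ones, which is why a handful of coefficients can still describe the formant structure of speech.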
Usage
Add this to your Cargo.toml:
[dependencies]
mfcc = "0.1"
The library can use one of two FFT backends: rustfft (a pure Rust FFT implementation), enabled by the default feature fftrust, or FFTW (a popular FFT library), enabled with
[dependencies.mfcc]
version = "0.1"
default-features = false
features = ["fftextern"]
A rough benchmark shows that the performance of the two backends is comparable. For FFTW:
test tests::bench_mfcc ... bench: 123,959 ns/iter (+/- 22,979)
For rustfft:
test tests::bench_mfcc ... bench: 162,603 ns/iter (+/- 35,914)
How it works
First, segment your audio data into chunks of around 10 ms to 20 ms (at most 1024 samples at 48 kHz). From each chunk you can then calculate the MFCCs with
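use mfcc::Transform;
// Set up the transform for 48 kHz audio and chunks of 1024 samples.
let mut state = Transform::new(48000, 1024);
// Buffer that receives the coefficients for one chunk.
let mut output = vec![0.0; 16*3];
// `input` is one chunk of audio samples prepared by the caller.
state.transform(&input, &mut output);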
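The segmentation step described above can be sketched with the standard library alone. The helper names `segment` and `chunk_ms` are hypothetical, and the `i16` sample type is an assumption; adapt it to whatever sample type your audio source and the transform expect.

```rust
/// Duration in milliseconds of `len` samples at `sample_rate` Hz.
fn chunk_ms(len: usize, sample_rate: usize) -> f64 {
    1000.0 * len as f64 / sample_rate as f64
}

/// Split `samples` into non-overlapping chunks of exactly `chunk_len`
/// samples. A trailing partial chunk is dropped.
fn segment(samples: &[i16], chunk_len: usize) -> Vec<&[i16]> {
    samples.chunks_exact(chunk_len).collect()
}

fn main() {
    // One second of silence at 48 kHz stands in for real audio.
    let recording = vec![0i16; 48_000];
    let chunks = segment(&recording, 1024);
    println!(
        "{} chunks of {:.1} ms each",
        chunks.len(),
        chunk_ms(1024, 48_000)
    );
}
```

Each resulting slice could then be passed as the input of one transform call. Dropping the final partial chunk keeps every chunk the same length; zero-padding the tail would be an alternative if the last few milliseconds matter.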
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.