16 releases (stable)
1.4.3 | Jun 20, 2024 |
---|---|
1.4.1 | Apr 25, 2024 |
1.4.0 | Mar 15, 2024 |
1.1.1 | Nov 8, 2023 |
0.3.0 | May 26, 2023 |
#154 in Encoding
414 downloads per month
Used in 5 crates
200KB
3.5K
SLoC
WaveRS - A Wav File Reader/Writer
WaveRs (pronounced wavers) is a Wav file reader/writer written in Rust and designed to fast and easy to use. WaveRs is also available in Python through the PyWaveRs package.
Getting Started · Benchmarks
Getting Started
This sections provides a quick overview of the functionality offered by WaveRs to help you get started quickly. WaveRs allows the user to read, write and perform conversions between different types of sampled audio, currently, i16
, i32
, f32
and f64
. There is now experimental support fori24
now.
For more details on the project and wav files see the WaveRs Project section below. For more detailed information on the functionality offered by WaveRs see the the docs.
Reading
use wavers::{Wav, read};
use std::path::Path;
fn main() {
let fp = "path/to/wav.wav";
// creates a Wav file struct, does not read the audio data. Just the header information.
let wav: Wav<i16> = Wav::from_path(fp).unwrap();
let samples: Samples<i16> = wav.read().unwrap();
// or to read the audio data directly
let (samples, sample_rate): (Samples<i16>, i32) = read::<i16, _>(fp).unwrap();
// samples can be derefed to a slice of samples
let samples: &[i16] = &samples;
}
Conversion
use wavers::{Wav, read, ConvertTo};
use std::path::Path;
fn main() {
// Two ways of converted a wav file
let fp: "./path/to/i16_encoded_wav.wav";
let wav: Wav<f32> = Wav::from_path(fp).unwrap();
// conversion happens automatically when you read
let samples: &[f32] = &wav.read().unwrap();
// or read and then call the convert function on the samples.
let (samples, sample_rate): (Samples<i16>, i32) = read::<i16, _>(fp).unwrap();
let samples: &[f32] = &samples.convert();
}
Writing
use wavers::Wav;
use std::path::Path;
fn main() {
let fp: &Path = &Path::new("path/to/wav.wav");
let out_fp: &Path = &Path::new("out/path/to/wav.wav");
// two main ways, read and write as the type when reading
let wav: Wav<i16> = Wav::from_path(fp).unwrap();
wav.write(out_fp).unwrap();
// or read, convert, and write
let (samples, sample_rate): (Samples<i16>,i32) = read::<i16, _>(fp).unwrap();
let sample_rate = wav.sample_rate();
let n_channels = wav.n_channels();
let samples: &[f32] = &samples.convert();
write(out_fp, samples, sample_rate, n_channels).unwrap();
}
Iteration
WaveRs
provides two primary methods of iteration: Frame-wise and Channel-wise. These can be performed using the Wav::frames
and Wav::channels
functions respectively. Both methods return an iterator over the samples in the wav file. The frames
method returns an iterator over the frames of the wav file, where a frame is a single sample from each channel. The channels
method returns an iterator over the channels of the wav file, where a channel is all the samples for a single channel.
use wavers::Wav;
fn main() {
let wav = Wav::from_path("path/to/two_channel.wav").unwrap();
for frame in wav.frames() {
assert_eq!(frame.len(), 2, "The frame should have two samples since the wav file has two channels");
// do something with the frame
}
for channel in wav.channels() {
// do something with the channel
assert_eq!(channel.len(), wav.n_samples() / wav.n_channels(), "The channel should have the same number of samples as the wav file divided by the number of channels");
}
}
Wav Utilities
use wavers::wav_spec;
fn main() {
let fp = "path/to/wav.wav";
let wav: Wav<i16> = Wav::from_path(fp).unwrap();
let sample_rate = wav.sample_rate();
let n_channels = wav.n_channels();
let duration = wav.duration();
let encoding = wav.encoding();
let (duration, header) = wav_spec(fp).unwrap();
}
Check out wav_inspect for a simnple command line tool to inspect the headers of wav files.
Features
The following section describes the features available in the WaveRs crate.
Ndarray
The ndarray
feature is used to provide functions that allow wav files to be read as ndarray
2-D arrays (samples x channels). There are two functions provided, into_ndarray
and as_ndarray
. Both functions create an Array2
from the samples and return it alongside the sample rate of the audio. into_ndarray
consume the input Wav
struct and as_ndarray
does not.
use wavers::{read, Wav, AsNdarray, IntoNdarray};
use ndarray::{Array2, CowArray2};
fn main() {
let fp = "path/to/wav.wav";
let wav: Wav<i16> = Wav::from_path(fp).unwrap();
// does not consume the wav file struct
let (i16_array, sample_rate): (Array2<i16>, i32) = wav.as_ndarray().unwrap();
// consumes the wav file struct
let (i16_array, sample_rate): (Array2<i16>, i32) = wav.into_ndarray().unwrap();
// convert the array to samples.
let samples: Samples<i16> = Samples::from(i16_array);
}
PyO3
The pyo3
feature is used to provide interoperabilty with the Python, specifically, the PyWavers project (Use Wavers in Python).
The WaveRs Project
There were several motivating factors when deciding to write Wavers. Firstly, my PhD involves quite a bit of audio processing and I have been working almost exclusively with wav files using Python. Python is a fantastic language but I have always had issues with aspects such as baseline memory usage. Secondly, after being interested in learning a more low-level language and not being bothered with the likes of C/C++ (again), Rust caught my attention. Thirdly, I had to do a Speech and Audio processing module, which involved a project. Mixing all of these together led me to start this project and gave me a deadline and goals for an MVP of Wavers.
Rust also has a limited number of audio-processing libraries and even fewer specific to the wav format. Currently, the most popular wav reader/writer crate is hound, by ruuda. Hound is currently used as the wav file reader/writer for other Rust audio libraries such as Rodio, but Hound was last updated in September 2022. The biggest general-purpose audio-processing library for Rust though is CPAL. CPAL, the Cross-Platform Audio Library is a low-level library for audio input and output in pure Rust. Finally, there is also Symphonia, another general-purpose media encoder/decoder crate for Rust. Symphonia supports the reading and writing of wav files. Both CPAL
and Symphonia
are very low level and do not offer the ease of use that WaveRs
strives for.
Project Goals
Anatomy of Wav File
The wave file format is a widely supported format for storing digital audio. A wave file uses the Resource Interchange File Format (RIFF) file structure and organizes the data in the file in chunks. Each chunk contains information about its type and size and can easily be skipped by software that does not understand the specific chunk type.
Excluding the RIFF chunk there is no guaranteed order to the remaining chunks and only the fmt and data chunk are guaranteed to exist. This means that rather than a specific structure for decoding chunks, chunks must be discovered by seeking through the wav file and reading the chunk ID and chunk size fields. Then if the chunk is needed, read the chunk according to the chunk format, and if not, skip ahead by the size of the chunk.
The chunk types supported and used by Wavers are described below. There are plans to expand the number of supported chunks as time goes on and the library matures.
RIFF
Total Bytes = 8 + 4 for the Chunk name + the size of the rest of the file less 8 bytes for the Chunk ID and size.
Byte Sequence Description | Length(Bytes) | Offset(Bytes) | Description |
---|---|---|---|
Chunk ID | 4 | 0 | The character string "RIFF" |
Size | 4 | 4 | The size, in bytes of the chunk |
RIFF type ID | 4 | 8 | The character string "WAVE" |
Chunks | $x$ | 12 | The other chunks |
fmt
Total bytes = 16 + 4 for the chunk size + 4 for the chunk name
Byte Sequence Description | Length(Bytes) | Offset(Bytes) | Description |
---|---|---|---|
Chunk ID | 4 | 0 | The character string "fmt " (note the space!) |
Size | 4 | 4 | The size, in bytes of the chunk |
Compression Code | 2 | 6 | Indicates the format of the wav file, e.g. PCM = 1 and IEEE Float = 3 |
Number of Channels | 2 | 8 | The number of channels in the wav file |
Sampling Rate | 4 | 12 | The rate at which the audio is sampled |
Byte Rate | 4 | 16 | Bytes per second |
Block Align | 2 | 18 | Minimum atomic unit of data |
Bits per Sample | 2 | 20 | The number of bits per sample, e.g. PCM_16 has 16 bits |
Data Chunk
Total bytes = 4 for the chunk name + 4 for the size and then $x$ bytes for the data.
Byte Sequence Description | Length(Bytes) | Offset(Bytes) | Description |
---|---|---|---|
Chunk ID | 4 | 0 | The character string "data" |
size | 4 | 4 | The size of the data in bytes |
data | $x$ | 8 | The encoded audio data |
Benchmarks
The benchmarks below were recorded using Criterion
and each benchmark was run on a small dataset of wav files taken from the GenSpeech data set. The durations vary between approx. 7s and 15s and each file is encoded as PCM_16. The results below are the time taken to load all the wav files in the data set. So the time per file is the total time divided by the number of files in the data set. The data set contains 10 files. There are some suspected anomalies in the benchmarks which warrant further investigation. The benchmarks were run on a desktop PC with the following (relevant) specs:
- CPU: 13th Gen Intel® Core™ i7-13700KF
- RAM: 32Gb DDR4
- Storage: 1Tb SSD
Hound vs Wavers - native i16
benchmark | name | min_time | mean_time | max_time |
---|---|---|---|---|
Hound vs Wavers - native i16 | Hound Read i16 | 7.4417 ms | 7.4441 ms | 7.4466 ms |
Hound vs Wavers - native i16 | Wavers Read i16 | 122.42 µs | 122.56 µs | 122.72 µs |
Hound vs Wavers - native i16 | Hound Write i16 | 2.1900 ms | 2.2506 ms | 2.3201ms |
Hound vs Wavers - native i16 | Wavers Write i16 | 5.9484 ms | 6.2091 ms | 6.5018 ms |
Reading
benchmark | name | min_time | mean_time | max_time |
---|---|---|---|---|
Reading | Native i16 - Read | 121.28 µs | 121.36 µs | 121.44 µs |
Reading | Native i16 - Read Wav File | 121.56 µs | 121.79 µs | 122.08 µs |
Reading | Native i16 As f32 | 287.63 µs | 287.78 µs | 287.97 µs |
Writing
benchmark | name | min_time | mean_time | max_time |
---|---|---|---|---|
Writing | Slice - Native i16 | 5.9484 ms | 6.2091 ms | 6.5018 ms |
Writing | Slice - Native i16 As f32 | 30.271 ms | 33.773 ms | 37.509 ms |
Writing | Write native f32 | 11.286 ms | 11.948 ms | 12.648 ms |
Dependencies
~0.5–9MB
~86K SLoC