#search #similarity #audio #vector #machine #vector-database #path

nightly bin+lib audio-similarity-search

Performs similarity search for audio files on a local machine

2 unstable releases

0.2.0 Sep 14, 2024
0.1.0 Jun 28, 2024

#1865 in Database interfaces

27 downloads per month

GPL-3.0 license

36KB
603 lines

Audio Similarity Search in Rust

This repo contains a binary CLI and static library for running similarity search across audio files.

Building

To build the CLI, run:

  • cargo build to build in debug
  • cargo build --release to build in release

This will create an executable called similarity-search in the target/release or target/debug directory.

Running the CLI

The CLI includes help documentation - just run ./audio-similarity-search --help.

Commands:

  • analyze: run analysis on the provided directory. Builds a vector database that can be queried to find similar samples using the search command
  • search: run similarity search for a given sample
  • list: lists all analyzed sample paths and their IDs. Optional accepts a LIMIT uint parameter to limit the number or result returned.

Implementation Details

Feature extraction

The feature extraction phase walks over all of the wav and mp3 files found in the asset directory passed to the CLI during build. Each audio file is decoded, downsampled to 22050 Hz, and summed to mono. The resulting audio buffer is then chunked into blocks of 2048 samples, which are passed to aubio to perform an FFT, then an MFCC to distill the buffer down to a 13 dimensional MFCC vector. For each file, the MFCCs from each block are then averaged, resulting in a single 13-element feature vector. This feature extraction process is highly parallelized. It uses a thread pool to fan distribute the feature extraction for each file across all physical cores on the machine.

Note: this isn't perfect! Temporal infomation is lost when the MFCCs are averaged, which affects the quality of the similarity search results. It's on my todo list to revisit this.

Database creation and querying

The arroy database is used to store the feature vectors and perform similarity search. This project is a Rust port of the annoy C++/Python library from Spotify, which is used for fast approximate nearest neighbor search. Arroy differs slightly in that it is backed by LMDB, a high performance, memory mapped database. arroy/LMDB are taking care of all of the details for index creation and ANN search.

Since arroy only stores IDs and vectors, a SQLite database is used to associate file IDs with their paths and feature vectors. This metadata database is used to hydrate similarity search results to include file paths. Arroy has an open issue where appending new vectors does not work. To allow clients to append to the existing arroy db efficiently, we re-insert the cached vectors from the metadata db into arroy when analyzing a new directory of audio files.

Dependencies

~35–72MB
~1M SLoC