
simple_transcribe_rs

Audio-to-text transcription library written in Rust that uses the whisper-rs bindings

6 releases (stable)

1.0.3 Jan 5, 2024
1.0.2 Jan 4, 2024
0.1.0 Jan 4, 2024
0.0.0 Jan 5, 2024

#312 in Audio

50 downloads per month

MIT license

2.5MB
256 lines

SimpleTranscribe-rs 🔈 📖

An audio-to-text transcription library written in Rust that uses the whisper-rs bindings.

What is SimpleTranscribe-rs?

SimpleTranscribe-rs is a Rust library that aims to make audio-to-text transcription simple for developers. It handles parts of the setup for you, such as automatically downloading the required Whisper speech-to-text models, so that developers can incorporate transcription into their projects quickly 🌩️

Features

  • Automatically downloads models that have not already been installed. Supported models:

    • Tiny
    • Base
    • Small
    • Medium
    • Large
  • Transcribes audio from different file types such as:

    • mp3
    • wav
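The feature list above can be turned into a small pre-flight check that runs before any model is downloaded. The following is a minimal sketch using only the standard library. The helpers `is_supported_model` and `is_supported_audio` are hypothetical and are not part of SimpleTranscribe-rs. Only the model name "tiny" appears in the usage example, so the lowercase spelling of the other model names is an assumption.

```rust
use std::path::Path;

// Model names from the feature list. Only "tiny" is confirmed by the usage
// example; the lowercase spelling of the others is an assumption.
const SUPPORTED_MODELS: [&str; 5] = ["tiny", "base", "small", "medium", "large"];

// Audio file extensions from the feature list.
const SUPPORTED_EXTENSIONS: [&str; 2] = ["mp3", "wav"];

/// Hypothetical helper (not part of the library): returns true if `name`
/// exactly matches one of the listed model names.
fn is_supported_model(name: &str) -> bool {
    SUPPORTED_MODELS.contains(&name)
}

/// Hypothetical helper (not part of the library): returns true if the file
/// extension of `path` is mp3 or wav, compared case-insensitively.
fn is_supported_audio(path: &str) -> bool {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| SUPPORTED_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}
```

Calling `is_supported_model("tiny")` and `is_supported_audio("src/test_data/test.mp3")` before constructing the model handler lets a program reject bad input early. The check looks only at the file extension, not at the file contents.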

Getting started

To use SimpleTranscribe-rs, add it to your project's Cargo.toml:

[dependencies]
simple_transcribe_rs = "1.0.1"
tokio = { version = "1.35.1", features = ["full"] }

Because a model may need to be downloaded, instantiating the model handler is asynchronous and must be awaited, so an async runtime is required. Tokio is used internally by the library and is the runtime it has been tested with, so it is the recommended runtime.

Usage

To use SimpleTranscribe-rs, first use the model handler to set up and prepare the model. The transcriber can then be used to convert audio files to text. The following snippet shows an example:

use simple_transcribe_rs::model_handler;
use simple_transcribe_rs::transcriber;

#[tokio::main]
async fn main() {
    // Prepare the "tiny" model; this must be awaited because the model
    // may need to be downloaded first.
    let m = model_handler::ModelHandler::new("tiny", "models/").await;
    // Build a transcriber from the prepared model.
    let trans = transcriber::Transcriber::new(m);
    // Transcribe an audio file; unwrap() panics if transcription fails.
    let result = trans.transcribe("src/test_data/test.mp3", None).unwrap();
    let text = result.get_text();
    let start = result.get_start_timestamp();
    let end = result.get_end_timestamp();
    println!("start[{}]-end[{}] {}", start, end, text);
}

The snippet can be run via: cargo run --example usage_example
