misaki-rs

A self-contained, POS-aware Grapheme-to-Phoneme (G2P) engine for Rust, optimized for TTS models like Kokoro

4 releases (2 breaking)

Uses new Rust 2024

0.3.0 Feb 7, 2026
0.2.1 Feb 7, 2026
0.1.1 Jan 13, 2026
0.1.0 Jan 13, 2026

#1182 in Text processing

79 downloads per month
Used in 3 crates (2 directly)

MIT license

9MB
1.5K SLoC

Misaki-RS

misaki-rs is a self-contained, high-performance Rust port of the Misaki G2P (Grapheme-to-Phoneme) engine.

It is specifically designed for use with TTS models like Kokoro, providing accurate Part-of-Speech aware phonemization for English text.

Features

  • Self-Contained: All lexicons, dictionaries, and Part-of-Speech tagger weights are embedded directly into the binary at compile time. No external resource files are required at runtime.
  • POS-Aware Phonemization: Uses an averaged perceptron tagger to handle heteronyms (words with different pronunciations based on context, e.g., object as a noun vs. verb).
  • Multi-Dialect Support: Supports both US English (en-us) and British English (en-gb).
  • Morphological Stemming: Handles common suffixes so that inflected forms can be resolved from their base word. Three rules are currently implemented, and more may be added in the future:
    • s plural stemming
    • ed past tense stemming
    • ing continuous tense stemming
  • Number Conversion: Automatically converts numeric values into their spoken word equivalents.
  • Optional espeak fallback (feature espeak, enabled by default): For out-of-vocabulary words, use espeak-ng to produce phonemes. Disable with default-features = false for a smaller build with no system espeak dependency; unknown words will then be spelled letter-by-letter or marked as unknown.
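
The three suffix rules listed under Morphological Stemming can be pictured as a lookup that retries with the suffix removed. The following is a minimal sketch, not the crate's actual implementation: the function name lookup_with_stemming, the one-word toy lexicon, and the phonemes appended for each suffix are all assumptions made for illustration. A real rule set would also have to pick between variants such as /s/, /z/ and /ɪz/.

```rust
use std::collections::HashMap;

/// Hypothetical helper: try the word as-is, then retry with a known
/// suffix stripped and append a placeholder phoneme for that suffix.
fn lookup_with_stemming(lexicon: &HashMap<&str, &str>, word: &str) -> Option<String> {
    if let Some(ph) = lexicon.get(word) {
        return Some(ph.to_string());
    }
    // (suffix, phonemes to append) pairs; deliberately simplified.
    let rules = [("ing", "ɪŋ"), ("ed", "d"), ("s", "z")];
    for (suffix, suffix_ph) in rules {
        if let Some(stem) = word.strip_suffix(suffix) {
            if let Some(ph) = lexicon.get(stem) {
                return Some(format!("{ph}{suffix_ph}"));
            }
        }
    }
    // Neither the word nor any stem is in the lexicon.
    None
}

fn main() {
    // Toy lexicon with a single illustrative entry.
    let mut lexicon = HashMap::new();
    lexicon.insert("call", "kˈɔl");
    for w in ["call", "calls", "called", "calling", "xyzzy"] {
        println!("{w} -> {:?}", lookup_with_stemming(&lexicon, w));
    }
}
```

Words that miss both the direct lookup and every stemmed lookup return None here, which is the point at which a fallback such as espeak or letter-by-letter spelling would take over.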

Why “spelling out” when espeak is disabled?

When a word is not in the lexicon and no rule applies, the engine needs a fallback. With the espeak feature enabled, that fallback is espeak-ng: the word is sent to espeak and its IPA output is converted to the engine’s phoneme set. With the espeak feature disabled, there is no external fallback, so the engine falls back to character-by-character spelling: each letter is phonemized as its name (e.g. “B” → “bˈi”, “K” → “kˈe‍ɪ”). So for example “eBook” becomes the sequence of letter names (E, B, O, O, K) instead of the word “e-book”. Single-character tokens and unrecognized characters may be marked as unknown (❓) instead. This behavior is intentional so that builds without espeak still produce some output rather than failing.
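
To make the spelling fallback concrete, here is a small sketch of character-by-character spelling. It is an illustration under stated assumptions rather than the engine's code: letter_name and spell_out are hypothetical names, and the table only contains the four letter names quoted in this README (B, E, K, O), with every other character mapped to the unknown marker.

```rust
/// Marker used for anything this sketch cannot spell.
const UNKNOWN: &str = "❓";

/// Hypothetical letter-name table. Only the letters quoted in this
/// README are filled in; everything else maps to UNKNOWN.
fn letter_name(c: char) -> &'static str {
    match c.to_ascii_uppercase() {
        'B' => "bˈi",
        'E' => "ˈiː",
        'K' => "kˈeɪ",
        'O' => "ˈoʊ",
        _ => UNKNOWN,
    }
}

/// Spell a word one character at a time, joining letter names with spaces.
fn spell_out(word: &str) -> String {
    word.chars().map(letter_name).collect::<Vec<_>>().join(" ")
}

fn main() {
    // Five letter names, one per character of "eBook".
    println!("{}", spell_out("eBook"));
    // The digit is not in the table, so it is reported as unknown.
    println!("{}", spell_out("e7"));
}
```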

Testing espeak

To check that espeak fallback is working, phonemize an out-of-vocabulary word like "eBook" and assert it does not contain the unknown marker and is not spelled letter-by-letter:

cargo test test_ebook_with_espeak -- --nocapture

You should see word-like output such as eBook (with espeak): ˈi bˈʊk. Without the espeak feature, the same word is spelled out. Running cargo test test_ebook_without_espeak --no-default-features -- --nocapture prints, for example, eBook (without espeak): ˈiː bˈi ˈo‍ʊ ˈo‍ʊ kˈe‍ɪ (the letter names E, B, O, O, K).
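
The two conditions described in this section (no unknown marker, and not spelled letter-by-letter) can be written as a predicate over the phoneme string. This is a hedged sketch, not the crate's test code: is_word_like and looks_spelled_out are hypothetical names, and the "one space-separated chunk per letter" rule is an assumed heuristic for detecting spelled-out output.

```rust
/// Assumed heuristic: spelled-out output has exactly one
/// space-separated chunk per letter of the input word.
fn looks_spelled_out(word: &str, phonemes: &str) -> bool {
    let letters = word.chars().filter(|c| c.is_alphabetic()).count();
    letters > 1 && phonemes.split_whitespace().count() == letters
}

/// Hypothetical check mirroring the two assertions described above.
fn is_word_like(word: &str, phonemes: &str) -> bool {
    !phonemes.contains("❓") && !looks_spelled_out(word, phonemes)
}

fn main() {
    // Sample strings taken from the two test runs described in this section.
    println!("{}", is_word_like("eBook", "ˈi bˈʊk"));
    println!("{}", is_word_like("eBook", "ˈiː bˈi ˈoʊ ˈoʊ kˈeɪ"));
}
```

In a real test the second argument would come from the phoneme string returned by the engine rather than from a literal.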

Installation

Add this to your Cargo.toml:

[dependencies]
misaki-rs = "0.3.0"

Optional: disable espeak fallback (smaller build, no espeak-ng dependency):

[dependencies]
misaki-rs = { version = "0.3.0", default-features = false }

To depend on misaki-rs without default features but still use espeak:

misaki-rs = { version = "0.3.0", default-features = false, features = ["espeak"] }

Quick Start

use misaki_rs::G2P;

fn main() {
    // Initialize for US English (false = US, true = GB)
    let g2p = G2P::new(false);

    // g2p returns the phoneme string and the token list; the tokens are unused here
    let (phonemes, _tokens) = g2p.g2p("Hello, world! 123");
    println!("US Phonemes: {}", phonemes);
    
    // Initialize for British English
    let g2p_gb = G2P::new(true);
    let (phonemes_gb, _) = g2p_gb.g2p("The schedule is full.");
    println!("GB Phonemes: {}", phonemes_gb);
}

Pronunciations

The original Misaki project shipped a small lexicon, and some of its entries were not pronounced correctly. For this port I extended the original pronunciation dictionary with more words and corrected pronunciations using eSpeak.

Scope

This repository aims to provide a lightweight and efficient alternative to ONNX-based phonemizers for Rust applications. By porting the logic and data into native Rust, it avoids large model files, and with the espeak feature disabled it needs no external system dependency at all.

License

This project is based on the original Misaki library. See the original repository for licensing details regarding the underlying dictionary data.

Dependencies

~5–8.5MB
~163K SLoC