bin+lib jpreprocess

A Japanese text preprocessor for text-to-speech applications (a Rust rewrite of OpenJTalk)

19 releases (11 breaking)

0.12.0 Feb 16, 2025
0.10.0 Aug 14, 2024
0.9.1 Apr 29, 2024
0.8.1 Mar 23, 2024
0.2.0 Jun 6, 2023

#775 in Text processing

324 downloads per month
Used in 2 crates

BSD-3-Clause

340KB
8K SLoC

jpreprocess

A Japanese text preprocessor for text-to-speech applications.

This project is a rewrite of OpenJTalk in Rust.

Usage

Add the following to Cargo.toml:

[dependencies]
jpreprocess = "0.12.0"

If you need finer control over how njd and jpcommon are processed, you may also need to add jpreprocess-njd and/or jpreprocess-jpcommon as dependencies.

Example

In this example, jpreprocess loads a lindera dictionary and converts a sentence into jpcommon full-context labels.

use jpreprocess::*;

// `path` must already hold the location of a lindera dictionary.
// The `?` operator requires an enclosing function that returns a `Result`.
let config = JPreprocessConfig {
    dictionary: SystemDictionaryConfig::File(path),
    user_dictionary: None,
};
let jpreprocess = JPreprocess::from_config(config)?;

// Convert the sentence into a sequence of full-context labels.
let jpcommon_label = jpreprocess
    .extract_fullcontext("日本語文を解析し、音声合成エンジンに渡せる形式に変換します.")?;

// Check the third label in the sequence.
assert_eq!(
    jpcommon_label[2].to_string(),
    concat!(
        "sil^n-i+h=o",
        "/A:-3+1+7",
        "/B:xx-xx_xx",
        "/C:02_xx+xx",
        "/D:02+xx_xx",
        "/E:xx_xx!xx_xx-xx",
        "/F:7_4#0_xx@1_3|1_12",
        "/G:4_4%0_xx_1",
        "/H:xx_xx",
        "/I:3-12@1+2&1-8|1+41",
        "/J:5_29",
        "/K:2+8-41"
    )
);
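
The label asserted above is a single string made of a phoneme context (`sil^n-i+h=o`) followed by lettered fields from `/A:` to `/K:`. As a rough illustration of that layout, the following standard-library-only sketch splits such a string into its parts. The helpers `split_label` and `current_phoneme` are hypothetical names introduced here and are not part of the jpreprocess API; the sketch assumes the usual `p1^p2-p3+p4=p5` phoneme-context layout, in which the current phoneme sits between `-` and `+`.

```rust
/// Split a full-context label into its phoneme context and lettered fields.
/// Hypothetical helper for illustration; not part of jpreprocess.
fn split_label(label: &str) -> (&str, Vec<(char, &str)>) {
    let mut parts = label.split('/');
    // Everything before the first '/' is the phoneme context.
    let phonemes = parts.next().unwrap_or("");
    // Each remaining part is expected to look like "A:-3+1+7".
    let fields = parts
        .filter_map(|part| {
            let (key, value) = part.split_once(':')?;
            let mut chars = key.chars();
            // Keep only parts whose key is exactly one character.
            match (chars.next(), chars.next()) {
                (Some(letter), None) => Some((letter, value)),
                _ => None,
            }
        })
        .collect();
    (phonemes, fields)
}

/// Return the current phoneme, assumed to sit between '-' and '+'.
fn current_phoneme(phonemes: &str) -> Option<&str> {
    let (_, rest) = phonemes.split_once('-')?;
    let (current, _) = rest.split_once('+')?;
    Some(current)
}

fn main() {
    // The same label string that the example above asserts.
    let label = concat!(
        "sil^n-i+h=o",
        "/A:-3+1+7",
        "/B:xx-xx_xx",
        "/C:02_xx+xx",
        "/D:02+xx_xx",
        "/E:xx_xx!xx_xx-xx",
        "/F:7_4#0_xx@1_3|1_12",
        "/G:4_4%0_xx_1",
        "/H:xx_xx",
        "/I:3-12@1+2&1-8|1+41",
        "/J:5_29",
        "/K:2+8-41"
    );

    let (phonemes, fields) = split_label(label);
    println!("phoneme context: {phonemes}");
    println!("current phoneme: {:?}", current_phoneme(phonemes));
    for (letter, value) in &fields {
        println!("{letter}: {value}");
    }
}
```

In real code, the structured label values returned by `extract_fullcontext` are likely a better starting point than re-parsing their string form; the string splitting here is only meant to make the field layout visible.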

More examples are available in the project's GitHub repository.

Copyrights

This software includes source code from:

  • OpenJTalk. Copyright (c) 2008-2016 Nagoya Institute of Technology Department of Computer Science
  • Lindera. Copyright (c) 2019 by the project authors

License

BSD-3-Clause
