
budoux

Rust port of BudouX (machine learning powered line break organizer tool)

6 releases

0.1.1 May 15, 2022
0.1.0 May 6, 2022
0.0.4 Jan 16, 2022

#321 in Machine learning

Apache-2.0

105KB
2K SLoC

BudouX-rs


BudouX-rs is a Rust port of BudouX, a machine-learning-powered line break organizer tool.

Note: This project contains the deliverables of the BudouX project.

Note: BudouX-rs supports plain text only; HTML input is not supported.

Demo

https://sg0hsmt.github.io/budoux-rs/

Documentation

https://docs.rs/crate/budoux/

Usage

Split a sentence with the built-in model.

// Use the Japanese model bundled with the crate.
let model = budoux::models::default_japanese_model();
let words = budoux::parse(model, "これはテストです。");

assert_eq!(words, vec!["これは", "テストです。"]);

Load a model from a JSON file and split a sentence with the loaded model.

use std::fs::File;
use std::io::BufReader;

// `path_to_json` is the path to a BudouX model file in JSON format.
let file = File::open(path_to_json).unwrap();
let reader = BufReader::new(file);
let model: budoux::Model = serde_json::from_reader(reader).unwrap();
let words = budoux::parse(&model, "これはテストです。");

assert_eq!(words, vec!["これは", "テストです。"]);
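
The chunks returned by `parse` mark where a line may break; choosing the actual break points is left to the caller. The following is a minimal sketch of one way to do that, greedily packing chunks into lines of a maximum width. `wrap_chunks` is a hypothetical helper and not part of the budoux crate. It assumes the chunks can be borrowed as `&str`, and it measures width in `char`s, which only approximates display width.

```rust
/// Greedily packs chunks into lines of at most `max_chars` characters.
/// Hypothetical helper for illustration; not part of the budoux crate.
fn wrap_chunks<S: AsRef<str>>(chunks: &[S], max_chars: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0; // length in chars, not bytes

    for chunk in chunks {
        let chunk = chunk.as_ref();
        let chunk_len = chunk.chars().count();

        // Start a new line if this chunk would overflow the current one.
        if current_len > 0 && current_len + chunk_len > max_chars {
            lines.push(std::mem::take(&mut current));
            current_len = 0;
        }

        current.push_str(chunk);
        current_len += chunk_len;
    }

    // Flush the last, partially filled line.
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

With the chunks from the examples above, `wrap_chunks(&words, 8)` keeps "これは" and "テストです。" on separate lines, while a limit of 9 fits both on one line. A chunk longer than `max_chars` is never split; it is placed on a line of its own.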

Test

cargo test

You can run the GitHub Actions workflow locally with act.

act -j test

Generate model from original BudouX

go generate ./...

Note: Generating the model requires Go 1.13 or later.
