#thai #nlp #library #text

chawuek

Thai word segmentation (word breaking) library, ported from AttaCut

1 unstable release

0.1.0 Jul 16, 2021

Used in khatson

Apache-2.0

660KB
117 lines

Contains (Zip file, 710KB) data/attacut-c/model.pt

chawuek

The AttaCut Thai word tokenizer, ported to Rust

Status

Work in progress (WIP)

Dependencies

~7–11MB
~226K SLoC