#thai #nlp #library #text

chawuek

Thai word segmentation (word breaking) library, ported from AttaCut

1 unstable release

0.1.0 Jul 16, 2021

#11 in #thai


Used in khatson

Apache-2.0

660KB
117 lines

Contains (Zip file, 710KB) data/attacut-c/model.pt

chawuek

A Rust port of the AttaCut Thai word tokenizer

Status

Work in progress (WIP)

Dependencies

~8–11MB
~232K SLoC