1 unstable release: 0.2.0 (Dec 30, 2024)
ragit-korean
Ragit-korean is a very simple Korean tokenizer.
Ragit used to use charabia to tokenize CJK documents, but charabia caused too many issues:
- Charabia bundles CJK dictionaries in the binary, which makes the file 70MiB bigger.
- It silently converts precomposed (완성형) Korean syllables into decomposed (조합형) jamo. That quietly breaks TF-IDF searches, because a decomposed term no longer matches the same word written in precomposed form.
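
To illustrate the second issue, here is a minimal sketch using only the Rust standard library. It applies the standard Unicode arithmetic for splitting a precomposed Hangul syllable into conjoining jamo, then shows that a naive term count no longer finds the original word. The names `decompose_hangul` and `term_count` are hypothetical helpers written for this example. They are not part of ragit-korean or charabia, and charabia's actual normalization step may differ in detail.

```rust
/// Decompose precomposed Hangul syllables (U+AC00..=U+D7A3) into conjoining
/// jamo, using the arithmetic from the Unicode Standard. All other characters
/// are passed through unchanged. Hypothetical helper for illustration only.
pub fn decompose_hangul(text: &str) -> String {
    const S_BASE: u32 = 0xAC00; // first precomposed syllable
    const L_BASE: u32 = 0x1100; // first leading consonant
    const V_BASE: u32 = 0x1161; // first vowel
    const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
    const V_COUNT: u32 = 21;
    const T_COUNT: u32 = 28;
    const S_COUNT: u32 = 19 * V_COUNT * T_COUNT; // 11172 syllables

    let mut out = String::new();
    for c in text.chars() {
        let code = c as u32;
        if (S_BASE..S_BASE + S_COUNT).contains(&code) {
            let index = code - S_BASE;
            let l = index / (V_COUNT * T_COUNT);
            let v = (index % (V_COUNT * T_COUNT)) / T_COUNT;
            let t = index % T_COUNT;
            out.push(char::from_u32(L_BASE + l).unwrap());
            out.push(char::from_u32(V_BASE + v).unwrap());
            // t == 0 means the syllable has no trailing consonant.
            if t != 0 {
                out.push(char::from_u32(T_BASE + t).unwrap());
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Count non-overlapping occurrences of `term` in `document`.
/// A stand-in for the term-frequency part of TF-IDF.
pub fn term_count(document: &str, term: &str) -> usize {
    document.matches(term).count()
}
```

With these helpers, `term_count("한글 토크나이저", "한글")` finds the term once, while `term_count(&decompose_hangul("한글 토크나이저"), "한글")` finds nothing: the decomposed document contains only jamo code points, so a query typed in precomposed form has nothing to match.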