3 releases

| Version | Released |
|---|---|
| 0.1.2 (new) | Jan 19, 2025 |
| 0.1.1 | Jan 17, 2025 |
| 0.1.0 | Jan 10, 2025 |
#1052 in Text processing
211 downloads per month
59KB
1K SLoC
segtok
A rule-based sentence segmenter (splitter) and word tokenizer that rely on orthographic features. Ported from the (unmaintained) Python package, with a fix for its contractions bug.
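To illustrate what "rule-based" splitting on orthographic features can mean, here is a minimal sketch in plain Rust. It does not use or reproduce this crate's API: `split_sentences` is a hypothetical helper, and the single rule it applies (a terminal mark, then whitespace, then an uppercase letter) is an assumption chosen for illustration. It is far simpler than a real segmenter and will, for example, split wrongly after abbreviations such as "Dr.".

```rust
/// Split `text` into sentences using one orthographic rule:
/// a terminal mark (. ! ?) followed by whitespace and then an uppercase letter.
/// Hypothetical helper for illustration only; not part of the segtok crate.
fn split_sentences(text: &str) -> Vec<String> {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let mut sentences = Vec::new();
    let mut start = 0; // byte offset where the current sentence begins

    for i in 0..chars.len() {
        let (pos, c) = chars[i];
        if !matches!(c, '.' | '!' | '?') {
            continue;
        }
        // Skip over the whitespace that follows the terminal mark.
        let mut j = i + 1;
        while j < chars.len() && chars[j].1.is_whitespace() {
            j += 1;
        }
        let had_space = j > i + 1;
        let next_is_upper = j < chars.len() && chars[j].1.is_uppercase();
        if had_space && next_is_upper {
            // Close the sentence right after the terminal mark.
            let end = pos + c.len_utf8();
            sentences.push(text[start..end].trim().to_string());
            start = chars[j].0;
        }
    }

    // Whatever remains after the last boundary is the final sentence.
    let tail = text[start..].trim();
    if !tail.is_empty() {
        sentences.push(tail.to_string());
    }
    sentences
}
```

Because the rule requires whitespace after the mark, the period in a number such as `3.50` is not treated as a boundary, and because it requires an uppercase letter next, `e.g. the` is left intact.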
Dependencies
~2.9–4MB
~75K SLoC