|Version|Released|
|---|---|
|0.5.7|Mar 16, 2021|
|0.5.2|Nov 1, 2020|
|0.3.1|Jul 17, 2020|
|0.2.2|Feb 26, 2020|
A tool to split text using a neural network. The main application is sentence boundary detection, but other tasks, such as compound splitting for German, are also supported.
- Robust: Not reliant on proper punctuation, spelling and case. See the metrics.
- Small: NNSplit uses a byte-level LSTM, so weights are small (< 4 MB) and models can be trained for any language that can be encoded in Unicode.
- Fast: Up to 2x faster than spaCy sentencization, see the benchmark.
- Multilingual: NNSplit currently has models for 7 different languages (German, English, French, Norwegian, Swedish, Simplified Chinese, Turkish). Try them in the demo.
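To illustrate why a byte-level model stays small and language-agnostic, here is a minimal sketch in plain Rust. It is not NNSplit's implementation and does not use the NNSplit API; `byte_ids` and `split_at_byte_offsets` are hypothetical helpers invented for this example. It shows that any UTF-8 text maps to IDs in the range 0 to 255, so the input vocabulary is fixed at 256 entries regardless of script, and that split positions predicted per byte can be mapped back to string slices.

```rust
/// Hypothetical helper: map text to the integer IDs a byte-level model
/// would consume. Every ID fits in a u8, whatever the script.
fn byte_ids(text: &str) -> Vec<u8> {
    text.as_bytes().to_vec()
}

/// Hypothetical helper: cut `text` at the given byte offsets.
/// Offsets that are out of order, out of range, or that fall inside a
/// multi-byte character are skipped, so every slice is valid UTF-8.
fn split_at_byte_offsets<'a>(text: &'a str, offsets: &[usize]) -> Vec<&'a str> {
    let mut parts = Vec::new();
    let mut start = 0;
    for &end in offsets {
        if end > start && end < text.len() && text.is_char_boundary(end) {
            parts.push(&text[start..end]);
            start = end;
        }
    }
    // Whatever remains after the last accepted offset is the final part.
    parts.push(&text[start..]);
    parts
}

fn main() {
    // Different scripts, same 256-entry input vocabulary.
    for sample in ["Hello world", "Grüße aus Wien", "你好世界"] {
        let ids = byte_ids(sample);
        println!("{:?}: {} chars, {} byte IDs", sample, sample.chars().count(), ids.len());
    }

    // Pretend a model predicted a boundary after byte 10.
    let parts = split_at_byte_offsets("Hi there. Bye now.", &[10]);
    println!("{:?}", parts);
}
```

The trade-off of this design is that non-ASCII text produces longer input sequences than a character-level or word-level encoding would, in exchange for a tiny, fixed embedding table.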
Documentation has moved to the NNSplit website: https://bminixhofer.github.io/nnsplit.
NNSplit is licensed under the MIT license.