3 releases (breaking)
| Version | Release date |
|---|---|
| 0.7.0 (new) | Apr 12, 2024 |
| 0.4.0 | Feb 8, 2024 |
| 0.1.0 | Dec 31, 2023 |
#551 in Machine learning
59KB, 1.5K SLoC
rten-text
Library containing text tokenization and related functionality, for preparing inputs and decoding outputs for text models (e.g. BERT).
It implements a subset of the functionality found in Hugging Face Tokenizers. In exchange for the smaller feature set, it has fewer dependencies, and none that require C/C++.
lib.rs:
This crate provides tools for pre- and post-processing the text inputs and outputs of models. This primarily means tokenizing text into token IDs and de-tokenizing token IDs back into text.
If you need a more featureful set of tokenizers, see the tokenizers project.
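To make "tokenizing and de-tokenizing" concrete, here is a minimal, self-contained sketch of the greedy longest-match-first WordPiece scheme used by BERT-style models. It uses only the standard library and is not this crate's API: the function names (`wordpiece_encode_word`, `encode`, `decode`, `toy_vocab`), the six-entry vocabulary, and the simplified pre-processing (lowercasing and whitespace splitting only) are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Split one word into WordPiece token IDs using greedy longest-match-first.
/// Pieces after the first carry the "##" continuation prefix.
/// If any part of the word cannot be matched, the whole word maps to `unk_id`.
fn wordpiece_encode_word(word: &str, vocab: &HashMap<String, u32>, unk_id: u32) -> Vec<u32> {
    let chars: Vec<char> = word.chars().collect();
    let mut ids = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        // Try the longest remaining substring first, then shrink from the right.
        let mut end = chars.len();
        let mut found = None;
        while end > start {
            let mut piece: String = chars[start..end].iter().collect();
            if start > 0 {
                piece.insert_str(0, "##");
            }
            if let Some(&id) = vocab.get(&piece) {
                found = Some(id);
                break;
            }
            end -= 1;
        }
        match found {
            Some(id) => {
                ids.push(id);
                start = end;
            }
            None => return vec![unk_id],
        }
    }
    ids
}

/// Tokenize text: lowercase, split on whitespace, then apply WordPiece per word.
/// (Hypothetical simplification: no punctuation splitting or special tokens.)
fn encode(text: &str, vocab: &HashMap<String, u32>, unk_id: u32) -> Vec<u32> {
    text.split_whitespace()
        .flat_map(|w| wordpiece_encode_word(&w.to_lowercase(), vocab, unk_id))
        .collect()
}

/// De-tokenize: map IDs back to tokens and glue "##" pieces onto the previous token.
fn decode(ids: &[u32], vocab: &HashMap<String, u32>) -> String {
    // Invert the token -> ID map so IDs can be looked up.
    let id_to_token: HashMap<u32, &str> =
        vocab.iter().map(|(t, &id)| (id, t.as_str())).collect();
    let mut out = String::new();
    for id in ids {
        let token = id_to_token.get(id).copied().unwrap_or("[UNK]");
        if let Some(rest) = token.strip_prefix("##") {
            out.push_str(rest);
        } else {
            if !out.is_empty() {
                out.push(' ');
            }
            out.push_str(token);
        }
    }
    out
}

/// A toy vocabulary for illustration only; IDs are the list positions.
fn toy_vocab() -> HashMap<String, u32> {
    ["[UNK]", "token", "##izer", "##s", "are", "fun"]
        .iter()
        .enumerate()
        .map(|(i, t)| (t.to_string(), i as u32))
        .collect()
}
```

With this toy vocabulary, `encode("Tokenizers are fun", &toy_vocab(), 0)` splits "tokenizers" into `token`, `##izer`, and `##s`, and `decode` reassembles the lowercased text. Real tokenizers load the vocabulary from the model's files and also handle punctuation, normalization, and special tokens such as `[CLS]` and `[SEP]`.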
Dependencies
~1.7–2.6MB
~76K SLoC