4 releases (breaking)

new 0.9.0 May 16, 2024
0.7.0 Apr 12, 2024
0.4.0 Feb 8, 2024
0.1.0 Dec 31, 2023

#561 in Machine learning

Download history (weekly, 2024-02-08 to 2024-04-11): between 3 and 136 downloads per week

136 downloads per month

MIT/Apache

59KB
1.5K SLoC

rten-text

Library for text tokenization and related functionality, used to prepare inputs and decode outputs for text models (e.g. BERT).

The functionality is a subset of that found in Hugging Face Tokenizers. In exchange, the crate has fewer dependencies, none of which require C/C++.


lib.rs:

This crate provides tools for pre- and post-processing the text inputs and outputs of models. This primarily means tokenizing and de-tokenizing text.
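
To make "tokenizing and de-tokenizing" concrete, here is a minimal, self-contained sketch of greedy longest-match WordPiece tokenization, the general scheme used by BERT-style models. It does not use this crate's API: `ToyWordPiece`, its methods and the tiny vocabulary are hypothetical names invented for illustration, and the sketch skips punctuation splitting, special tokens and Unicode normalization that a real tokenizer would handle.

```rust
use std::collections::HashMap;

/// Toy WordPiece-style tokenizer (hypothetical type, NOT part of rten-text).
struct ToyWordPiece {
    token_to_id: HashMap<String, u32>,
    id_to_token: Vec<String>,
    unk_id: u32,
}

impl ToyWordPiece {
    /// Build from an ordered vocabulary. Assumes the vocabulary contains "[UNK]".
    fn new(vocab: &[&str]) -> Self {
        let id_to_token: Vec<String> = vocab.iter().map(|t| t.to_string()).collect();
        let token_to_id: HashMap<String, u32> = id_to_token
            .iter()
            .enumerate()
            .map(|(i, t)| (t.clone(), i as u32))
            .collect();
        let unk_id = token_to_id["[UNK]"];
        ToyWordPiece { token_to_id, id_to_token, unk_id }
    }

    /// Tokenize: lowercase, split on whitespace, then match the longest
    /// vocabulary piece at each position. Non-initial pieces carry a "##" prefix.
    fn encode(&self, text: &str) -> Vec<u32> {
        let lowered = text.to_lowercase();
        let mut ids = Vec::new();
        for word in lowered.split_whitespace() {
            let chars: Vec<char> = word.chars().collect();
            let mut word_ids = Vec::new();
            let mut start = 0;
            while start < chars.len() {
                let mut end = chars.len();
                let mut found = None;
                // Shrink the candidate piece from the right until it is in the vocabulary.
                while end > start {
                    let mut piece: String = chars[start..end].iter().collect();
                    if start > 0 {
                        piece.insert_str(0, "##");
                    }
                    if let Some(&id) = self.token_to_id.get(&piece) {
                        found = Some(id);
                        break;
                    }
                    end -= 1;
                }
                match found {
                    Some(id) => {
                        word_ids.push(id);
                        start = end;
                    }
                    None => {
                        // No piece matches here: the whole word maps to "[UNK]".
                        word_ids = vec![self.unk_id];
                        break;
                    }
                }
            }
            ids.extend(word_ids);
        }
        ids
    }

    /// De-tokenize: glue "##" pieces onto the previous token, and separate
    /// all other tokens with a single space.
    fn decode(&self, ids: &[u32]) -> String {
        let mut out = String::new();
        for &id in ids {
            let token = &self.id_to_token[id as usize];
            if let Some(suffix) = token.strip_prefix("##") {
                out.push_str(suffix);
            } else {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push_str(token);
            }
        }
        out
    }
}
```

With the illustrative vocabulary `["[UNK]", "token", "##ize", "##r", "##s", "hello"]`, encoding "Hello tokenizers" splits the second word into the pieces `token`, `##ize`, `##r`, `##s`, and decoding the resulting IDs gives back the lowercased text. The round trip is lossy by design here: case and original whitespace are not preserved.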

If you need a more full-featured set of tokenizers, see the Hugging Face tokenizers project.

Dependencies

~1.7–2.6MB
~76K SLoC