5 releases (breaking)

0.10.0 May 25, 2024
0.9.0 May 16, 2024
0.7.0 Apr 12, 2024
0.4.0 Feb 8, 2024
0.1.0 Dec 31, 2023

#555 in Machine learning

Download history (downloads per week):
2024-02-19: 10
2024-02-26: 24
2024-03-04: 26
2024-03-11: 8
2024-04-01: 13
2024-04-08: 132
2024-04-15: 4
2024-05-13: 159
2024-05-20: 160
2024-05-27: 29

348 downloads per month

MIT/Apache

59KB
1.5K SLoC

rten-text

Library containing text tokenization and related functionality, for preparing inputs and decoding outputs for text models (e.g. BERT).

It provides a subset of the functionality found in Hugging Face Tokenizers. In exchange it has fewer dependencies, none of which require C/C++.


lib.rs:

This crate provides tools for pre- and post-processing the text inputs and outputs of models. This primarily means tokenizing text into token IDs and de-tokenizing token IDs back into text.
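To make "tokenizing and de-tokenizing" concrete, here is a minimal, self-contained sketch of a WordPiece-style tokenizer of the kind used with BERT. It does not use this crate's API: `ToyWordPiece`, its methods, and the six-entry vocabulary are all hypothetical names made up for illustration, and the greedy longest-match logic is a simplification of what a real tokenizer does (no punctuation splitting, no special tokens, no normalization beyond lowercasing).

```rust
use std::collections::HashMap;

/// Toy WordPiece-style tokenizer (hypothetical, for illustration only).
/// Words are split greedily into the longest matching vocabulary pieces;
/// pieces that continue a word carry a "##" prefix.
struct ToyWordPiece {
    token_to_id: HashMap<String, u32>,
    id_to_token: Vec<String>,
    unk_id: u32,
}

impl ToyWordPiece {
    /// Build from a vocabulary list. The list must contain "[UNK]".
    fn new(vocab: &[&str]) -> Self {
        let id_to_token: Vec<String> = vocab.iter().map(|t| t.to_string()).collect();
        let token_to_id: HashMap<String, u32> = id_to_token
            .iter()
            .enumerate()
            .map(|(i, t)| (t.clone(), i as u32))
            .collect();
        let unk_id = token_to_id["[UNK]"];
        ToyWordPiece { token_to_id, id_to_token, unk_id }
    }

    /// Tokenize: lowercase, split on whitespace, then greedy longest match.
    fn encode(&self, text: &str) -> Vec<u32> {
        let lower = text.to_lowercase();
        let mut ids = Vec::new();
        for word in lower.split_whitespace() {
            let chars: Vec<char> = word.chars().collect();
            let mut word_ids = Vec::new();
            let mut start = 0;
            while start < chars.len() {
                let mut end = chars.len();
                let mut found = None;
                // Try the longest remaining piece first, then shrink.
                while end > start {
                    let mut piece: String = chars[start..end].iter().collect();
                    if start > 0 {
                        piece.insert_str(0, "##");
                    }
                    if let Some(&id) = self.token_to_id.get(&piece) {
                        found = Some(id);
                        break;
                    }
                    end -= 1;
                }
                match found {
                    Some(id) => {
                        word_ids.push(id);
                        start = end;
                    }
                    None => {
                        // No piece matched: the whole word becomes [UNK].
                        word_ids = vec![self.unk_id];
                        break;
                    }
                }
            }
            ids.extend(word_ids);
        }
        ids
    }

    /// De-tokenize: join pieces, gluing "##" continuations to the previous piece.
    fn decode(&self, ids: &[u32]) -> String {
        let mut out = String::new();
        for &id in ids {
            let token = &self.id_to_token[id as usize];
            if let Some(rest) = token.strip_prefix("##") {
                out.push_str(rest);
            } else {
                if !out.is_empty() {
                    out.push(' ');
                }
                out.push_str(token);
            }
        }
        out
    }
}

fn main() {
    let vocab = ["[UNK]", "token", "##izer", "##s", "are", "fun"];
    let tok = ToyWordPiece::new(&vocab);
    let ids = tok.encode("Tokenizers are fun");
    println!("{:?}", ids);
    println!("{}", tok.decode(&ids));
}
```

Note that decoding does not restore the original casing, because the sketch lowercases its input before matching. Round trips through a tokenizer are often lossy in this way.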

If you need a more complete set of tokenizers, see the Hugging Face tokenizers project.

Dependencies

~1.7–2.6MB
~77K SLoC