20 releases (10 breaking)

new 0.11.3 Apr 21, 2025
0.11.2 Jul 22, 2023
0.11.1 Mar 19, 2023
0.10.0 Oct 11, 2022
0.1.3 Feb 7, 2020

#2345 in Parser implementations

9,113 downloads per month
Used in 13 crates (via sentencepiece)

Apache-2.0

2MB
25K SLoC

  • C++: 24K SLoC // 0.1% comments
  • Bitbake: 371 SLoC // 0.5% comments
  • Rust: 217 SLoC // 0.0% comments
  • Shell: 5 SLoC

sentencepiece

This Rust crate is a binding for the sentencepiece unsupervised text tokenizer. The crate documentation is available online.

libsentencepiece dependency

This crate depends on the sentencepiece C++ library. By default, this dependency is treated as follows:

  • If sentencepiece can be found with pkg-config, the crate links against that library. Warning: because of a bug in earlier versions, dynamic linking only works correctly with sentencepiece 0.1.95 or later.
  • Otherwise, the crate's build script will do a static build of the sentencepiece library. This requires that cmake is available.

If you wish to override this behavior, the sentencepiece-sys crate offers two features:

  • system: always attempt to link to the sentencepiece library found with pkg-config.
  • static: always do a static build of the sentencepiece library and link against that.
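
The selection rules above can be summarized as a small decision function. This is only an illustrative sketch, not the crate's actual build script: the names `FeatureOverride`, `LinkPlan`, and `choose_link_plan` are hypothetical, and two behaviors are assumptions not stated in this README: that the system feature fails the build when pkg-config finds nothing, and that only one of the two features is enabled at a time.

```rust
// Illustrative model of the linking decision described above.
// All names are hypothetical; this is not the real build script.

/// Which override feature of sentencepiece-sys is enabled, if any.
/// Assumption: at most one of the two features is enabled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FeatureOverride {
    System,
    Static,
}

/// How the sentencepiece C++ library ends up being linked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkPlan {
    /// Link against the library located through pkg-config.
    PkgConfig,
    /// Build sentencepiece from source (needs cmake) and link statically.
    StaticBuild,
}

fn choose_link_plan(
    feature: Option<FeatureOverride>,
    pkg_config_found: bool,
) -> Result<LinkPlan, String> {
    match (feature, pkg_config_found) {
        // `static`: always build from source, even if a system copy exists.
        (Some(FeatureOverride::Static), _) => Ok(LinkPlan::StaticBuild),
        // `system`: always go through pkg-config.
        (Some(FeatureOverride::System), true) => Ok(LinkPlan::PkgConfig),
        // Assumption: with `system`, a missing library is a build error.
        (Some(FeatureOverride::System), false) => {
            Err("sentencepiece not found via pkg-config".to_string())
        }
        // Default: prefer pkg-config, otherwise fall back to a static build.
        (None, true) => Ok(LinkPlan::PkgConfig),
        (None, false) => Ok(LinkPlan::StaticBuild),
    }
}

fn main() {
    let features = [
        None,
        Some(FeatureOverride::System),
        Some(FeatureOverride::Static),
    ];
    for feature in features {
        for found in [true, false] {
            println!(
                "feature={:?} pkg_config_found={} -> {:?}",
                feature,
                found,
                choose_link_plan(feature, found)
            );
        }
    }
}
```

In a real project, the override would be selected by enabling the system or static feature of sentencepiece-sys in the dependent crate's Cargo.toml.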

No runtime deps