#sys #tokenizer #binding

sentencepiece-sys

Binding for the sentencepiece tokenizer

13 releases (6 breaking)

0.7.1 Jun 17, 2021
0.6.0 Jan 25, 2021
0.5.0 Oct 7, 2020
0.3.2 Jul 15, 2020
0.1.3 Feb 7, 2020

240 downloads per month
Used in 6 crates (via sentencepiece)

Apache-2.0

2MB
24K SLoC

  • C++: 24K SLoC, 0.1% comments
  • Rust: 145 SLoC, 0.0% comments
  • Shell: 106 SLoC, 0.2% comments
  • Batch: 33 SLoC

sentencepiece

This Rust crate is a binding for the sentencepiece unsupervised text tokenizer. The crate documentation is available online.

libsentencepiece dependency

This crate depends on the sentencepiece C++ library. By default, this dependency is treated as follows:

  • If sentencepiece can be found with pkg-config, the crate links against the library that pkg-config reports. Warning: dynamic linking only works correctly with sentencepiece 0.1.95 or later, due to a bug in earlier versions.
  • Otherwise, the crate's build script does a static build of the sentencepiece library. This requires cmake to be available.

If you wish to override this behavior, the sentencepiece-sys crate offers two features:

  • system: always attempt to link to the sentencepiece library found with pkg-config.
  • static: always do a static build of the sentencepiece library and link against that.
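
The selection rules above can be summarized as a small decision function. The following is only an illustrative sketch, not the crate's actual build script: the names `LinkStrategy` and `choose_strategy` are hypothetical, and two cases are assumptions because the text above does not specify them (enabling both features at once, and enabling `system` when pkg-config finds nothing).

```rust
/// How the native library ends up being linked.
/// Illustrative type only; it is not part of the sentencepiece-sys API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkStrategy {
    /// Link against the library reported by pkg-config.
    System,
    /// Build sentencepiece from source (requires cmake) and link it statically.
    StaticBuild,
}

/// Hypothetical model of the rules described above.
fn choose_strategy(
    system_feature: bool,
    static_feature: bool,
    found_by_pkg_config: bool,
) -> Result<LinkStrategy, String> {
    match (system_feature, static_feature) {
        // Assumption: the documentation does not say what happens when both
        // features are enabled, so this sketch reports an error.
        (true, true) => Err("both `system` and `static` are enabled".to_string()),
        // `system`: always attempt pkg-config. Assumption: the build fails
        // if the library cannot be found.
        (true, false) => {
            if found_by_pkg_config {
                Ok(LinkStrategy::System)
            } else {
                Err("`system` is enabled but pkg-config found no library".to_string())
            }
        }
        // `static`: always build from source, even if a system copy exists.
        (false, true) => Ok(LinkStrategy::StaticBuild),
        // Default: prefer pkg-config, otherwise fall back to a static build.
        (false, false) => Ok(if found_by_pkg_config {
            LinkStrategy::System
        } else {
            LinkStrategy::StaticBuild
        }),
    }
}

fn main() {
    // Each case: (system feature, static feature, pkg-config found the library)
    let cases = [
        (false, false, true),
        (false, false, false),
        (true, false, false),
        (false, true, true),
    ];
    for &(system, static_build, found) in cases.iter() {
        println!(
            "system={} static={} found={} -> {:?}",
            system,
            static_build,
            found,
            choose_strategy(system, static_build, found)
        );
    }
}
```

In a real project the features themselves are selected through Cargo's standard mechanism, for example by listing `features = ["static"]` on the `sentencepiece-sys` dependency entry in Cargo.toml.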

No runtime deps
