15 releases (7 breaking)

0.8.0 Jul 11, 2021
0.7.0 Apr 6, 2021
0.6.0 Jan 25, 2021
0.5.0 Oct 7, 2020
0.1.3 Feb 7, 2020

214 downloads per month
Used in 5 crates (2 directly)

MIT/Apache

2MB
25K SLoC

C++ 24K SLoC // 0.1% comments Rust 638 SLoC // 0.0% comments Shell 105 SLoC // 0.2% comments Batch 32 SLoC

sentencepiece

This Rust crate is a binding for the sentencepiece unsupervised text tokenizer. The crate documentation is available online.

libsentencepiece dependency

This crate depends on the sentencepiece C++ library. By default, this dependency is treated as follows:

  • If sentencepiece can be found with pkg-config, the crate links against that library. Warning: dynamic linking only works correctly with sentencepiece 0.1.95 or later, due to a bug in earlier versions.
  • Otherwise, the crate's build script builds the sentencepiece library from source and links it statically. This requires cmake to be available.

If you wish to override this behavior, the sentencepiece-sys crate offers two features:

  • system: always attempt to link to the sentencepiece library found with pkg-config.
  • static: always do a static build of the sentencepiece library and link against that.
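
The selection rules above can be summarized as a small decision function. This is only an illustrative sketch, not the actual build script of sentencepiece-sys: the names `LinkStrategy`, `Override`, and `choose_link_strategy` are hypothetical, and treating a failed pkg-config lookup under the system feature as an error is an assumption, since the text only says the crate will "always attempt" to link the system library.

```rust
/// How the native sentencepiece library ends up being linked.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LinkStrategy {
    /// Link against the library reported by pkg-config.
    System,
    /// Build sentencepiece from source with cmake and link it statically.
    StaticBuild,
}

/// An explicit override, mirroring the `system` and `static` features.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Override {
    System,
    Static,
}

/// Decide how to link, given an optional override and whether pkg-config
/// located a sentencepiece installation.
fn choose_link_strategy(
    requested: Option<Override>,
    pkg_config_found: bool,
) -> Result<LinkStrategy, String> {
    match (requested, pkg_config_found) {
        // `static`: always build from source, even if a system copy exists.
        (Some(Override::Static), _) => Ok(LinkStrategy::StaticBuild),
        // `system`: only the pkg-config library is acceptable.
        (Some(Override::System), true) => Ok(LinkStrategy::System),
        // Assumption: with `system` there is no fallback to a static build.
        (Some(Override::System), false) => {
            Err("system feature set, but pkg-config found no sentencepiece".to_string())
        }
        // Default: prefer the system library, otherwise build statically.
        (None, true) => Ok(LinkStrategy::System),
        (None, false) => Ok(LinkStrategy::StaticBuild),
    }
}
```

For example, `choose_link_strategy(None, false)` yields `Ok(LinkStrategy::StaticBuild)`, matching the default fallback described above, while `choose_link_strategy(Some(Override::Static), true)` yields a static build even though a system library is present.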
