#sentence-piece #tokenize #machine-learning

no-std sentencepiece-model

SentencePiece model parser generated from the SentencePiece protobuf definition

5 releases

0.1.4 Oct 8, 2024
0.1.3 Jul 16, 2024
0.1.2 Jul 4, 2024
0.1.1 Jul 2, 2024
0.1.0 Nov 18, 2023


2,724 downloads per month
Used in kitoken

BSD-2-Clause

8KB
78 lines

SentencePiece model parser generated from the SentencePiece protobuf definition.

See SentencePieceModel, the entry point for parsing and accessing SentencePiece models.

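use sentencepiece_model::SentencePieceModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse a serialized SentencePiece model file (the T5 vocabulary from the crate's tests).
    let model = SentencePieceModel::from_file("tests/t5-spiece.model")?;
    assert_eq!(model.pieces.len(), 32000);
    // unwrap() assumes the model contains a trainer spec; T5 uses id 2 for <unk>.
    assert_eq!(model.trainer().unwrap().unk_id(), 2);
    Ok(())
}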
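A .model file is a serialized ModelProto protobuf message, and the generated parser decodes it from the protobuf wire format. The standard-library-only sketch below decodes just the piece strings to illustrate what that involves. It assumes the field numbers from sentencepiece_model.proto (ModelProto.pieces = 1, ModelProto.trainer_spec = 2, SentencePiece.piece = 1) and is not how the crate itself is implemented. The helper names read_varint, skip_field, first_string_field, piece_strings and len_delimited are hypothetical.

```rust
use std::convert::TryFrom;

/// Decode a base-128 varint starting at `*pos`, advancing `*pos` past it.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
        if shift >= 64 {
            return None; // malformed: varint longer than 10 bytes
        }
    }
}

/// Skip a field's value according to its wire type; None on malformed input.
fn skip_field(buf: &[u8], pos: &mut usize, wire_type: u64) -> Option<()> {
    let skip = match wire_type {
        0 => {
            read_varint(buf, pos)?; // varint value
            0
        }
        1 => 8, // fixed64
        2 => usize::try_from(read_varint(buf, pos)?).ok()?, // length-delimited
        5 => 4, // fixed32 (e.g. SentencePiece.score)
        _ => return None, // deprecated group wire types are not handled here
    };
    *pos = pos.checked_add(skip)?;
    (*pos <= buf.len()).then_some(())
}

/// Extract SentencePiece.piece (field 1, wire type 2) from an embedded message.
fn first_string_field(msg: &[u8]) -> Option<String> {
    let mut pos = 0;
    while pos < msg.len() {
        let key = read_varint(msg, &mut pos)?;
        if key == (1 << 3) | 2 {
            let len = usize::try_from(read_varint(msg, &mut pos)?).ok()?;
            let end = pos.checked_add(len)?;
            return String::from_utf8(msg.get(pos..end)?.to_vec()).ok();
        }
        skip_field(msg, &mut pos, key & 0x7)?;
    }
    Some(String::new()) // field absent: the proto2 default is the empty string
}

/// Return the piece string of every ModelProto.pieces entry, skipping other fields.
fn piece_strings(buf: &[u8]) -> Option<Vec<String>> {
    let mut pieces = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let key = read_varint(buf, &mut pos)?;
        let (field, wire_type) = (key >> 3, key & 0x7);
        if field == 1 && wire_type == 2 {
            // Embedded SentencePiece message: decode it from its own byte range.
            let len = usize::try_from(read_varint(buf, &mut pos)?).ok()?;
            let end = pos.checked_add(len)?;
            pieces.push(first_string_field(buf.get(pos..end)?)?);
            pos = end;
        } else {
            skip_field(buf, &mut pos, wire_type)?;
        }
    }
    Some(pieces)
}

/// Encode a length-delimited field; payloads must be under 128 bytes (one-byte length).
fn len_delimited(key: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![key, payload.len() as u8];
    out.extend_from_slice(payload);
    out
}

fn main() {
    let mut unk = len_delimited(0x0A, b"<unk>"); // piece = "<unk>"
    unk.extend_from_slice(&[0x18, 2]); // type (field 3, varint) = UNKNOWN (2)
    let mut hi = len_delimited(0x0A, b"hi"); // piece = "hi"
    hi.extend_from_slice(&[0x15, 0x00, 0x00, 0x80, 0xBF]); // score (field 2, fixed32) = -1.0
    let mut model = len_delimited(0x0A, &unk); // ModelProto.pieces[0]
    model.extend(len_delimited(0x0A, &hi)); // ModelProto.pieces[1]
    model.extend(len_delimited(0x12, &[0xC0, 0x02, 0x02])); // trainer_spec { unk_id (field 40): 2 }
    let pieces = piece_strings(&model).expect("well-formed input");
    assert_eq!(pieces, vec!["<unk>".to_string(), "hi".to_string()]);
    println!("{:?}", pieces);
}
```

A generated parser handles every field, nested message, and default value this way, which is why generating it from the .proto definition is less error-prone than writing it by hand.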

Usage

[dependencies]
sentencepiece-model = "0.1"

sentencepiece-model uses prost-build and protox to generate Rust code from the SentencePiece protobuf definition at build time, so protoc (the Protocol Buffers compiler) does not need to be installed.
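Because the parser is generated by prost, proto2 optional scalar fields become Option values with accessor methods that fall back to the field's declared default. This explains why unk_id() in the example above returns a plain integer, while trainer() is unwrapped, presumably because the trainer spec is an optional embedded message. The hand-written structs below approximate that shape for two TrainerSpec fields, with defaults taken from sentencepiece_model.proto (unk_id = 0, pad_id = -1). They are a hypothetical illustration, not the crate's generated source, and the trainer() getter only mirrors the name used in the example.

```rust
/// Hypothetical approximation of a prost-generated proto2 message (not the real generated code).
#[derive(Debug, Default, Clone, PartialEq)]
struct TrainerSpec {
    /// `optional int32 unk_id = 40 [default = 0];` None when absent from the encoded bytes.
    unk_id: Option<i32>,
    /// `optional int32 pad_id = 43 [default = -1];`
    pad_id: Option<i32>,
}

impl TrainerSpec {
    /// Stored value, or the proto2 default when the field was not set.
    fn unk_id(&self) -> i32 {
        self.unk_id.unwrap_or(0)
    }

    fn pad_id(&self) -> i32 {
        self.pad_id.unwrap_or(-1)
    }
}

/// Hypothetical outer message: embedded messages are optional, so the getter returns Option.
#[derive(Debug, Default)]
struct ModelProto {
    trainer_spec: Option<TrainerSpec>,
}

impl ModelProto {
    fn trainer(&self) -> Option<&TrainerSpec> {
        self.trainer_spec.as_ref()
    }
}

fn main() {
    // unk_id explicitly set, pad_id absent from the encoded message.
    let model = ModelProto {
        trainer_spec: Some(TrainerSpec { unk_id: Some(2), pad_id: None }),
    };
    let spec = model.trainer().unwrap();
    assert_eq!(spec.unk_id(), 2); // explicit value
    assert_eq!(spec.pad_id(), -1); // falls back to the proto2 default
    // A model without a trainer spec yields None, which is why unwrap() is needed.
    assert!(ModelProto::default().trainer().is_none());
    println!("unk_id = {}, pad_id = {}", spec.unk_id(), spec.pad_id());
}
```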

Dependencies

~0.3–2.2MB
~38K SLoC