#bert #tokenizer #port #google #original #word-piece

bert_tokenizer

This crate is a Rust port of Google's BERT WordPiece tokenizer.

1 unstable release

0.1.3 Feb 15, 2023
0.1.2 Feb 14, 2023
0.1.1 Feb 14, 2023
0.1.0 Feb 14, 2023

#15 in #bert

25 downloads per month
Used in bert_create_pretraining

Custom license

225KB
373 lines

This crate provides a port of the original BERT tokenizer from Google's BERT repository.
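For readers unfamiliar with WordPiece, the following is a minimal sketch of the greedy longest-match-first splitting that the original Google tokenizer applies to each word. It is not this crate's API: the function name `wordpiece`, its parameters, and the toy vocabulary mentioned afterwards are hypothetical and chosen only for illustration. It assumes the input word has already been normalized and split on whitespace and punctuation.

```rust
use std::collections::HashSet;

/// Split a single, already-normalized word into WordPiece tokens.
/// Hypothetical helper for illustration; not part of the bert_tokenizer crate.
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str, max_chars: usize) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    // Overly long words are mapped to the unknown token as a whole.
    if chars.len() > max_chars {
        return vec![unk.to_string()];
    }

    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        // Try the longest remaining substring first, then shrink from the right.
        let mut end = chars.len();
        let mut found: Option<String> = None;
        while start < end {
            let mut candidate: String = chars[start..end].iter().collect();
            // Pieces that do not begin the word carry the "##" continuation prefix.
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            // If any position cannot be matched, the whole word becomes unknown.
            None => return vec![unk.to_string()],
        }
    }
    pieces
}
```

With a toy vocabulary containing `un`, `##aff`, and `##able`, calling `wordpiece("unaffable", &vocab, "[UNK]", 100)` yields the pieces `un`, `##aff`, `##able`, while a word with no matching pieces collapses to `[UNK]`.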

License

MIT license. See the LICENSE file for the full license text.


lib.rs:

This crate is a Rust port of Google's BERT WordPiece tokenizer.

Dependencies

~2MB
~60K SLoC