#openai #models #bpe #python #original #tokeniser #tiktoken

tiktoken-rust

a fast BPE tokeniser for use with OpenAI's models

2 releases

0.2.1 Jul 31, 2023
0.2.0 May 4, 2023
0.1.0 Apr 28, 2023

#628 in Machine learning


81 downloads per month

MIT license

57KB
1K SLoC

tiktoken-rust

STATUS: Under development.

tiktoken is a fast BPE tokeniser for use with OpenAI's models. The original project exposes a Python interface.

This project is a fork of the original repo that brings the same capability to the Rust world.

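use tiktoken_rust as tt;

// Load the cl100k_base encoding.
let enc = tt::get_encoding("cl100k_base").unwrap();

// Encoding and then strictly decoding should round-trip the input.
assert_eq!(
    "hello world",
    enc.decode(&enc.encode_ordinary("hello world"), tt::DecodeMode::Strict).unwrap()
);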
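
For readers new to BPE, the round trip above can be illustrated with a toy byte-pair encoder. This is a minimal sketch and not this crate's implementation: the vocabulary in toy_ranks and the function names bpe_encode and bpe_decode are hypothetical, chosen only for illustration. It also skips the regex pre-splitting and special-token handling that the real tiktoken performs before merging.

```rust
use std::collections::HashMap;

/// Hypothetical toy vocabulary: every single byte gets its own rank (0..=255),
/// followed by a handful of hand-picked merges. Real encodings such as
/// cl100k_base ship a far larger, trained table.
fn toy_ranks() -> HashMap<Vec<u8>, u32> {
    let mut ranks: HashMap<Vec<u8>, u32> =
        (0u32..256).map(|b| (vec![b as u8], b)).collect();
    for (i, merge) in ["he", "ll", "hell", "hello"].iter().enumerate() {
        ranks.insert(merge.as_bytes().to_vec(), 256 + i as u32);
    }
    ranks
}

/// Greedy byte-pair merging: start from single bytes and repeatedly merge the
/// adjacent pair whose concatenation has the lowest rank, until none is left.
fn bpe_encode(text: &str, ranks: &HashMap<Vec<u8>, u32>) -> Vec<u32> {
    let mut parts: Vec<Vec<u8>> = text.bytes().map(|b| vec![b]).collect();

    loop {
        // Find the best (lowest-rank) mergeable adjacent pair, if any.
        let mut best: Option<(usize, u32)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut merged = parts[i].clone();
            merged.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&merged) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            Some((i, _)) => {
                // Fold the right-hand part into the left-hand one.
                let right = parts.remove(i + 1);
                parts[i].extend(right);
            }
            None => break,
        }
    }

    // Every remaining part is in the table because all single bytes are.
    parts.iter().map(|p| ranks[p]).collect()
}

/// Inverse mapping: concatenate the bytes of each token.
/// Panics on a token id that is not in the table.
fn bpe_decode(tokens: &[u32], ranks: &HashMap<Vec<u8>, u32>) -> String {
    let inverse: HashMap<u32, &Vec<u8>> = ranks.iter().map(|(k, v)| (*v, k)).collect();
    let mut bytes = Vec::new();
    for t in tokens {
        bytes.extend_from_slice(inverse[t]);
    }
    String::from_utf8_lossy(&bytes).into_owned()
}
```

With this toy table, bpe_encode("hello", &toy_ranks()) collapses to a single token, and passing the output of bpe_encode through bpe_decode returns the original string, mirroring the encode_ordinary / decode round trip shown above.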

lib.rs:

tiktoken_rust

This crate is a tokeniser for use with OpenAI's models.

Dependencies

~9–25MB
~374K SLoC