#openai #ai #gpt #bpe

tiktoken-rs

Library for encoding and decoding with the tiktoken library in Rust

12 releases

0.3.3 Mar 23, 2023
0.3.2 Mar 18, 2023
0.2.2 Mar 15, 2023
0.1.4 Mar 3, 2023
0.1.2 Feb 2, 2023

#39 in Build Utils


1,743 downloads per month
Used in 4 crates

MIT license

824 lines



Rust library for tokenizing text with OpenAI models using tiktoken.

This library provides a set of ready-made tokenizers for working with GPT and related OpenAI models via tiktoken. Use cases include tokenizing text inputs and counting their tokens.

This library is built on top of the tiktoken library and adds some features and enhancements that make it easier to use from Rust code.


For full working examples for all supported features, see the examples directory in the repository.


1. Add this crate to your project with cargo:
cargo add tiktoken-rs

2. Then, in your Rust code, call the API:

Counting token length

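use tiktoken_rs::p50k_base;

fn main() {
    // Load the p50k_base encoding; unwrap() panics if it cannot be loaded.
    let bpe = p50k_base().unwrap();
    // Encode the text, also recognizing special tokens such as <|endoftext|>.
    let tokens = bpe.encode_with_special_tokens("This is a sentence   with spaces");
    println!("Token count: {}", tokens.len());
}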
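tiktoken encodes text with byte-pair encoding (BPE), so a token count is the number of pieces left once merging stops. The sketch below is a toy, std-only illustration of that merge loop, not tiktoken-rs's implementation: `toy_bpe` and its four-entry vocabulary are hypothetical, and it skips the regex pre-splitting that real encodings such as p50k_base apply before merging.

```rust
use std::collections::HashMap;

/// Toy BPE: starting from single bytes, repeatedly merge the adjacent pair
/// whose concatenation has the lowest rank in `ranks`, until no adjacent pair
/// is in the vocabulary. Returns the rank (token id) of each remaining part.
/// Panics if a remaining part is missing from `ranks`.
fn toy_bpe(text: &str, ranks: &HashMap<Vec<u8>, usize>) -> Vec<usize> {
    let mut parts: Vec<Vec<u8>> = text.bytes().map(|b| vec![b]).collect();
    loop {
        // (rank, index of left part) of the best merge found so far.
        let mut best: Option<(usize, usize)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut joined = parts[i].clone();
            joined.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&joined) {
                // Strict `<` keeps the leftmost pair when ranks tie.
                if best.map_or(true, |(r, _)| rank < r) {
                    best = Some((rank, i));
                }
            }
        }
        match best {
            Some((_, i)) => {
                // Merge the chosen pair into a single part.
                let right = parts.remove(i + 1);
                parts[i].extend(right);
            }
            None => break,
        }
    }
    parts.iter().map(|p| ranks[p]).collect()
}

fn main() {
    // Hypothetical vocabulary: a lower rank means the merge is applied earlier.
    let ranks: HashMap<Vec<u8>, usize> = [("a", 0), ("b", 1), ("aa", 2), ("ab", 3)]
        .iter()
        .map(|&(s, r)| (s.as_bytes().to_vec(), r))
        .collect();
    let tokens = toy_bpe("aaab", &ranks);
    println!("tokens: {:?}, count: {}", tokens, tokens.len()); // tokens: [2, 3], count: 2
}
```

With this vocabulary, "aaab" first merges into "aa", "a", "b" and then into "aa", "ab", giving two tokens. In tiktoken's byte-level encodings every single byte has a rank, so the final lookup cannot fail there.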

Counting max_tokens for a chat completion request

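use tiktoken_rs::get_chat_completion_max_tokens;
use async_openai::types::{ChatCompletionRequestMessageArgs, Role};

fn main() {
    let messages = vec![
        ChatCompletionRequestMessageArgs::default()
            .role(Role::System).content("You are a helpful assistant!").build().unwrap(),
        ChatCompletionRequestMessageArgs::default()
            .role(Role::User).content("Hello, how are you?").build().unwrap(),
    ];
    // Tokens left for the reply: gpt-4's context size minus the tokens in `messages`.
    let max_tokens = get_chat_completion_max_tokens("gpt-4", &messages).unwrap();
    println!("max_tokens: {}", max_tokens);
}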
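`get_chat_completion_max_tokens` reports how many tokens remain for the reply once the prompt has been counted. The arithmetic can be sketched in std-only Rust, assuming the per-message overhead from the OpenAI Cookbook's token-counting recipe for gpt-4 (3 tokens per message plus 3 that prime the reply); the crate's internals may differ. `Message`, `prompt_tokens`, and `max_tokens` are hypothetical names, and the word-splitting counter stands in for a real encoder, so the numbers will not match what the crate returns.

```rust
/// A chat message; a hypothetical stand-in for the async-openai message type.
struct Message<'a> {
    role: &'a str,
    content: &'a str,
}

/// Estimates prompt tokens: 3 tokens of overhead per message, plus the tokens
/// of its role and content, plus 3 tokens that prime the assistant's reply.
/// `count` is any token counter, e.g. one backed by a cl100k_base encoder.
fn prompt_tokens(messages: &[Message], count: impl Fn(&str) -> usize) -> usize {
    let per_message: usize = messages
        .iter()
        .map(|m| 3 + count(m.role) + count(m.content))
        .sum();
    per_message + 3
}

/// Tokens left for the completion: context window minus prompt, floored at zero.
fn max_tokens(context_size: usize, prompt: usize) -> usize {
    context_size.saturating_sub(prompt)
}

fn main() {
    let messages = [
        Message { role: "system", content: "You are a helpful assistant!" },
        Message { role: "user", content: "Hello, how are you?" },
    ];
    // Hypothetical counter: one token per whitespace-separated word.
    let words = |s: &str| s.split_whitespace().count();
    let prompt = prompt_tokens(&messages, words);
    // gpt-4 has an 8,192-token context window.
    println!("prompt: {}, max_tokens: {}", prompt, max_tokens(8192, prompt));
}
```

Using `saturating_sub` means an oversized prompt yields 0 rather than an integer underflow.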

tiktoken supports these encodings used by OpenAI models:

Encoding name          OpenAI models
cl100k_base            ChatGPT models, text-embedding-ada-002
p50k_base              Code models, text-davinci-002, text-davinci-003
p50k_edit              Edit models such as text-davinci-edit-001, code-davinci-edit-001
r50k_base (or gpt2)    GPT-3 models such as davinci
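The table amounts to a lookup from model name to encoding name. A minimal std-only sketch of that mapping follows; `encoding_for` is a hypothetical helper, not part of tiktoken-rs, it covers only the models named in the table, and it assumes gpt-3.5-turbo and gpt-4 as the ChatGPT-family models.

```rust
/// Hypothetical helper: maps a model name to its encoding name, following
/// the encoding table. Returns None for models it does not know.
fn encoding_for(model: &str) -> Option<&'static str> {
    match model {
        "gpt-3.5-turbo" | "gpt-4" | "text-embedding-ada-002" => Some("cl100k_base"),
        "text-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        "text-davinci-edit-001" | "code-davinci-edit-001" => Some("p50k_edit"),
        "davinci" => Some("r50k_base"),
        _ => None,
    }
}

fn main() {
    for model in ["gpt-4", "text-davinci-003", "code-davinci-edit-001", "davinci", "unknown"] {
        println!("{} -> {:?}", model, encoding_for(model));
    }
}
```

Returning `Option` forces callers to handle unknown model names explicitly instead of silently falling back to a default encoding.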

See the examples in the repository for more use cases. For more context on the different tokenizers, see the OpenAI Cookbook.

Encountered any bugs?

If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.


Thanks to @spolu for the original code and the .tiktoken files.


This project is licensed under the MIT License.


~395K SLoC