Cargo Features

```toml
[dependencies]
kitoken = { version = "0.10.1", default-features = false, features = [
    "all", "std", "serialization",
    "normalization", "normalization-unicode", "normalization-charsmap",
    "split", "split-unicode-script",
    "convert", "convert-tokenizers", "convert-sentencepiece",
    "convert-tiktoken", "convert-tekken", "convert-detect",
    "regex-unicode", "regex-perf", "regex-onig",
    "multiversion", "unstable",
] }
```
default = convert, multiversion, normalization, regex-perf, serialization, std

These default features are enabled whenever kitoken is added to the dependency tree without `default-features = false`.
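As a sketch, the defaults can be trimmed by disabling them and opting back in to individual features. The feature names come from the list below; which subset a project actually needs is an assumption made here for illustration:

```toml
[dependencies]
# Disable the defaults, then re-enable only what is needed;
# this leaves out e.g. multiversion and regex-perf.
kitoken = { version = "0.10.1", default-features = false, features = [
    "std", "serialization", "normalization",
] }
```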

all = convert, multiversion, normalization, regex-perf, regex-unicode, serialization, split, std

Enables all stable features.

std (enabled by default and by all)

Enables standard library features.

Enables the optional dependency thiserror ^1.0, and the std feature of memchr, orx-priority-queue, and (when enabled) the optional dependency multiversion.
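Since std is itself a feature, a no_std-leaning configuration can be sketched by leaving it out of the feature list; whether a given target actually builds this way is an assumption to verify:

```toml
[dependencies]
# Assumption for illustration: drop std for constrained targets,
# keeping only serialization to load prebuilt definitions.
kitoken = { version = "0.10.1", default-features = false, features = [
    "serialization",
] }
```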

serialization (enabled by default, by all, and by convert-detect)

Enables serialization and deserialization.

Enables the optional dependencies postcard and serde.

normalization (enabled by default and by all) = normalization-charsmap, normalization-unicode

Enables all input normalization features.

normalization-unicode (enabled by normalization)

Enables Unicode input normalization support.

Enables the optional dependency unicode-normalization.

normalization-charsmap (enabled by normalization)

Enables precompiled charsmap input normalization support.

Enables the unicode feature of bstr.

split (enabled by all) = split-unicode-script

Enables all input split features.

split-unicode-script (enabled by split)

Enables input splitting by Unicode script.

Enables the optional dependency unicode-script.

convert (enabled by default and by all) = convert-detect, convert-sentencepiece, convert-tekken, convert-tiktoken, convert-tokenizers

Enables detection and conversion for all supported tokenizer data formats.

convert-tokenizers (enabled by convert)

Enables conversion for the HuggingFace Tokenizers format.

Enables the serde feature of hashbrown, and the optional dependencies base64 (shared with convert-tiktoken), serde, and serde_json (shared with convert-tekken).

convert-sentencepiece (enabled by convert)

Enables conversion for the SentencePiece format.

Enables the optional dependency sentencepiece-model.

convert-tiktoken (enabled by convert)

Enables conversion for the OpenAI Tiktoken format.

Enables the optional dependency base64.

convert-tekken (enabled by convert)

Enables conversion for the Mistral Tekken format.

Enables the optional dependencies base64, serde, and serde_json.

convert-detect (enabled by convert) = serialization

Enables detection of supported formats during deserialization; also enables the serialization feature.
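Putting the conversion features together: a build that only needs one tokenizer format can enable just that converter. Picking the Tiktoken format here is an illustrative assumption:

```toml
[dependencies]
# Only the OpenAI Tiktoken converter, plus format detection;
# convert-detect also pulls in the serialization feature.
kitoken = { version = "0.10.1", default-features = false, features = [
    "std", "convert-tiktoken", "convert-detect",
] }
```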

regex-unicode (enabled by all)

Enables support for additional regex Unicode patterns.

Enables the unicode feature of fancy-regex.

regex-perf (enabled by default and by all)

Enables additional regex performance optimizations.

Enables the perf feature of fancy-regex.

regex-onig

Uses the Oniguruma regex engine instead of fancy-regex.

Enables the optional dependency onig.
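A sketch of opting into the Oniguruma engine. The fancy-regex-specific regex-perf and regex-unicode features are left out here, on the assumption that they do not apply when onig is used:

```toml
[dependencies]
# Swap fancy-regex for oniguruma via the regex-onig feature.
kitoken = { version = "0.10.1", default-features = false, features = [
    "std", "serialization", "normalization", "convert", "regex-onig",
] }
```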

multiversion (enabled by default and by all)

Enables the use of multiversion to generate multiple code paths with different CPU feature utilization.

Enables the optional dependency multiversion.

unstable

Enables the use of unstable features.