Cargo Features
```toml
[dependencies]
kitoken = { version = "0.10.1", default-features = false, features = ["all", "std", "serialization", "normalization", "normalization-unicode", "normalization-charsmap", "split", "split-unicode-script", "convert", "convert-tokenizers", "convert-sentencepiece", "convert-tiktoken", "convert-tekken", "convert-detect", "regex-unicode", "regex-perf", "regex-onig", "multiversion", "unstable"] }
```
- default = convert, multiversion, normalization, regex-perf, serialization, std
  These default features are set whenever kitoken is added without default-features = false somewhere in the dependency tree.
- all = convert, multiversion, normalization, regex-perf, regex-unicode, serialization, split, std
  Enables all stable features.
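To pick features individually instead of taking the defaults, disable default features and list only what you need. A minimal sketch (the exact feature set depends on your use case; "std" and "serialization" are chosen here purely as an example):

```toml
[dependencies]
# Minimal build: standard library support and (de)serialization only,
# without the conversion, normalization, and regex-perf defaults.
kitoken = { version = "0.10.1", default-features = false, features = ["std", "serialization"] }
```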
- std (default; enabled by all)
  Enables standard library features. Enables thiserror ^1.0 (an optional dependency for the std feature) and the std feature of memchr, orx-priority-queue, and the optional multiversion dependency.
- serialization (default; enabled by all and convert-detect)
  Enables serialization and deserialization. Enables serde, an optional dependency for the serialization feature.
- normalization (default; enabled by all) = normalization-charsmap, normalization-unicode
  Enables all input normalization features.
- normalization-unicode (enabled by normalization)
  Enables unicode input normalization support. Enables unicode-normalization, an optional dependency for the normalization-unicode feature.
- normalization-charsmap (enabled by normalization)
  Enables precompiled charsmap input normalization support.
- split (enabled by all) = split-unicode-script
  Enables all input split features.
- split-unicode-script (enabled by split)
  Enables input split by unicode scripts. Enables unicode-script, an optional dependency for the split-unicode-script feature.
- convert (default; enabled by all) = convert-detect, convert-sentencepiece, convert-tekken, convert-tiktoken, convert-tokenizers
  Enables detection and conversion for all supported tokenizer data formats.
- convert-tokenizers (enabled by convert)
  Enables conversion for the HuggingFace Tokenizers format. Enables the serde feature of hashbrown and base64 (optional dependencies for the convert-tiktoken and convert-tokenizers features), as well as serde and serde_json (an optional dependency for the convert-tokenizers and convert-tekken features).
- convert-sentencepiece (enabled by convert)
  Enables conversion for the SentencePiece format. Enables sentencepiece-model, an optional dependency for the convert-sentencepiece feature.
- convert-tiktoken (enabled by convert)
  Enables conversion for the OpenAI Tiktoken format. Enables base64.
- convert-tekken (enabled by convert)
  Enables conversion for the Mistral Tekken format. Enables base64, serde, and serde_json.
- convert-detect (enabled by convert) = serialization
  Enables detection of supported formats during deserialization (enables the serialization feature).
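Because convert-detect depends on the serialization feature, enabling it alone is enough to pull serialization in transitively. A sketch of a conversion-focused build (the choice of convert-tiktoken here is just an example):

```toml
[dependencies]
# convert-detect implies serialization, so it does not need to be listed.
kitoken = { version = "0.10.1", default-features = false, features = ["std", "convert-detect", "convert-tiktoken"] }
```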
- regex-unicode (enabled by all)
  Enables support for additional regex unicode patterns. Enables the unicode feature of fancy-regex.
- regex-perf (default; enabled by all)
  Enables additional regex performance optimizations. Enables the perf feature of fancy-regex.
- regex-onig
  Uses the oniguruma regex engine instead of fancy-regex. Enables onig, an optional dependency for the regex-onig feature.
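Since regex-onig replaces the fancy-regex engine rather than extending it, it can simply be added on top of the defaults. A sketch:

```toml
[dependencies]
# Swap in the oniguruma engine. Note that regex-perf and regex-unicode
# configure fancy-regex, so they presumably have no effect in this
# configuration (an assumption based on the feature descriptions above).
kitoken = { version = "0.10.1", features = ["regex-onig"] }
```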
- multiversion (default; enabled by all)
  Enables the use of multiversion for generating multiple code paths with different CPU feature utilization.
- unstable
  Enables the use of unstable features.