Cargo Features

```toml
[dependencies]
bm25 = { version = "2.2.0", default-features = false, features = ["default_tokenizer", "language_detection", "parallelism"] }
```

- `default` = `default_tokenizer`

  The `default_tokenizer` feature is enabled by default whenever `bm25` is added without `default-features = false` somewhere in the dependency tree.

- `default_tokenizer` (enabled by `default` and `language_detection`)

  The default tokenizer is a good choice for most use cases. It normalizes Unicode, splits text on Unicode word boundaries, removes stop words, and stems the remaining words.

  Enables `cached`, `deunicode`, `rust-stemmers`, `stop-words`, and `unicode-segmentation`.

  Affects `embedder::DefaultTokenizer` …

- `language_detection` = `default_tokenizer`

  With language detection enabled, you can configure the default tokenizer to detect the language of the input text.

  Enables `whichlang`.

- `parallelism`

  With parallelism enabled, batch fitting jobs run in parallel.

  Enables `rayon`.
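
The `default_tokenizer` feature describes a four-stage pipeline: normalize Unicode, split on word boundaries, drop stop words, and stem what remains. The following is a hypothetical, dependency-free sketch of those stages, not the crate's implementation (which, per the feature list, relies on `deunicode`, `unicode-segmentation`, `stop-words`, and `rust-stemmers`). The functions `normalize`, `split_words`, `stem`, and `tokenize`, the tiny stop-word list, and the naive suffix-stripping stemmer are all illustrative assumptions.

```rust
// Hypothetical std-only sketch of the default tokenizer's stages.
// NOT the bm25 crate's code: each stage is a crude stand-in for a real crate.

/// Tiny illustrative English stop-word list (real lists are much longer).
const STOP_WORDS: &[&str] = &["a", "an", "and", "are", "in", "is", "of", "on", "the", "to"];

/// Stage 1: lowercase, then fold a few accented Latin vowels to ASCII
/// (a rough stand-in for `deunicode`).
fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .map(|c| match c {
            'á' | 'à' | 'â' | 'ä' => 'a',
            'é' | 'è' | 'ê' | 'ë' => 'e',
            'í' | 'ì' | 'î' | 'ï' => 'i',
            'ó' | 'ò' | 'ô' | 'ö' => 'o',
            'ú' | 'ù' | 'û' | 'ü' => 'u',
            other => other,
        })
        .collect()
}

/// Stage 2: split on non-alphanumeric characters
/// (a rough stand-in for Unicode word-boundary segmentation).
fn split_words(text: &str) -> Vec<&str> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Stage 4: strip one common English suffix, keeping at least three bytes
/// (a rough stand-in for a Snowball stemmer).
fn stem(word: &str) -> String {
    for suffix in ["ing", "ed", "es", "s"] {
        if word.len() > suffix.len() + 2 && word.ends_with(suffix) {
            // Safe slice: the suffix is ASCII, so this is a char boundary.
            return word[..word.len() - suffix.len()].to_string();
        }
    }
    word.to_string()
}

/// Full pipeline: normalize, split, remove stop words (stage 3), stem.
fn tokenize(text: &str) -> Vec<String> {
    let normalized = normalize(text);
    split_words(&normalized)
        .into_iter()
        .filter(|w| !STOP_WORDS.contains(w))
        .map(stem)
        .collect()
}

fn main() {
    let tokens = tokenize("The Café is jumping and the dogs are barking!");
    println!("{tokens:?}");
}
```

Stop-word removal runs before stemming here so that the stop-word list only needs to hold surface forms; the real tokenizer may order or configure its stages differently.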
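
The `language_detection` feature lets the default tokenizer pick its language from the input text instead of a fixed setting. To illustrate the idea of "detect first, then apply language-specific rules", here is a hypothetical sketch that guesses the language by counting stop-word hits. This is a swapped-in toy technique: `whichlang` uses a trained classifier, and the `Lang` enum, `detect`, and `tokenize_detected` below are illustrative names, not the crate's API.

```rust
// Hypothetical sketch: stop-word-count language guessing.
// NOT how `whichlang` works; it only illustrates choosing per-language rules.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    German,
    French,
}

/// Tiny per-language stop-word lists (illustrative only).
fn stop_words(lang: Lang) -> &'static [&'static str] {
    match lang {
        Lang::English => &["the", "and", "is", "of", "to"],
        Lang::German => &["der", "die", "und", "ist", "nicht"],
        Lang::French => &["le", "la", "et", "est", "les"],
    }
}

/// Returns the language whose stop words occur most often in `text`.
/// Ties go to the earlier language; zero hits returns `fallback`.
fn detect(text: &str, fallback: Lang) -> Lang {
    let lower = text.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    let mut best = (fallback, 0usize);
    for lang in [Lang::English, Lang::German, Lang::French] {
        let hits = words.iter().filter(|w| stop_words(lang).contains(*w)).count();
        if hits > best.1 {
            best = (lang, hits);
        }
    }
    best.0
}

/// Detects the language, then removes that language's stop words.
fn tokenize_detected(text: &str) -> (Lang, Vec<String>) {
    let lang = detect(text, Lang::English);
    let tokens: Vec<String> = text
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty() && !stop_words(lang).contains(w))
        .map(str::to_string)
        .collect();
    (lang, tokens)
}

fn main() {
    let (lang, tokens) = tokenize_detected("Le chat est sur la table");
    println!("{lang:?}: {tokens:?}");
}
```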
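
The `parallelism` feature parallelizes batch fitting via `rayon`. Assuming that fitting here means computing the corpus's average document length (avgdl), the statistic BM25 uses for document-length normalization, the work splits naturally into independent per-chunk sums. The sketch below uses `std::thread::scope` as a dependency-free stand-in for `rayon`; `token_count` (plain whitespace splitting) and `fit_avgdl_parallel` are hypothetical names, not the crate's API.

```rust
use std::thread;

/// Hypothetical stand-in for a real tokenizer: counts whitespace-separated tokens.
fn token_count(doc: &str) -> usize {
    doc.split_whitespace().count()
}

/// Computes avgdl over `corpus` using up to `workers` scoped threads.
/// Each thread sums the token counts of one chunk; the main thread adds
/// the partial sums and divides by the number of documents.
fn fit_avgdl_parallel(corpus: &[&str], workers: usize) -> f32 {
    if corpus.is_empty() {
        return 0.0;
    }
    let workers = workers.max(1);
    // Ceiling division so every document lands in some chunk.
    let chunk_size = (corpus.len() + workers - 1) / workers;
    let total: usize = thread::scope(|s| {
        let handles: Vec<_> = corpus
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|d| token_count(d)).sum::<usize>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    total as f32 / corpus.len() as f32
}

fn main() {
    let corpus = ["the quick brown fox", "jumps over", "the lazy dog sleeps all day"];
    println!("avgdl = {}", fit_avgdl_parallel(&corpus, 2));
}
```

Because the partial results are integer token counts, summing them in any order gives the same total, so the parallel result matches a sequential pass exactly.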