Cargo Features

[dependencies]
bm25 = { version = "2.2.0", default-features = false, features = ["default_tokenizer", "language_detection", "parallelism"] }
default = default_tokenizer

The default_tokenizer feature is enabled by default whenever any crate in the dependency tree adds bm25 without default-features = false.

default_tokenizer (enabled by default and by language_detection)

The default tokenizer is a good choice for most use cases. It normalizes Unicode, splits text on Unicode word boundaries, removes stop words, and stems the remaining words.

Enables cached, deunicode, rust-stemmers, stop-words, and unicode-segmentation

Affects embedder::DefaultTokenizer
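
The four tokenizer steps described above (normalize, split, remove stop words, stem) can be pictured with a toy pipeline. The sketch below is not the bm25 implementation and does not use its API: toy_tokenize, toy_stem, and the tiny STOP_WORDS list are hypothetical names invented for illustration. Lowercasing stands in for Unicode normalization, splitting on non-alphanumeric characters stands in for Unicode word segmentation, and naive suffix stripping stands in for a real stemmer.

```rust
// Toy illustration only: NOT the bm25 crate's implementation or API.
// A tiny hard-coded stop-word list (a real list is far larger).
const STOP_WORDS: [&str; 6] = ["a", "an", "and", "of", "the", "to"];

/// Returns true if `word` is in the toy stop-word list.
fn is_stop_word(word: &str) -> bool {
    STOP_WORDS.iter().any(|s| *s == word)
}

/// Naive suffix stripping; a stand-in for a real stemming algorithm.
fn toy_stem(word: &str) -> String {
    for suffix in ["ing", "ed", "s"] {
        if let Some(stem) = word.strip_suffix(suffix) {
            // Leave very short words intact so "is" does not become "i".
            if stem.len() >= 3 {
                return stem.to_string();
            }
        }
    }
    word.to_string()
}

/// Normalize (lowercase), split, drop stop words, then stem.
fn toy_tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .filter(|w| !is_stop_word(w))
        .map(toy_stem)
        .collect()
}

fn main() {
    let tokens = toy_tokenize("The cats jumped over the fences");
    println!("{:?}", tokens); // prints ["cat", "jump", "over", "fence"]
}
```

With default_tokenizer enabled, the crate performs these steps through the dependencies listed above rather than with hand-rolled logic like this.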

language_detection = default_tokenizer

With language detection enabled, you can configure the default tokenizer to detect the language of the input text.

Enables whichlang

parallelism

With parallelism enabled, batch fitting jobs run in parallel.

Enables rayon
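
As a rough picture of what a feature like parallelism changes, the sketch below switches between a sequential and a parallel code path based on a Cargo feature flag. It is an assumption-laden illustration, not how bm25 is written: count_tokens and count_all are hypothetical names, and std::thread::scope stands in for rayon so that the example needs no external crates.

```rust
use std::thread;

/// Hypothetical per-document work: count whitespace-separated tokens.
fn count_tokens(doc: &str) -> usize {
    doc.split_whitespace().count()
}

/// Processes a batch of documents, in parallel only when the
/// (illustrative) `parallelism` feature is enabled at compile time.
fn count_all(docs: &[&str]) -> Vec<usize> {
    if cfg!(feature = "parallelism") {
        // Parallel path: one scoped thread per document.
        // A crate using rayon would use a parallel iterator here instead.
        thread::scope(|s| {
            let handles: Vec<_> = docs
                .iter()
                .map(|d| s.spawn(move || count_tokens(d)))
                .collect();
            // Joining in spawn order keeps results aligned with the input.
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    } else {
        // Sequential path: used when the feature is off.
        docs.iter().map(|d| count_tokens(d)).collect()
    }
}

fn main() {
    let docs = ["the quick brown fox", "jumps over", "the lazy dog"];
    println!("{:?}", count_all(&docs)); // prints [4, 2, 3]
}
```

Both paths return the same result; the feature only changes how the work is scheduled, which is why it can be turned on or off without changing calling code.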