4 releases (2 breaking)

0.3.0 Apr 12, 2024
0.2.0 Sep 1, 2023
0.1.1 Jun 23, 2023
0.1.0 Jun 9, 2023

#349 in Database implementations

189,296 downloads per month
Used in 60 crates (8 directly)

MIT license

7KB
117 lines

# Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.


lib.rs:

Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. This is a separate crate from tantivy, so implementors don't need to update their tokenizers for each new tantivy version.

To add support for a tokenizer, implement the `Tokenizer` trait. Check out the tantivy repo for some examples.
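As a rough illustration of what implementing the trait involves, here is a minimal, self-contained sketch. It does not depend on this crate: the `Token`, `TokenStream`, and `Tokenizer` definitions below are simplified local stand-ins whose shape is an assumption modeled on the crate, so consult the crate documentation for the real definitions. `WhitespaceSplitter`, `WhitespaceStream`, and `collect_tokens` are hypothetical names introduced only for this example.

```rust
// Simplified local stand-ins (assumed shape, NOT the crate's real definitions).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Token {
    pub offset_from: usize, // byte offset of the token start
    pub offset_to: usize,   // byte offset one past the token end
    pub position: usize,    // ordinal position of the token in the text
    pub text: String,
}

pub trait TokenStream {
    /// Moves to the next token; returns false once the text is exhausted.
    fn advance(&mut self) -> bool;
    /// Returns the current token. Only meaningful after `advance` returned true.
    fn token(&self) -> &Token;
}

pub trait Tokenizer: 'static + Clone + Send + Sync {
    type TokenStream<'a>: TokenStream;
    fn token_stream<'a>(&'a mut self, text: &'a str) -> Self::TokenStream<'a>;
}

/// Hypothetical tokenizer that splits text on Unicode whitespace.
#[derive(Clone, Default)]
pub struct WhitespaceSplitter;

pub struct WhitespaceStream<'a> {
    text: &'a str,
    cursor: usize,        // byte offset of the first unread byte
    next_position: usize, // position to assign to the next token
    token: Token,         // reused buffer for the current token
}

impl<'a> TokenStream for WhitespaceStream<'a> {
    fn advance(&mut self) -> bool {
        let rest = &self.text[self.cursor..];
        // Skip leading whitespace; stop if nothing but whitespace is left.
        let Some(skip) = rest.find(|c: char| !c.is_whitespace()) else {
            self.cursor = self.text.len();
            return false;
        };
        let start = self.cursor + skip;
        let tail = &self.text[start..];
        // The token runs until the next whitespace character or the end of the text.
        let end = start + tail.find(char::is_whitespace).unwrap_or(tail.len());

        self.token.offset_from = start;
        self.token.offset_to = end;
        self.token.position = self.next_position;
        self.token.text.clear();
        self.token.text.push_str(&self.text[start..end]);

        self.next_position += 1;
        self.cursor = end;
        true
    }

    fn token(&self) -> &Token {
        &self.token
    }
}

impl Tokenizer for WhitespaceSplitter {
    type TokenStream<'a> = WhitespaceStream<'a>;

    fn token_stream<'a>(&'a mut self, text: &'a str) -> WhitespaceStream<'a> {
        WhitespaceStream { text, cursor: 0, next_position: 0, token: Token::default() }
    }
}

/// Drains a stream into (position, offset_from, offset_to, text) tuples.
pub fn collect_tokens(text: &str) -> Vec<(usize, usize, usize, String)> {
    let mut tokenizer = WhitespaceSplitter;
    let mut stream = tokenizer.token_stream(text);
    let mut out = Vec::new();
    while stream.advance() {
        let t = stream.token();
        out.push((t.position, t.offset_from, t.offset_to, t.text.clone()));
    }
    out
}
```

With these stand-ins, `collect_tokens("Hello  big world")` yields three tokens with positions 0, 1, and 2 and byte ranges 0..5, 7..10, and 11..16. The stream reuses a single `Token` buffer across calls to `advance`, which avoids allocating a new `String` per token; whether that matters depends on the tokenizer.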

Dependencies

~0.3–0.9MB
~20K SLoC