4 releases (2 breaking)

0.3.0 Apr 12, 2024
0.2.0 Sep 1, 2023
0.1.1 Jun 23, 2023
0.1.0 Jun 9, 2023

#141 in Database implementations

Download history 52034/week @ 2024-05-13 53176/week @ 2024-05-20 58005/week @ 2024-05-27 121533/week @ 2024-06-03 110483/week @ 2024-06-10 120730/week @ 2024-06-17 154128/week @ 2024-06-24 126581/week @ 2024-07-01 114723/week @ 2024-07-08 117996/week @ 2024-07-15 123452/week @ 2024-07-22 124702/week @ 2024-07-29 101468/week @ 2024-08-05 85821/week @ 2024-08-12 88985/week @ 2024-08-19 84236/week @ 2024-08-26

363,940 downloads per month
Used in 34 crates (8 directly)

MIT license

7KB
117 lines

#Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.


lib.rs:

Tokenizer are in charge of chopping text into a stream of tokens ready for indexing. This is an seperate crate from tantivy, so implementors don't need to update for each new tantivy version.

To add support for a tokenizer, implement the Tokenizer trait. Checkout the tantivy repo for some examples.

Dependencies

~0.4–1MB
~22K SLoC