#tokenizer #tantivy #api #token #stream #charge #tokenizer-api

summavy-tokenizer-api

Tokenizer API of summavy

1 unstable release

0.1.0 Jan 12, 2023



Used in summavy

MIT license

7KB
125 lines

Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable so as not to break support for existing tokenizers.


lib.rs:

Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. This is a separate crate from tantivy, so implementors don't need to update their tokenizers for each new tantivy version.
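As a rough illustration of what "a stream of tokens" means, the sketch below splits text into runs of alphanumeric characters and records, for each token, its byte offsets in the original text and its position in the stream. The `Token` struct and the `tokenize` and `make_token` functions are hypothetical, written only for this example; they are not this crate's API. Lowercasing the token text is likewise just one illustrative normalization choice.

```rust
/// Hypothetical token type for this example (not necessarily the crate's definition).
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    /// Byte offset of the first character of the token in the source text.
    pub offset_from: usize,
    /// Byte offset just past the last character of the token.
    pub offset_to: usize,
    /// Index of the token in the stream: 0, 1, 2, ...
    pub position: usize,
    /// Normalized token text (lowercased here, as an illustrative choice).
    pub text: String,
}

/// Hypothetical helper: chops `text` into runs of alphanumeric characters.
pub fn tokenize(text: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() {
            // Remember where the current run began.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(from) = start.take() {
            // A separator ends the current run: emit it as a token.
            let position = tokens.len();
            tokens.push(make_token(text, from, i, position));
        }
    }
    // Flush a run that reaches the end of the text.
    if let Some(from) = start {
        let position = tokens.len();
        tokens.push(make_token(text, from, text.len(), position));
    }
    tokens
}

/// Hypothetical helper: builds a token from a byte range of `text`.
fn make_token(text: &str, from: usize, to: usize, position: usize) -> Token {
    Token {
        offset_from: from,
        offset_to: to,
        position,
        text: text[from..to].to_lowercase(),
    }
}

fn main() {
    for token in tokenize("Hello, tantivy world!") {
        println!("{:?}", token);
    }
}
```

Keeping byte offsets alongside the normalized text is what lets an indexer later map a matched token back to the exact span of the original document, for example when highlighting search hits.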

To add support for a tokenizer, implement the Tokenizer trait. Check out the tantivy repo for some examples.
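Below is a minimal sketch of what implementing a tokenizer trait can look like, assuming a trait whose `token_stream` method returns a boxed stream exposing `advance` and `token` methods. The `Token`, `TokenStream`, `Tokenizer`, `WhitespaceTokenizer`, and `WhitespaceTokenStream` definitions are hypothetical stand-ins declared locally so the example compiles on its own; the actual trait and type signatures in summavy-tokenizer-api may differ, so check the crate source before copying this shape.

```rust
use std::str::CharIndices;

/// Hypothetical token type (an assumption, not the crate's confirmed definition).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Token {
    pub offset_from: usize,
    pub offset_to: usize,
    pub position: usize,
    pub text: String,
}

/// Hypothetical stream interface: `advance` moves to the next token and
/// returns false once the text is exhausted; `token` reads the current one.
pub trait TokenStream {
    fn advance(&mut self) -> bool;
    fn token(&self) -> &Token;
}

/// Hypothetical tokenizer interface: builds a token stream over borrowed text.
pub trait Tokenizer {
    fn token_stream<'a>(&self, text: &'a str) -> Box<dyn TokenStream + 'a>;
}

/// Example tokenizer that splits on Unicode whitespace.
pub struct WhitespaceTokenizer;

struct WhitespaceTokenStream<'a> {
    text: &'a str,
    chars: CharIndices<'a>,
    token: Token,
    next_position: usize,
}

impl Tokenizer for WhitespaceTokenizer {
    fn token_stream<'a>(&self, text: &'a str) -> Box<dyn TokenStream + 'a> {
        Box::new(WhitespaceTokenStream {
            text,
            chars: text.char_indices(),
            token: Token::default(),
            next_position: 0,
        })
    }
}

impl<'a> TokenStream for WhitespaceTokenStream<'a> {
    fn advance(&mut self) -> bool {
        // Skip whitespace until the first character of the next token.
        let start = loop {
            match self.chars.next() {
                Some((i, c)) if !c.is_whitespace() => break i,
                Some(_) => continue,
                None => return false,
            }
        };
        // Consume characters up to the next whitespace (or the end of text).
        let mut end = self.text.len();
        for (i, c) in self.chars.by_ref() {
            if c.is_whitespace() {
                end = i;
                break;
            }
        }
        // Reuse the token buffer instead of allocating a new Token each time.
        self.token.offset_from = start;
        self.token.offset_to = end;
        self.token.position = self.next_position;
        self.token.text.clear();
        self.token.text.push_str(&self.text[start..end]);
        self.next_position += 1;
        true
    }

    fn token(&self) -> &Token {
        &self.token
    }
}

fn main() {
    let mut stream = WhitespaceTokenizer.token_stream("fast  full-text search");
    while stream.advance() {
        let t = stream.token();
        println!("{} {:?} {}..{}", t.position, t.text, t.offset_from, t.offset_to);
    }
}
```

The stream keeps a single `Token` buffer and only exposes it by reference after each successful `advance`, which avoids allocating a fresh token per word; whether the real crate's stream follows this exact pattern is an assumption.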

Dependencies

~0.4–1MB
~23K SLoC