Version 0.38.1, released Dec 6, 2024 (1 unstable release)
Overview
lindera-sqlite is a C ABI library that exposes an FTS5 tokenizer function.
When used as a custom FTS5 tokenizer, it enables applications to support Chinese, Japanese, and Korean in full-text search.
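To illustrate why a dedicated tokenizer matters, here is a minimal sketch using Python's standard `sqlite3` module (an assumption for illustration; this project itself only documents the `sqlite3` shell) and FTS5's built-in default tokenizer, `unicode61`. That tokenizer splits only on whitespace and punctuation, so a run of Japanese text is indexed as one long token and a search for a word inside it should find nothing. The table name `plain` and the helper `count_matches` are hypothetical, and the sketch assumes an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No tokenize= option, so FTS5 falls back to its default tokenizer (unicode61),
# which performs no CJK word segmentation.
conn.execute("CREATE VIRTUAL TABLE plain USING fts5(content)")
conn.execute(
    "INSERT INTO plain(content) VALUES (?)",
    ("Linderaは形態素解析エンジンです。ユーザー辞書も利用可能です。",),
)


def count_matches(query):
    """Return how many rows of `plain` match the given FTS5 query string."""
    row = conn.execute(
        "SELECT count(*) FROM plain WHERE content MATCH ?", (query,)
    ).fetchone()
    return row[0]


# Both words sit inside a longer unsegmented token, so neither is found.
print(count_matches("形態素"))
print(count_matches("Lindera"))
```

With a morphological tokenizer such as the one this library provides, the same sentence would instead be indexed word by word.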
Build extension
% cargo build --features=ipadic,ko-dic,cc-cedict,compress,extension
Set the environment variable for the Lindera configuration
% export LINDERA_CONFIG_PATH=./resources/lindera.yml
Then start SQLite
% sqlite3 example.db
Load extension
sqlite> .load ./target/debug/liblindera_sqlite lindera_fts5_tokenizer_init
Create table using FTS5 with Lindera tokenizer
sqlite> CREATE VIRTUAL TABLE example USING fts5(content, tokenize='lindera_tokenizer');
Insert data
sqlite> INSERT INTO example(content) VALUES ('Linderaは形態素解析エンジンです。ユーザー辞書も利用可能です。');
Search data
sqlite> SELECT * FROM example WHERE content MATCH 'Lindera' ORDER BY bm25(example) LIMIT 10;
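The same load, create, insert, and search steps can also be driven from application code. The following is a rough sketch using Python's standard `sqlite3` module, which is an assumption: the project only documents the `sqlite3` shell. The helper names `open_db` and `index_and_search` are hypothetical. The sketch assumes a Python build that permits extension loading and that `LINDERA_CONFIG_PATH` is already set in the process environment before the extension is loaded.

```python
import sqlite3


def open_db(path=":memory:", extension=None,
            entry_point="lindera_fts5_tokenizer_init"):
    """Open a database and optionally load a tokenizer extension into it."""
    conn = sqlite3.connect(path)
    if extension is not None:
        # Extension loading is off by default; enable it only while loading.
        conn.enable_load_extension(True)
        conn.execute("SELECT load_extension(?, ?)", (extension, entry_point))
        conn.enable_load_extension(False)
    return conn


def index_and_search(conn, tokenizer, docs, query, limit=10):
    """Create the FTS5 table, insert docs, and return matches ranked by bm25."""
    # The tokenizer name is interpolated into SQL, so keep it to a safe charset.
    if not tokenizer.replace("_", "").isalnum():
        raise ValueError(f"unexpected tokenizer name: {tokenizer!r}")
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS example "
        f"USING fts5(content, tokenize='{tokenizer}')"
    )
    conn.executemany(
        "INSERT INTO example(content) VALUES (?)", [(d,) for d in docs]
    )
    rows = conn.execute(
        "SELECT content FROM example WHERE content MATCH ? "
        "ORDER BY bm25(example) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [r[0] for r in rows]
```

With the extension built as described above, a call might look like `conn = open_db("example.db", "./target/debug/liblindera_sqlite")` followed by `index_and_search(conn, "lindera_tokenizer", docs, "Lindera")`. Passing a built-in tokenizer name such as `unicode61` exercises the same code path without the extension.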