22 releases (12 breaking)

0.13.2 Oct 1, 2020
0.13.0 Aug 19, 2020
0.12.0 Feb 19, 2020
0.11.3 Dec 20, 2019
0.1.1 Aug 14, 2016

#2 in Database implementations

3,635 downloads per month
Used in 18 crates (14 directly)

MIT license

1.5MB
37K SLoC

Tantivy

Tantivy is a full text search engine library written in Rust.

It is closer to Apache Lucene than to Elasticsearch or Apache Solr, in the sense that it is not an off-the-shelf search engine server but rather a crate that can be used to build such a search engine.

Tantivy is, in fact, strongly inspired by Lucene's design.

Benchmark

The following benchmark breaks down performance for different types of queries and collections.

In general, Tantivy tends to be

  • slower than Lucene on union queries with Top-K collection, due to the Block-WAND optimization.
  • faster than Lucene on intersection and phrase queries.

Your mileage WILL vary depending on the nature of queries and their load.

Features

  • Full-text search
  • Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder))
  • Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
  • Tiny startup time (<10ms), perfect for command line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase queries (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, dates, and hierarchical facet fields
  • LZ4 compressed document store
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • Cheesy logo with a horse
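
To give a feel for the BM25 scoring listed above, here is a minimal, standalone sketch of the textbook BM25 formula for a single term. It is not Tantivy's internal code and does not use the Tantivy API. The function names, the corpus statistics, and the parameter values k1 = 1.2 and b = 0.75 (common defaults) are assumptions made for illustration.

```rust
// Textbook BM25 parameters. 1.2 and 0.75 are common defaults, assumed here
// for illustration; they are not read from Tantivy.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency. The `1 +` inside the logarithm keeps the
/// value positive even for terms that appear in most documents.
fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to the score of one document.
/// `doc_len` and `avg_doc_len` are measured in tokens.
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Longer-than-average documents get a larger penalty in the denominator.
    let length_norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + length_norm)
}

fn main() {
    // Hypothetical corpus: 1,000 documents averaging 100 tokens each,
    // with the query term present in 10 of them.
    let short_doc = bm25_term_score(2, 50, 100.0, 1_000, 10);
    let long_doc = bm25_term_score(2, 200, 100.0, 1_000, 10);
    println!("short doc: {:.3}, long doc: {:.3}", short_doc, long_doc);
}
```

With the same term frequency, the shorter document scores higher than the longer one, and rarer terms weigh more than common ones. The score of a multi-term query is the sum of these per-term contributions.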

Non-features

  • Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search engine. Serializable and mergeable collector state, for instance, is within the scope of Tantivy.

Getting started

Tantivy works on stable Rust (>= 1.27) and supports Linux, macOS, and Windows.

How can I support this project?

There are many ways to support this project.

  • Use Tantivy and tell us about your experience on Gitter or by email (paul.masurel@gmail.com)
  • Report bugs
  • Write a blog post
  • Help with documentation by asking questions or submitting PRs
  • Contribute code (you can join our Gitter)
  • Talk about Tantivy around you
  • Drop a word on Say Thanks! or even Become a patron

Contributing code

We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.

Clone and build locally

Tantivy compiles on stable Rust but requires Rust >= 1.27. To check out the code and build it, run:

    git clone https://github.com/tantivy-search/tantivy.git
    cd tantivy
    cargo build

Run tests

Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.

Debug

You might find it useful to step through the programme with a debugger.

A failing test

Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:

    find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | head -n 1 | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY

Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:

    (gdb) run --test-threads 1 --test $NAME_OF_TEST

An example

By default, cargo test compiles everything in the examples/ directory in debug mode. This makes it easy to write an example that reproduces a bug and then run it under rust-gdb:

    rust-gdb target/debug/examples/$EXAMPLE_NAME
    (gdb) run

Dependencies

~13MB
~213K SLoC