#search-index #search-engine #search #in-memory #cosine #similarity #terms

searchy

9 unstable releases (4 breaking)

0.5.0 Nov 11, 2024
0.4.0 May 3, 2024
0.3.0 Dec 10, 2023
0.2.2 Apr 28, 2023
0.1.1 May 29, 2022

#48 in Database implementations

Apache-2.0

555KB
12K SLoC

searchy: an embeddable in-memory search engine

An in-memory search index that is built and queried using a bag-of-words model, with cosine similarity scoring based on tf-idf. It supports multiple search terms, permissions, text suggestions (for spelling errors), and evaluating common arithmetic expressions directly to values. It is designed to keep the memory footprint small.
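To make the scoring model concrete, here is a minimal, self-contained sketch of bag-of-words ranking with tf-idf weights and cosine similarity. It does not use searchy and is not taken from its implementation: the function names (tokenize, term_counts, idf, tfidf, cosine, rank) and the smoothed idf formula are assumptions chosen for illustration, and the crate's actual weighting may differ.

```rust
use std::collections::HashMap;

/// Split text into lowercase alphanumeric tokens (a bag of words ignores order).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Raw term counts for one text.
fn term_counts(text: &str) -> HashMap<String, f64> {
    let mut counts = HashMap::new();
    for token in tokenize(text) {
        *counts.entry(token).or_insert(0.0) += 1.0;
    }
    counts
}

/// Smoothed inverse document frequency (an assumed variant): ln((1 + N) / (1 + df)) + 1.
fn idf(term: &str, docs: &[HashMap<String, f64>]) -> f64 {
    let df = docs.iter().filter(|d| d.contains_key(term)).count() as f64;
    ((1.0 + docs.len() as f64) / (1.0 + df)).ln() + 1.0
}

/// Weight each term count by its idf.
fn tfidf(counts: &HashMap<String, f64>, docs: &[HashMap<String, f64>]) -> HashMap<String, f64> {
    counts
        .iter()
        .map(|(term, tf)| (term.clone(), tf * idf(term, docs)))
        .collect()
}

/// Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, w)| b.get(t).map(|v| w * v)).sum();
    let norm = |v: &HashMap<String, f64>| v.values().map(|w| w * w).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 { 0.0 } else { dot / denom }
}

/// Score every document against the query; returns (document index, score), best first.
fn rank(query: &str, texts: &[&str]) -> Vec<(usize, f64)> {
    let docs: Vec<HashMap<String, f64>> = texts.iter().map(|t| term_counts(t)).collect();
    let q = tfidf(&term_counts(query), &docs);
    let mut scored: Vec<(usize, f64)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(&q, &tfidf(d, &docs))))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Calling rank("foo bar", &["foo bar text", "other words here", "foo only"]) places the first document on top, because it shares both query terms, and gives the second document a score of zero, because it shares none.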

Features:

  • expression evaluation (searching for 1+2 yields 3);
  • small in-memory representation using two allocations (after building the index);
  • easy delta updates;
  • spelling suggestions based on the search index;
  • summary sentence in search results (based on Luhn's summarization heuristic);
  • filtering of results by group (role-based access control).
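
As an illustration of how spelling suggestions can be drawn from the terms already in an index, the sketch below picks the vocabulary term with the smallest Levenshtein edit distance to the misspelled word. This is an assumed technique, not necessarily the one searchy uses, and the names edit_distance and suggest are hypothetical.

```rust
/// Levenshtein edit distance between two strings (insert, delete, substitute).
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // cur[0] = i: turning i chars into the empty string takes i deletions.
        let mut cur = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the vocabulary term closest to `word`, if one lies within `max_distance` edits.
/// On ties, the term that appears first in `vocabulary` is returned.
fn suggest<'a>(word: &str, vocabulary: &[&'a str], max_distance: usize) -> Option<&'a str> {
    vocabulary
        .iter()
        .map(|term| (edit_distance(word, term), *term))
        .filter(|(distance, _)| *distance <= max_distance)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, term)| term)
}
```

With a vocabulary of "search", "index", and "text", suggest("serch", ..., 2) returns "search" (one missing letter), while a word far from every term returns None. A linear scan like this is fine for a sketch; a real index would likely narrow the candidate set first.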

Minimal example

use searchy::*;
use expry::*;

// MemoryPool avoids small allocations.
pool!(scope);

// Add documents
let mut builder = SearchBuilder::new();
let doc_id = builder.add_document("foo bar text", "url", "name", "group1", b"extra", &mut scope);
// doc_id can be used to later remove the document from the search index

// Build search index
let index: SearchIndex = builder.into();
// Roles are the access groups of the current user. Each document belongs to an
// access group, and results are filtered by the groups the user holds.
const MAX_DOCS_RETRIEVED: usize = 1024;
let results = index.search("query text", "group1,group2", MAX_DOCS_RETRIEVED);
eprintln!("RESULTS: {}x/{} in {}ms", results.docs, results.more, results.duration);
for ScoredDocumentInfo { doc_id: _, score, info } in results.entries {
    eprintln!("{} -> {}, {}, {}", score, info.name, info.url, info.summary);
}

// To change the index later, convert it back into a builder and apply the changes
let mut builder = SearchBuilder::from(index);
builder.remove_document(doc_id);
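
The group argument in the search call above ("group1,group2") is a comma-separated list of the user's roles. A minimal sketch of the filtering idea, assuming each document carries exactly one group name and is visible when that name appears in the user's list, could look like this. The Doc struct and the functions is_visible and visible_names are hypothetical and not part of searchy's API.

```rust
/// Hypothetical document record: a display name plus the single group allowed to see it.
struct Doc<'a> {
    name: &'a str,
    group: &'a str,
}

/// True when `doc_group` appears in the comma-separated `user_groups` list.
fn is_visible(doc_group: &str, user_groups: &str) -> bool {
    user_groups.split(',').map(str::trim).any(|g| g == doc_group)
}

/// Names of the documents the user may see, in their original order.
fn visible_names<'a>(docs: &[Doc<'a>], user_groups: &str) -> Vec<&'a str> {
    docs.iter()
        .filter(|d| is_visible(d.group, user_groups))
        .map(|d| d.name)
        .collect()
}
```

Applying the filter before scoring keeps documents outside the user's groups from ever appearing in the results or in the result counts.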
