Indicium

A simple in-memory search for collections (Vec, HashMap, BTreeMap, etc.) and key-value stores. Features autocompletion.

There are many incredible search engines available for Rust, but many seem to require compiling and running a separate server binary. I wanted something simple and lightweight that could conveniently search structs and collections, so I made indicium.

Quick Start Guide

For this Quick Start Guide, we will be searching the following struct:

struct MyStruct {
    title: String,
    description: String,
}

1. Implementing Indexable

To begin, we must make our record indexable. We'll do this by implementing the Indexable trait for our struct. The idea is to return a String for every field that we would like indexed. Example:

use indicium::simple::Indexable;

impl Indexable for MyStruct {
    fn strings(&self) -> Vec<String> {
        vec![
            self.title.clone(),
            self.description.clone(),
        ]
    }
}

Don't forget that you may make numbers, numeric identifiers, enums, and other types indexable by converting them to a String and including them in the returned Vec<String>.
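As a brief sketch of that idea, a hypothetical record with a numeric year field (not part of the Quick Start struct) could be made indexable like this:

use indicium::simple::Indexable;

struct MyRecord {
    title: String,
    year: u16,
}

impl Indexable for MyRecord {
    fn strings(&self) -> Vec<String> {
        vec![
            self.title.clone(),
            // Convert the numeric field to a `String` so it can be indexed:
            self.year.to_string(),
        ]
    }
}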

2. Indexing a Collection

To index an existing collection, we can iterate over the collection. For each record, we will insert it into the search index. This should look something like these two examples:

Vec

use indicium::simple::SearchIndex;

let my_vec: Vec<MyStruct> = Vec::new();

// In the case of a `Vec` collection, we use the index as our key.  A
// `Vec` index is a `usize` type. Therefore we will instantiate
// `SearchIndex` as `SearchIndex<usize>`.

let mut search_index: SearchIndex<usize> = SearchIndex::default();

my_vec
    .iter()
    .enumerate()
    .for_each(|(index, element)|
        search_index.insert(&index, element)
    );

HashMap

use std::collections::HashMap;
use indicium::simple::SearchIndex;

let my_hash_map: HashMap<String, MyStruct> = HashMap::new();

// In the case of a `HashMap` collection, we use the hash map's key as
// the `SearchIndex` key. In our hypothetical example, we will use
// MyStruct's `title` as the key, which is a `String` type. Therefore
// we will instantiate `HashMap<K, V>` as `HashMap<String, MyStruct>`
// and `SearchIndex<K>` as `SearchIndex<String>`.

let mut search_index: SearchIndex<String> = SearchIndex::default();

my_hash_map
    .iter()
    .for_each(|(key, value)|
        search_index.insert(key, value)
    );

As long as the Indexable trait is implemented for your value type, the above examples will index a previously populated Vec or HashMap. However, the preferred method for large collections is to insert into the SearchIndex at the same time as you insert into your collection (Vec, HashMap, etc.).
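A minimal sketch of that approach for a Vec, assuming a hypothetical new_records iterator that yields MyStruct values:

use indicium::simple::SearchIndex;

let mut my_vec: Vec<MyStruct> = Vec::new();
let mut search_index: SearchIndex<usize> = SearchIndex::default();

for record in new_records {
    // The record's position in the `Vec` becomes its key in the search index:
    let key = my_vec.len();
    search_index.insert(&key, &record);
    my_vec.push(record);
}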

Once the index has been populated, you can use the autocomplete and search functions.

3. Searching

The search function returns keys as the search results. Each resulting key can then be used to retrieve the full record from its collection.

Search only supports exact keyword matches; it does not use fuzzy matching. Consider providing the autocomplete feature to your users as an ergonomic alternative to fuzzy matching.

Example usage:

// Continuing with the `SearchIndex<usize>` from the `Vec` example above,
// where the record at index `1` contains the keyword "helicopter":

let resulting_keys: Vec<&usize> =
    search_index.search(&"helicopter".to_string());

assert_eq!(resulting_keys, vec![&1]);
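Since each key is the record's position in the Vec, it can be used to fetch the full record back out of the collection. A brief sketch, assuming my_vec is the collection indexed in step 2 and resulting_keys comes from the search above:

// Look up each matching record in the original collection:
let results: Vec<&MyStruct> = resulting_keys
    .iter()
    .filter_map(|key| my_vec.get(**key))
    .collect();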

4. Autocompletion

The autocomplete function will provide several autocompletion options for the last partial keyword in the supplied string. The results are returned in lexicographic order.

Example usage:

let autocomplete_options: Vec<String> =
    search_index.autocomplete(&"a very big bir".to_string());

assert_eq!(
    autocomplete_options,
    vec!["very big bird", "very big birthday"]
);
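One possible pattern, sketched under the assumption that the SearchIndex<usize> from step 2 is in scope and populated: present the autocomplete options to the user, then pass the chosen option back to search.

// Hypothetical follow-up: run a full search with the first autocomplete option:
if let Some(choice) = autocomplete_options.first() {
    let _matching_keys: Vec<&usize> = search_index.search(choice);
}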

Limitations

The priority of this crate is to be light-weight and easy to use.

  • Unfortunately, multi-keyword searches on huge data sets are not fast: the more keywords, the slower the response time. There are certainly ways to speed this up, but my current solution would require significantly more memory. This crate is intended to be a lightweight search for in-memory data sets, and my current view is that rectifying this would arguably go against the crate's goals.
