19 releases (5 breaking)

new 0.6.0 Dec 13, 2024
0.5.4 Nov 22, 2024
0.3.1 Jul 30, 2024

#358 in Database interfaces

350 downloads per month

Apache-2.0

140KB
1K SLoC

valentinus

next generation vector db built with lmdb bindings

dependencies

  • bincode/serde - serialize/deserialize
  • lmdb-rs - database bindings
  • ndarray - numpy equivalent
  • ort/onnx - embeddings

getting started

git clone https://github.com/kn0sys/valentinus && cd valentinus

optional environment variables

| var | usage | default |
|-----|-------|---------|
| LMDB_USER | working directory of the user for the database | $USER |
| LMDB_MAP_SIZE | sets max environment size, i.e. size in memory/disk of all data | 20% of available memory |
| ONNX_PARALLEL_THREADS | parallel execution mode for this session | 1 |
| VALENTINUS_CUSTOM_DIM | embeddings dimensions for custom models | all-mini-lm-6 -> 384 |
| VALENTINUS_LMDB_ENV | environment for the database (e.g. test, prod) | test |
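
The fallback behaviour described in the table can be sketched as follows. This is a minimal illustration, not valentinus's actual code: the `Settings` struct and its `from_lookup` / `from_env` functions are hypothetical names, the numeric types are assumptions, and `LMDB_MAP_SIZE` is left as `None` when unset because the "20% of available memory" default is computed by the library itself.

```rust
use std::env;

/// Hypothetical settings struct mirroring the table above.
/// It is not part of the valentinus API.
#[derive(Debug, PartialEq)]
pub struct Settings {
    /// LMDB_USER, falling back to $USER when unset.
    pub lmdb_user: Option<String>,
    /// LMDB_MAP_SIZE; None means "let the library pick its default".
    pub lmdb_map_size: Option<u64>,
    /// ONNX_PARALLEL_THREADS, default 1.
    pub onnx_parallel_threads: usize,
    /// VALENTINUS_CUSTOM_DIM, default 384 (all-mini-lm-6).
    pub custom_dim: usize,
    /// VALENTINUS_LMDB_ENV, default "test".
    pub lmdb_env: String,
}

impl Settings {
    /// Build settings from any key -> value lookup, so the fallback
    /// logic can be exercised without touching the process environment.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        Settings {
            lmdb_user: lookup("LMDB_USER").or_else(|| lookup("USER")),
            // Unparseable values are treated the same as unset ones.
            lmdb_map_size: lookup("LMDB_MAP_SIZE").and_then(|v| v.trim().parse().ok()),
            onnx_parallel_threads: lookup("ONNX_PARALLEL_THREADS")
                .and_then(|v| v.trim().parse().ok())
                .unwrap_or(1),
            custom_dim: lookup("VALENTINUS_CUSTOM_DIM")
                .and_then(|v| v.trim().parse().ok())
                .unwrap_or(384),
            lmdb_env: lookup("VALENTINUS_LMDB_ENV").unwrap_or_else(|| "test".to_string()),
        }
    }

    /// Read the same variables from the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `Settings::from_env()` with none of the variables set yields the defaults from the table; exporting, say, `VALENTINUS_LMDB_ENV=prod` before running changes only that field.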

tests

  • Note: all tests currently require the all-MiniLM-L6-v2_onnx directory
  • Get model.onnx and tokenizer.json from Hugging Face, or build them yourself
mkdir all-MiniLM-L6-v2_onnx
cd all-MiniLM-L6-v2_onnx && wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/config.json
wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/model.onnx
wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/special_tokens_map.json
wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/tokenizer_config.json
wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/tokenizer.json
wget https://huggingface.co/nigel-christian/all-MiniLM-L6-v2_onnx/resolve/main/vocab.txt

RUST_TEST_THREADS=1 cargo test
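
Since the tests depend on the model directory being present, a small pre-flight check can save a confusing failure. The sketch below is an assumption-laden illustration rather than anything shipped with valentinus: `missing_model_files`, `MODEL_DIR`, and `REQUIRED_FILES` are hypothetical names, and only the two files named in the note above (model.onnx and tokenizer.json) are treated as required.

```rust
use std::path::Path;

/// Directory name used in the download commands above.
pub const MODEL_DIR: &str = "all-MiniLM-L6-v2_onnx";

/// Files the note above names explicitly; the other downloads are
/// assumed optional for the purposes of this sketch.
pub const REQUIRED_FILES: [&str; 2] = ["model.onnx", "tokenizer.json"];

/// Return the required file names that are not present in `dir`,
/// in the order they appear in `REQUIRED_FILES`.
pub fn missing_model_files(dir: &Path) -> Vec<&'static str> {
    REQUIRED_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}
```

For example, `missing_model_files(Path::new(MODEL_DIR))` returns an empty vector once both files have been downloaded into the directory.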

examples

see examples

reference

inspired by this chromadb python tutorial

Dependencies

~23MB
~475K SLoC