#vector #vector-database #embedding #metrics #neighbor #metadata #hnsw

bin+lib vectus

A vector database implemented in Rust for learning purposes

1 unstable release

new 0.1.1 Oct 21, 2024
0.1.0 Oct 21, 2024

MIT license

26KB
525 lines

Vectus (Alpha)

Vectus is a graph-based vector database built from scratch in Rust as a learning project, and it is currently in alpha. It provides similarity search over vector data, targeting applications such as recommendation systems, image retrieval, and natural language processing.

Vectus uses a custom implementation of the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest-neighbor search in high-dimensional spaces. Alongside the graph index, it includes key-value store functionality for vector embeddings and their metadata.

Features

  • Graph-based Indexing: Efficient similarity search using the HNSW algorithm.
  • Customizable Storage: Store vector embeddings and metadata in a compact, efficient format.
  • High-speed Queries: Designed for real-time applications, prioritizing speed over memory usage.
  • Scalable Insertion: Supports insertion of millions of vectors, with search times that scale logarithmically with the number of stored vectors.
  • Key-Value Store Integration: A key-value store similar to sled, with custom support for vector embeddings.
  • Normalization and Custom Metric Support: Allows for flexible distance measures (e.g., cosine similarity).

Key Components

HNSW (Hierarchical Navigable Small World)

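Vectus employs the HNSW algorithm for nearest-neighbor search in high-dimensional vector spaces. HNSW organizes vectors into a multi-layer proximity graph: sparse upper layers allow long jumps across the dataset, and the dense bottom layer refines the result. The implementation prioritizes query speed over memory usage.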
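
To illustrate the idea, here is a minimal sketch of the greedy walk that HNSW performs within a single graph layer. It is not taken from the Vectus source: the `Layer` type, its fields, and `greedy_search` are hypothetical names chosen for this example, and a full HNSW search additionally descends through several layers and keeps a candidate list instead of a single best node.

```rust
use std::collections::HashMap;

/// Squared Euclidean distance between two equal-length vectors.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical single-layer proximity graph (not the Vectus type):
/// `vectors[i]` is the embedding of node `i`, and `neighbors` maps a
/// node id to the ids it is linked to.
struct Layer {
    vectors: Vec<Vec<f32>>,
    neighbors: HashMap<usize, Vec<usize>>,
}

impl Layer {
    /// Greedy walk: start at `entry`, move to the linked node closest to
    /// `query`, and stop once no neighbor is closer than the current node.
    fn greedy_search(&self, query: &[f32], entry: usize) -> usize {
        let mut current = entry;
        let mut best = squared_l2(&self.vectors[current], query);
        loop {
            let mut improved = false;
            if let Some(links) = self.neighbors.get(&current) {
                for &n in links {
                    let d = squared_l2(&self.vectors[n], query);
                    if d < best {
                        best = d;
                        current = n;
                        improved = true;
                    }
                }
            }
            if !improved {
                return current;
            }
        }
    }
}
```

The walk ends in a local minimum of the graph, which is why the result is approximate and why the quality of the links built at insertion time matters.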

Vector Storage

Vectors can be kept in memory or written to disk and read back later. This supports both in-memory and persistent storage, making Vectus adaptable to various use cases.
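
As a rough sketch of what persisting vectors can look like, the example below writes a set of vectors to a flat binary file and reads them back. The file layout (dimension and count as little-endian `u32`, followed by the `f32` values) and the function names `save_vectors` and `load_vectors` are assumptions made for this illustration. The on-disk format Vectus actually uses is not described here.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Assumed layout: dim (u32 LE), count (u32 LE), then count * dim f32 LE values.
fn save_vectors(path: &Path, vectors: &[Vec<f32>]) -> io::Result<()> {
    let dim = vectors.first().map_or(0, |v| v.len());
    // Reject input where vectors do not share one dimension.
    if vectors.iter().any(|v| v.len() != dim) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "mixed dimensions"));
    }
    let mut out = BufWriter::new(File::create(path)?);
    out.write_all(&(dim as u32).to_le_bytes())?;
    out.write_all(&(vectors.len() as u32).to_le_bytes())?;
    for v in vectors {
        for x in v {
            out.write_all(&x.to_le_bytes())?;
        }
    }
    out.flush()
}

/// Read back a file produced by `save_vectors`.
fn load_vectors(path: &Path) -> io::Result<Vec<Vec<f32>>> {
    let mut input = BufReader::new(File::open(path)?);
    let mut word = [0u8; 4];
    input.read_exact(&mut word)?;
    let dim = u32::from_le_bytes(word) as usize;
    input.read_exact(&mut word)?;
    let count = u32::from_le_bytes(word) as usize;
    let mut vectors = Vec::new();
    for _ in 0..count {
        let mut v = Vec::with_capacity(dim);
        for _ in 0..dim {
            input.read_exact(&mut word)?;
            v.push(f32::from_le_bytes(word));
        }
        vectors.push(v);
    }
    Ok(vectors)
}
```

A fixed-width layout like this keeps every vector at a predictable byte offset, which makes it possible to read a single vector later without parsing the whole file.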

Customizable Distance Metrics

Vectus provides flexibility in defining custom distance metrics or normalization factors, allowing users to optimize search results based on specific needs.
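
One common way to make the distance measure pluggable in Rust is a small trait. The sketch below assumes a hypothetical `Metric` trait with `Euclidean` and `Cosine` implementations, plus `normalize` and a brute-force `nearest` helper. None of these names come from the Vectus API; they only show how the choice of metric can change which vector counts as the nearest.

```rust
/// Hypothetical metric trait: smaller return values mean "closer".
trait Metric {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

struct Euclidean;
struct Cosine;

impl Metric for Euclidean {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    }
}

impl Metric for Cosine {
    /// Cosine distance = 1 - cosine similarity; zero vectors get distance 1.
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        let (na, nb) = (norm(a), norm(b));
        if na == 0.0 || nb == 0.0 {
            return 1.0;
        }
        1.0 - dot(a, b) / (na * nb)
    }
}

/// Scale a vector to unit length in place (zero vectors are left unchanged).
fn normalize(v: &mut [f32]) {
    let n = norm(v);
    if n > 0.0 {
        for x in v.iter_mut() {
            *x /= n;
        }
    }
}

/// Brute-force nearest neighbor under the given metric.
fn nearest<M: Metric>(metric: &M, query: &[f32], items: &[Vec<f32>]) -> Option<usize> {
    items
        .iter()
        .enumerate()
        .map(|(i, v)| (i, metric.distance(query, v)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With the query `[1.0, 0.0]` and the items `[10.0, 0.0]` and `[1.0, 1.0]`, the Euclidean metric picks the second item, while the cosine metric picks the first because it points in the same direction as the query. Normalizing all vectors to unit length beforehand makes the two metrics rank neighbors identically.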

License

This project is licensed under the MIT License. See the LICENSE file for details.

Dependencies

~8–19MB
~273K SLoC