42 releases (4 breaking)
Version | Released |
---|---|
0.4.10 | Feb 13, 2024 |
0.4.7 | Jan 31, 2024 |
0.4.2 | Dec 30, 2023 |
0.3.8 | Nov 19, 2023 |
0.1.19 | Jul 27, 2023 |
#50 in Database implementations
169 downloads per month
Used in heapswap
150KB, 3K SLoC
LanceDB Rust
LanceDB Rust SDK, a serverless vector database.
Read more at: https://lancedb.com/
lib.rs:
VectorDB (LanceDB) -- Developer-friendly, serverless vector database for AI applications
LanceDB is an open-source database for vector search built with persistent storage, which greatly simplifies retrieval, filtering, and management of embeddings.
The key features of LanceDB include:
- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Rust, Python, JavaScript/TypeScript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support in building vector indices[^note].
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache-Arrow, Pandas, Polars, DuckDB and more on the way.
[^note]: Only in Python SDK.
Getting Started
LanceDB runs in process. To use it in your Rust project, add the crate as a dependency in your Cargo.toml, for example by running:
cargo add vectordb
Quick Start
Connect to a database.
use vectordb::connect;
let db = connect("data/sample-lancedb").await.unwrap();
LanceDB accepts several forms of database path:
- /path/to/database: local database on the file system.
- s3://bucket/path/to/database or gs://bucket/path/to/database: database on a cloud object store.
- db://dbname: Lance Cloud.
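The path forms above differ only in their scheme prefix. As a rough illustration of that distinction, here is a minimal sketch that classifies a URI by prefix. `DbLocation` and `classify_uri` are hypothetical names invented for this example; they are not part of the `vectordb` crate and do not reflect how the crate actually parses paths.

```rust
/// Where a database URI points. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum DbLocation {
    LocalPath,
    ObjectStore,
    LanceCloud,
}

/// Classify a URI by its scheme prefix (hypothetical helper, not a vectordb API).
fn classify_uri(uri: &str) -> DbLocation {
    if uri.starts_with("db://") {
        DbLocation::LanceCloud
    } else if uri.starts_with("s3://") || uri.starts_with("gs://") {
        DbLocation::ObjectStore
    } else {
        // Anything without a recognized scheme is treated as a file-system path.
        DbLocation::LocalPath
    }
}
```

For instance, `classify_uri("db://dbname")` yields `DbLocation::LanceCloud`, while a plain path such as `data/sample-lancedb` falls through to `DbLocation::LocalPath`.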
You can also use ConnectOptions to configure the connection to the database.
use vectordb::{connect_with_options, ConnectOptions};
let options = ConnectOptions::new("data/sample-lancedb")
.index_cache_size(1024);
let db = connect_with_options(&options).await.unwrap();
LanceDB uses arrow-rs to define schemas, data types, and the arrays themselves.
It treats FixedSizeList<Float16/Float32> columns as vector columns.
For more details, please refer to LanceDB documentation.
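To make the fixed-size-list idea concrete, the sketch below stores every vector in one flat buffer, so row `i` occupies the slice `values[i * dim .. (i + 1) * dim]` and no per-row offsets are needed. This mirrors the general idea behind Arrow's FixedSizeList layout, but `FixedSizeVectors` is a hypothetical type written only for this example; it is not an arrow-rs or vectordb API.

```rust
/// A column of fixed-width vectors stored as one flat buffer.
/// Hypothetical type for illustration; not an arrow-rs or vectordb API.
struct FixedSizeVectors {
    dim: usize,
    values: Vec<f32>,
}

impl FixedSizeVectors {
    /// Build from rows, rejecting any row whose length differs from `dim`.
    fn from_rows(dim: usize, rows: &[Vec<f32>]) -> Option<Self> {
        if dim == 0 || rows.iter().any(|r| r.len() != dim) {
            return None;
        }
        let values: Vec<f32> = rows.iter().flatten().copied().collect();
        Some(Self { dim, values })
    }

    /// Number of vectors in the column.
    fn len(&self) -> usize {
        self.values.len() / self.dim
    }

    /// Row `i` is a slice of the flat buffer; returns None when out of range.
    fn row(&self, i: usize) -> Option<&[f32]> {
        let start = i.checked_mul(self.dim)?;
        self.values.get(start..start + self.dim)
    }
}
```

Because every row has the same width, a mismatched row is rejected up front, which is the same constraint a fixed-size vector column imposes on its data.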
Create a table
To create a Table, you need to provide an arrow_schema::Schema and an arrow_array::RecordBatch stream.
use std::sync::Arc;
use arrow_array::types::Float32Type;
use arrow_array::{FixedSizeListArray, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};

// Schema: an Int32 id column plus a 128-dimensional Float32 vector column.
let schema = Arc::new(Schema::new(vec![
    Field::new("id", DataType::Int32, false),
    Field::new("vector", DataType::FixedSizeList(
        Arc::new(Field::new("item", DataType::Float32, true)), 128), true),
]));
// Create a RecordBatch stream holding a single batch of 1000 rows.
let batches = RecordBatchIterator::new(vec![
    RecordBatch::try_new(schema.clone(),
        vec![
            Arc::new(Int32Array::from_iter_values(0..1000)),
            Arc::new(FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
                (0..1000).map(|_| Some(vec![Some(1.0); 128])), 128)),
        ]).unwrap()
    ].into_iter().map(Ok),
    schema.clone());
db.create_table("my_table", Box::new(batches), None).await.unwrap();
Create vector index (IVF_PQ)
// `tbl` is a handle to the table created above.
tbl.create_index(&["vector"])
    .ivf_pq()
    .num_partitions(256)
    .build()
    .await
    .unwrap();
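For intuition about what `num_partitions` controls, the sketch below shows the general IVF idea: each vector is assigned to the partition of its nearest centroid, so a query only needs to scan a few partitions. This is a simplified illustration, not LanceDB's implementation. It assumes squared L2 distance, takes the centroids as given (in practice they typically come from k-means training), and omits the PQ compression step entirely. All function names are hypothetical.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`. Assumes `centroids` is non-empty.
fn nearest_centroid(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    for (i, c) in centroids.iter().enumerate() {
        if l2_sq(v, c) < l2_sq(v, &centroids[best]) {
            best = i;
        }
    }
    best
}

/// Group row ids by nearest centroid: partitions[p] holds the rows assigned to centroid p.
fn build_partitions(data: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<Vec<usize>> {
    let mut partitions = vec![Vec::new(); centroids.len()];
    for (row, v) in data.iter().enumerate() {
        partitions[nearest_centroid(v, centroids)].push(row);
    }
    partitions
}
```

With more partitions, each one holds fewer rows, so a query scans less data per probed partition at the cost of a larger set of centroids to compare against.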
Open table and run search
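use futures::TryStreamExt; // provides try_collect on the result stream

// Open the table created above.
let table = db.open_table("my_table").await.unwrap();
let results = table
    .search(&[1.0; 128])
    .execute_stream()
    .await
    .unwrap()
    .try_collect::<Vec<_>>()
    .await
    .unwrap();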
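As a reference point for what a vector search returns, the following sketch performs an exact brute-force nearest-neighbor scan. It is not how LanceDB executes queries; it assumes squared L2 distance as the metric, and `top_k` is a hypothetical helper. An index such as IVF_PQ approximates this result while scanning far fewer rows.

```rust
/// Squared Euclidean (L2) distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return (row id, distance) for the `k` rows closest to `query`, nearest first.
fn top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, l2_sq(v, query)))
        .collect();
    // total_cmp gives a total order on f32, so sorting never panics on NaN.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Comparing an index-backed search against a scan like this on a small sample is one simple way to estimate recall.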
Dependencies: ~85MB, ~1.5M SLoC