7 unstable releases (3 breaking)
0.4.0 | May 9, 2020 |
---|---|
0.3.0 | May 2, 2020 |
0.2.3 | Apr 28, 2020 |
0.1.0 | Mar 13, 2020 |
#1727 in Algorithms
95KB
2K
SLoC
lsh-rs (Locality Sensitive Hashing)
Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.
For more information on the subject see:
- Introduction on LSH
- Section 2. describes the hash families used in this crate
- LSH and neural networks
Implementations
- Base LSH
- Signed Random Projections (Cosine similarity)
- L2 distance
- MIPS (Dot products/ Maximum Inner Product Search)
- MinHash (Jaccard Similarity)
- Multi Probe LSH
- Step wise probing
- SRP (only bit shifts)
- Query directed probing
- L2
- MIPS
- Step wise probing
- Generic numeric types
Getting started
use lsh_rs::LshMem;
// 2 rows w/ dimension 3.
let p = &[vec![1., 1.5, 2.],
vec![2., 1.1, -0.3]];
// Do one time expensive preprocessing.
let n_projections = 9;
let n_hash_tables = 30;
let dim = 10;
let dim = 3;
let mut lsh = LshMem::new(n_projections, n_hash_tables, dim)\
.srp()
.unwrap();
lsh.store_vecs(p);
// Query in sublinear time.
let query = &[1.1, 1.2, 1.2];
lsh.query_bucket(query);
Documentation
- Read the Rust docs.
- Read the Python docs for the Python bindings.
Python
At the moment, the Python bindings are only compiled for Linux x86_64 systems.
$ pip install floky
from floky import SRP
import numpy as np
N = 10000
n = 100
dim = 10
# Generate some random data points
data_points = np.random.randn(N, dim)
# Do a one time (expensive) fit.
lsh = SRP(n_projections=19, n_hash_tables=10)
lsh.fit(data_points)
# Query approximated nearest neigbors in sub-linear time
query = np.random.randn(n, dim)
results = lsh.predict(query)
Dependencies
~6–10MB
~197K SLoC