#embedding #parallel-processing #metrics #similarity #distance #levenshtein #compute

semanticsimilarity_rs

A library designed to compute similarity/distance metrics between embeddings

2 releases

new 0.1.1 May 5, 2024
0.1.0 Feb 24, 2024

#501 in Concurrency

MIT license

5KB
58 lines

Rusty Semantic Similarity

A small library designed to compute similarity/dissimilarity metrics between embeddings using vector distance.

Current distance measures implemented:

  • Cosine (handles both normalized and non-normalized vectors)
  • Euclidean
  • Manhattan
  • Chebyshev
  • Angular
  • Jaccard Index
  • Levenshtein
  • Minkowski
  • Dot product
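As a rough guide to what several of the vector measures above compute, here is a minimal sketch of their textbook definitions in plain Rust. The function names and signatures are illustrative assumptions and are not taken from the crate's API. Inputs are assumed to have equal length. Jaccard and Levenshtein are left out because they operate on sets and strings rather than on coordinate differences.

```rust
// Illustrative definitions only; the crate's own names and signatures may differ.

/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Minkowski distance of order `p`.
/// `p = 1.0` gives the Manhattan distance, `p = 2.0` the Euclidean distance.
fn minkowski(a: &[f64], b: &[f64], p: f64) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Chebyshev distance: the largest absolute coordinate difference.
fn chebyshev(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max)
}

/// Cosine similarity for vectors that are not assumed to be unit length.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Angular distance: the angle between the vectors, scaled to the range [0, 1].
fn angular(a: &[f64], b: &[f64]) -> f64 {
    // Clamp guards against rounding pushing the cosine slightly outside [-1, 1].
    cosine(a, b).clamp(-1.0, 1.0).acos() / std::f64::consts::PI
}
```

Under these definitions, orthogonal vectors have an angular distance of 0.5 and identical vectors a distance of 0.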

Features

  • Parallel Computation: Utilizes rayon for parallel processing.
  • Bring your own embedding: Use any embedding model to generate embeddings and compute the similarity/distance scores.
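To illustrate how the two features fit together, the sketch below scores a query embedding against a batch of candidate embeddings in parallel. It is not the crate's implementation: to stay dependency-free it uses `std::thread::scope` in place of rayon, and `cosine` and `score_all` are hypothetical helper names. With rayon, the same idea is usually written with `par_iter()`.

```rust
use std::thread;

/// Cosine similarity for vectors that are not assumed to be normalized.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (norm_a * norm_b)
}

/// Scores every candidate against `query`, splitting the work across up to
/// `workers` threads. Scores are returned in the same order as `candidates`.
fn score_all(query: &[f64], candidates: &[Vec<f64>], workers: usize) -> Vec<f64> {
    let workers = workers.max(1);
    // Ceiling division, with a minimum of 1 because `chunks(0)` panics.
    let chunk_len = ((candidates.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; scoped threads may borrow
        // `query` and `candidates` without cloning them.
        let handles: Vec<_> = candidates
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|c| cosine(query, c))
                        .collect::<Vec<f64>>()
                })
            })
            .collect();

        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<f64>>()
    })
}
```

The embeddings themselves can come from any model; the only requirement in this sketch is that the query and every candidate have the same dimension.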

Installation

Add semanticsimilarity_rs to your Cargo.toml file:

[dependencies]
semanticsimilarity_rs = "0.1.0"

Or use cargo add:

cargo add semanticsimilarity_rs

Usage

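// `euclidean_distance` is also exported by the crate; it is not imported here
// because this example does not call it.
use semanticsimilarity_rs::cosine_similarity;

fn main() {
    // Two example embeddings of equal length.
    let vec1: [f64; 3] = [1.0, 2.0, 3.0];
    let vec2: [f64; 3] = [4.0, 5.0, 6.0];

    // The third argument is `false` here; it appears to indicate whether
    // the input vectors are already normalized.
    let similarity = cosine_similarity(&vec1, &vec2, false);

    println!("Cosine similarity between vec1 and vec2: {}", similarity);
}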
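The usage example passes `false` as the third argument to `cosine_similarity`. Since the cosine measure is listed as handling both normalized and non-normalized vectors, that flag presumably says whether the inputs are already unit length; this is an assumption, not something the README states. The sketch below shows why the distinction matters: once vectors are L2-normalized, cosine similarity reduces to a plain dot product. All helper names here are hypothetical and independent of the crate.

```rust
/// Scales a vector to unit L2 norm. Assumes the input is not all zeros.
fn l2_normalize(v: &[f64]) -> Vec<f64> {
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity for arbitrary vectors: the dot product divided by both norms.
fn cosine_general(a: &[f64], b: &[f64]) -> f64 {
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

/// Cosine similarity for vectors already of unit length: no norms are needed.
fn cosine_prenormalized(a: &[f64], b: &[f64]) -> f64 {
    dot(a, b)
}
```

Normalizing embeddings once up front and then using the dot-product form avoids recomputing norms on every comparison, which can help when the same embeddings are compared many times.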

Dependencies

~1.5MB
~28K SLoC