#distance #array #vector #similarity #multi-dimensional #metrics #points

fast-distances

A Rust library providing distance metrics for multidimensional arrays.

1 unstable release

0.0.1, released Dec 13, 2024

Custom license

155KB
3K SLoC

Rust Similarity and Distance Metrics Library

This Rust package provides a wide range of functions for computing various distance and similarity metrics between vectors or points in a high-dimensional space. These metrics are widely used in fields such as machine learning, statistics, data science, and computational biology.

Modules

Each module in this package implements a specific distance or similarity measure, some with gradient computations for optimization tasks. Below is a list of available modules:

  • approx_log_gamma: Approximation of the logarithm of the Gamma function.
  • bray_curtis: Bray-Curtis dissimilarity, a measure widely used in ecology to compare species-abundance profiles.
  • bray_curtis_grad: Gradient of the Bray-Curtis dissimilarity.
  • canberra: Canberra distance, a weighted variant of Manhattan distance in which each coordinate difference |x_i - y_i| is divided by |x_i| + |y_i|.
  • canberra_grad: Gradient of the Canberra distance.
  • chebyshev: Chebyshev distance (L∞ distance), the maximum distance along any coordinate axis.
  • chebyshev_grad: Gradient of the Chebyshev distance.
  • correlation: Pearson correlation coefficient, a measure of linear correlation between two vectors.
  • cosine: Cosine similarity, measuring the cosine of the angle between two vectors.
  • cosine_grad: Gradient of the cosine similarity.
  • dice: Dice coefficient, a similarity measure often used in bioinformatics.
  • euclidean: Euclidean distance, the straight-line distance between two points.
  • euclidean_grad: Gradient of the Euclidean distance.
  • hamming: Hamming distance, based on the number of positions at which two equal-length vectors differ.
  • haversine: Haversine distance, used to calculate the great-circle distance between two points on a sphere.
  • haversine_grad: Gradient of the Haversine distance.
  • hellinger: Hellinger distance, a measure for comparing probability distributions.
  • hellinger_grad: Gradient of the Hellinger distance.
  • hyperboloid_grad: Gradient of the hyperboloid distance, a metric on hyperbolic spaces.
  • jaccard: Jaccard similarity coefficient, the size of the intersection of two sets divided by the size of their union.
  • kulsinski: Kulsinski dissimilarity, a measure for binary vectors.
  • ll_dirichlet: Log-Likelihood of the Dirichlet distribution, used for probabilistic comparison of Dirichlet-distributed data.
  • log_beta: Logarithm of the Beta function, used in statistical modeling.
  • log_single_beta: Logarithm of the Beta function evaluated at a single argument.
  • mahalanobis: Mahalanobis distance, a distance metric that accounts for correlations between variables.
  • mahalanobis_grad: Gradient of the Mahalanobis distance.
  • manhattan: Manhattan distance (L1 distance), the sum of the absolute differences between coordinates.
  • manhattan_grad: Gradient of the Manhattan distance.
  • matching: Matching dissimilarity, based on the number of positions at which two binary vectors differ.
  • minkowski: Minkowski distance, a generalization of both Euclidean and Manhattan distances.
  • minkowski_grad: Gradient of the Minkowski distance.
  • poincare: Poincaré distance, the distance between points in the Poincaré ball model of hyperbolic space.
  • rogers_tanimoto: Rogers-Tanimoto dissimilarity, a measure for binary vectors.
  • russellrao: Russell-Rao dissimilarity, a measure for binary vectors.
  • sokal_michener: Sokal-Michener dissimilarity, a measure for binary vectors.
  • sokal_sneath: Sokal-Sneath dissimilarity, a measure for binary vectors.
  • standardised_euclidean: Standardized Euclidean distance, which divides each squared coordinate difference by the variance of that coordinate before summing.
  • standardised_euclidean_grad: Gradient of the standardized Euclidean distance.
  • weighted_minkowski: Weighted Minkowski distance, a variant of Minkowski distance with a separate weight for each dimension.
  • weighted_minkowski_grad: Gradient of the weighted Minkowski distance.
  • yule: Yule dissimilarity, based on Yule's coefficient of association between two binary vectors.
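Most of the real-valued metrics above follow standard textbook formulas. The sketch below writes a few of them out over plain `f64` slices so their definitions can be compared side by side, and shows what a distance-plus-gradient pair might look like for the `_grad` entries. It is an illustration under stated assumptions, not the crate's code: the function names, the `&[f64]` signatures, the zero-denominator conventions, and the `(distance, gradient)` return shape of `euclidean_grad_sketch` are all assumptions made here, and the crate's own API may differ.

```rust
/// Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p), for p >= 1.
/// p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.
pub fn minkowski(x: &[f64], y: &[f64], p: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter()
        .zip(y)
        .map(|(a, b)| (a - b).abs().powf(p))
        .sum::<f64>()
        .powf(1.0 / p)
}

/// Chebyshev (L-infinity) distance: the largest absolute coordinate difference.
pub fn chebyshev(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter().zip(y).map(|(a, b)| (a - b).abs()).fold(0.0, f64::max)
}

/// Canberra distance: sum_i |x_i - y_i| / (|x_i| + |y_i|).
/// Terms where both coordinates are zero (0/0) are treated as 0 (assumed convention).
pub fn canberra(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter()
        .zip(y)
        .map(|(a, b)| {
            let denom = a.abs() + b.abs();
            if denom > 0.0 { (a - b).abs() / denom } else { 0.0 }
        })
        .sum()
}

/// Bray-Curtis dissimilarity: sum_i |x_i - y_i| / sum_i |x_i + y_i|.
/// Returns 0.0 when the denominator is zero (assumed convention).
pub fn bray_curtis(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let num: f64 = x.iter().zip(y).map(|(a, b)| (a - b).abs()).sum();
    let den: f64 = x.iter().zip(y).map(|(a, b)| (a + b).abs()).sum();
    if den > 0.0 { num / den } else { 0.0 }
}

/// Euclidean distance together with its gradient with respect to `x`:
/// d = ||x - y|| and dd/dx = (x - y) / d.
/// The small epsilon keeps the gradient finite when x == y, where the true
/// gradient is undefined (an assumption of this sketch).
pub fn euclidean_grad_sketch(x: &[f64], y: &[f64]) -> (f64, Vec<f64>) {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let d = minkowski(x, y, 2.0);
    let grad: Vec<f64> = x.iter().zip(y).map(|(a, b)| (a - b) / (d + 1e-6)).collect();
    (d, grad)
}
```

Writing Minkowski with an explicit `p` makes the relationship stated in the list concrete: calling it with `p = 1.0` and `p = 2.0` reproduces Manhattan and Euclidean distance respectively.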

Installation

Add this package to your Cargo.toml to use it in your project:

[dependencies]
fast-distances = "0.0.1"

Usage

To use one of the available distance or similarity metrics, import the respective module in your Rust code:

// Cargo exposes the `fast-distances` package as the `fast_distances` crate
use fast_distances::{cosine, euclidean, manhattan};

fn main() {
    let vector1 = vec![1.0, 2.0, 3.0];
    let vector2 = vec![4.0, 5.0, 6.0];

    // Compute cosine similarity
    let cosine_sim = cosine(&vector1, &vector2);
    println!("Cosine Similarity: {}", cosine_sim);

    // Compute Euclidean distance
    let euclidean_dist = euclidean(&vector1, &vector2);
    println!("Euclidean Distance: {}", euclidean_dist);

    // Compute Manhattan distance
    let manhattan_dist = manhattan(&vector1, &vector2);
    println!("Manhattan Distance: {}", manhattan_dist);
}
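To know which numbers to expect from the example above, the sketch below computes the same three quantities with hand-written functions that do not depend on the crate. The function names and the choice to return 0.0 for a zero-norm vector in `cosine_similarity` are assumptions of this sketch. They can be called from a `main` like the one above to compare results.

```rust
/// Cosine similarity: dot(x, y) / (||x|| * ||y||).
/// Returns 0.0 if either vector has zero norm (a convention chosen for this sketch).
fn cosine_similarity(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let nx = x.iter().map(|a| a * a).sum::<f64>().sqrt();
    let ny = y.iter().map(|b| b * b).sum::<f64>().sqrt();
    if nx == 0.0 || ny == 0.0 { 0.0 } else { dot / (nx * ny) }
}

/// Euclidean (L2) distance: square root of the summed squared differences.
fn euclidean_distance(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter().zip(y).map(|(a, b)| (a - b) * (a - b)).sum::<f64>().sqrt()
}

/// Manhattan (L1) distance: sum of absolute differences.
fn manhattan_distance(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    x.iter().zip(y).map(|(a, b)| (a - b).abs()).sum()
}
```

For vector1 = [1.0, 2.0, 3.0] and vector2 = [4.0, 5.0, 6.0], these give a cosine similarity of 32 / sqrt(14 · 77) ≈ 0.9746, a Euclidean distance of sqrt(27) ≈ 5.1962, and a Manhattan distance of 9. If a library's `cosine` reports ≈ 0.0254 instead, it is returning the cosine distance, 1 minus the similarity.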

Contributing

Contributions are welcome! If you'd like to contribute a new metric or improve an existing one, feel free to open an issue or a pull request.

  1. Fork the repository.
  2. Clone your fork locally.
  3. Make changes and run tests to ensure they pass.
  4. Submit a pull request with a clear description of your changes.
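For step 3, a new metric would normally come with unit tests. The sketch below shows one possible shape for such a contribution: a Jaccard dissimilarity over boolean vectors plus a `#[cfg(test)]` module, run with `cargo test`. The function name `jaccard_dissimilarity`, the `&[bool]` signature, and the convention of returning 0.0 when both vectors are all false are hypothetical choices for illustration; the crate's actual module layout and types may differ.

```rust
/// Jaccard dissimilarity for binary vectors: 1 - |x AND y| / |x OR y|,
/// i.e. the fraction of positions that are true in at least one vector
/// where the two vectors disagree.
/// Returns 0.0 when both vectors are all false (hypothetical convention).
pub fn jaccard_dissimilarity(x: &[bool], y: &[bool]) -> f64 {
    assert_eq!(x.len(), y.len(), "vectors must have equal length");
    let mut union = 0usize;
    let mut disagree = 0usize;
    for (&a, &b) in x.iter().zip(y) {
        if a || b {
            union += 1;
        }
        if a != b {
            disagree += 1;
        }
    }
    if union == 0 { 0.0 } else { disagree as f64 / union as f64 }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn identical_vectors_have_zero_distance() {
        let x = [true, false, true];
        assert_eq!(jaccard_dissimilarity(&x, &x), 0.0);
    }

    #[test]
    fn disjoint_vectors_have_distance_one() {
        assert_eq!(jaccard_dissimilarity(&[true, false], &[false, true]), 1.0);
    }

    #[test]
    fn partial_overlap() {
        // Union covers 3 positions; the vectors disagree at 2 of them -> 2/3.
        let d = jaccard_dissimilarity(&[true, true, false], &[true, false, true]);
        assert!((d - 2.0 / 3.0).abs() < 1e-12);
    }

    #[test]
    #[should_panic]
    fn mismatched_lengths_panic() {
        jaccard_dissimilarity(&[true], &[true, false]);
    }
}
```

Covering an identical pair, a disjoint pair, a partial overlap, and invalid input keeps each test focused on one property of the metric.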

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This package draws from many well-established distance and similarity metrics commonly used in data analysis, machine learning, and information retrieval.

Dependencies

~1.5MB
~33K SLoC