



MIT license

3.5K SLoC

CLAM: Clustered Learning of Approximate Manifolds (v0.21.6)

CLAM is a Rust/Python library for learning approximate manifolds from data. It is designed to be fast, memory-efficient, easy to use, and scalable for big data applications.

CLAM provides utilities for fast search (Cakes) and anomaly detection (Chaoda).

As of this writing, the project is in a pre-1.0 state: the API is not yet stable, and breaking changes may occur frequently.


CLAM is a library crate, so you can add it to your project with `cargo add abd_clam@0.21.6`.
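Equivalently, you can declare the dependency in Cargo.toml by hand; the version pin below matches the release documented here:

```toml
[dependencies]
abd_clam = "0.21.6"
```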

Here is a simple example of how to use CLAM to perform nearest neighbors search:

use symagen::random_data;

use abd_clam::{knn, rnn, Cakes, PartitionCriteria, VecDataset};

/// Euclidean distance function.
/// This function is used to compute the distance between two points for the purposes
/// of this demo. You can use your own distance function instead. The required
/// signature is `fn(T, T) -> U` where `T` is the type of the points (must
/// implement `Send`, `Sync` and `Copy`) and `U` is a `Number` type (e.g. `f32`)
/// from the `distances` crate.
fn euclidean(x: &[f32], y: &[f32]) -> f32 {
    x.iter()
        .zip(y.iter())
        .map(|(a, b)| a - b)
        .map(|v| v * v)
        .sum::<f32>()
        .sqrt()
}

// Some parameters for generating random data.
let seed = 42;
let (cardinality, dimensionality) = (1_000, 10);
let (min_val, max_val) = (-1., 1.);

// Generate some random data. You can use your own data here.
let data: Vec<Vec<f32>> = random_data::random_f32(cardinality, dimensionality, min_val, max_val, seed);

// We will use the first point in data as our query, and we will perform
// RNN search with a radius of 0.05 and KNN search for the 10 nearest neighbors.
let query: Vec<f32> = data[0].clone();
let radius: f32 = 0.05;
let k = 10;

// We need the contents of data to be &[f32] instead of Vec<f32>, so we convert
// each owned Vec into a slice reference before handing the data to CLAM.
let data: Vec<&[f32]> = data.iter().map(Vec::as_slice).collect::<Vec<_>>();

let name = "demo".to_string();  // The name of the dataset.
let is_metric_expensive = false;  // We will assume that our distance function is cheap to compute.

// The metric function itself will be given to Cakes.
let data = VecDataset::new(name, data, euclidean, is_metric_expensive);

// We will use the default partition criteria for this example. This will partition
// the data until each Cluster contains a single unique point.
let criteria = PartitionCriteria::default();

// The Cakes struct provides the functionality described in the CHESS paper.
// This line performs a non-trivial amount of work.
let model = Cakes::new(data, Some(seed), criteria);

// We will soon add the ability to save and load models, but for now we will
// just use the model we built above.

// We can now perform RNN search on the model.
let rnn_results: Vec<(usize, f32)> = model.rnn_search(&query, radius, rnn::Algorithm::Clustered);

// We can also perform KNN search on the model.
let knn_results: Vec<(usize, f32)> = model.knn_search(&query, k, knn::Algorithm::RepeatedRnn);
assert!(knn_results.len() >= k);

// Both results are a Vec of 2-tuples where the first element is the index of the point
// in the dataset and the second element is the distance from the query point.
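If you want the hits ordered nearest-first, the 2-tuples can be sorted on their second element. The sketch below is independent of CLAM; the `results` vector is a hypothetical stand-in for `rnn_results` above:

```rust
fn main() {
    // Stand-in for search results: (index into the dataset, distance to query).
    let mut results: Vec<(usize, f32)> = vec![(3, 0.9), (7, 0.1), (1, 0.5)];

    // Sort ascending by distance. `partial_cmp` with `unwrap` is safe here
    // because distances from a metric are never NaN.
    results.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());

    // Extract just the dataset indices, nearest first.
    let indices: Vec<usize> = results.iter().map(|&(i, _)| i).collect();
    assert_eq!(indices, vec![7, 1, 3]);
    println!("{:?}", indices);
}
```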

