4 releases
0.2.2 | Jun 19, 2023 |
---|---|
0.2.1 | Jun 3, 2023 |
0.2.0 | Feb 20, 2023 |
0.1.0 | Feb 14, 2023 |
#1036 in Algorithms
35 downloads per month
33KB
640 lines
flame-clustering
Description
Fuzzy clustering by Local Approximation of MEmberships (FLAME) is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects.
The algorithm was first described in: "FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data", BMC Bioinformatics, 2007, 8:3. Available from: http://www.biomedcentral.com/1471-2105/8/3
This Rust library was adapted from the original C implementation.
Usage Example
The following is a simplified example of how one would use the library to cluster data:
use flame_clustering::DistanceGraph;
let data: Vec<f64> = vec![0.12, 0.23, 0.15, 0.19, 100.0];
let (clusters, outliers) = DistanceGraph::build(&data, |a, b| (a - b).abs())
.find_supporting_objects(2, -1.0)
.approximate_fuzzy_memberships(100, 1e-6)
.make_clusters(-1.0);
assert_eq!(format!("{clusters:?}"), "[[0, 1, 3, 2]]");
assert_eq!(format!("{outliers:?}"), "[4]");
The output is a sequence of indexes into the original dataset. In the example
above, one cluster was identified that contains the numbers
[0.12, 0.15, 0.19, 0.23]
, and 100.0
was identified as an outlier.
See the library documentation for more information about method parameters.
Compatibility Notes
The Rust implementation differs from the original library in the following ways:
- The single
Flame
struct was replaced with a builder-style API. See library documentation for details. - The code was modified to accept user-supplied distance functions.
- The ability to specify a distance matrix directly is not yet available.
- Outliers are no longer automatically assigned to the nearest cluster. By
default outliers remain outliers; in order to preserve the original behavior
one will need to call a new
assign_outliers()
method. - The clustering algorithm uses 64-bit floating point numbers, whereas the original mostly used 32-bit numbers. This results in minor numerical differences when compared to the C implementation.
- The original library provided
Flame_DotProduct
andFlame_DotProductDist
distance methods, but I have not included them here. - There were no tests included with the original implementation. A few unit tests have been added as a development aid when when porting the code, but test coverage is incomplete.
- Minor grammatical corrections have been made to comments in the code.
Dependencies
~1.5MB
~40K SLoC