### 3 unstable releases

| Version | Release date |
|---|---|
| 0.3.0 | Jun 13, 2024 |
| 0.2.3 | Jun 13, 2024 |
| 0.2.2 | Jun 13, 2024 |

**Apache-2.0**


# kmeans_smid

kmeans_smid is a small and fast library for k-means clustering calculations. It fixes a SIMD problem in the `kmeans` crate. Here is a small example that uses k-means++ as the initialization method and Lloyd's algorithm as the k-means variant:

```rust
use kmeans_smid::*;

fn main() {
    let (sample_cnt, sample_dims, k, max_iter) = (20000, 200, 4, 100);

    // Generate some random data (requires the `rand` crate)
    let mut samples = vec![0.0f64; sample_cnt * sample_dims];
    samples.iter_mut().for_each(|v| *v = rand::random());

    // Calculate k-means, using k-means++ as the initialization method
    let kmean = KMeans::<f64, 8>::new(samples, sample_cnt, sample_dims);
    let result = kmean.kmeans_lloyd(k, max_iter, KMeans::init_kmeanplusplus, &KMeansConfig::default());

    println!("Centroids: {:?}", result.centroids);
    println!("Cluster-Assignments: {:?}", result.assignments);
    println!("Error: {}", result.distsum);
}
```

## Datastructures

For performance reasons, all calculations are done on bare vectors, using hand-written SIMD intrinsics from the `packed_simd` crate. All vectors are stored row-major, so each sample is stored in a consecutive block of memory.
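To make the row-major layout concrete, here is a minimal scalar sketch of how one sample can be addressed in a flat buffer. The helper names `sample` and `squared_distance` are hypothetical and not part of this crate's API; the crate itself works with SIMD intrinsics rather than plain iterators.

```rust
/// Returns sample `i` as a slice of `dims` values from a row-major buffer.
/// Hypothetical helper for illustration, not a function of this crate.
fn sample(data: &[f64], dims: usize, i: usize) -> &[f64] {
    // Sample i occupies the consecutive range [i * dims, (i + 1) * dims).
    &data[i * dims..(i + 1) * dims]
}

/// Squared Euclidean distance between two equally sized slices (scalar, no SIMD).
fn squared_distance(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Because every sample is contiguous, a distance computation reads memory linearly, which is the access pattern that vectorized code benefits from.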

## Supported variants / algorithms

- Lloyd (standard k-means)
- minibatch
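
As a rough illustration of the Lloyd variant, the following scalar sketch performs a single iteration on row-major data: it assigns every sample to its nearest centroid, then moves each centroid to the mean of its assigned samples. The function `lloyd_step` is a hypothetical name and not the crate's implementation, which is SIMD-based. A minibatch variant would typically run a similar update on a random subset of the samples per iteration.

```rust
/// One Lloyd iteration over row-major data (scalar sketch, no SIMD).
/// Hypothetical helper for illustration, not a function of this crate.
/// Returns the assignments and the sum of squared distances, measured
/// against the centroids as they were before the update.
fn lloyd_step(samples: &[f64], dims: usize, centroids: &mut [f64]) -> (Vec<usize>, f64) {
    let k = centroids.len() / dims;
    let mut assignments = Vec::with_capacity(samples.len() / dims);
    let mut sums = vec![0.0f64; k * dims];
    let mut counts = vec![0usize; k];
    let mut distsum = 0.0;

    // Assignment step: find the nearest centroid for every sample.
    for sample in samples.chunks_exact(dims) {
        let mut best = 0;
        let mut best_dist = f64::INFINITY;
        for (c, centroid) in centroids.chunks_exact(dims).enumerate() {
            let dist: f64 = sample
                .iter()
                .zip(centroid)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            if dist < best_dist {
                best = c;
                best_dist = dist;
            }
        }
        assignments.push(best);
        distsum += best_dist;
        counts[best] += 1;
        // Accumulate the sample into the running sum of its cluster.
        for (s, x) in sums[best * dims..(best + 1) * dims].iter_mut().zip(sample) {
            *s += x;
        }
    }

    // Update step: move each non-empty centroid to the mean of its samples.
    for (c, centroid) in centroids.chunks_exact_mut(dims).enumerate() {
        if counts[c] > 0 {
            for (d, value) in centroid.iter_mut().enumerate() {
                *value = sums[c * dims + d] / counts[c] as f64;
            }
        }
    }
    (assignments, distsum)
}
```

Calling this function repeatedly until the assignments stop changing, or until an iteration limit such as `max_iter` is reached, gives the usual Lloyd loop. Empty clusters keep their previous centroid in this sketch; how the crate handles them is not stated here.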

## Supported centroid initialization methods

- k-means++
- random partition
- random sample
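
For readers unfamiliar with k-means++ seeding, here is a minimal scalar sketch of the idea: the first centroid is drawn uniformly from the samples, and every further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function `init_kmeanspp` and its `uniform` parameter (a caller-supplied source of values in [0, 1)) are assumptions made for this example and do not reflect the signature of the crate's `init_kmeanplusplus`.

```rust
/// k-means++ seeding over row-major data (scalar sketch).
/// Hypothetical helper; `uniform` must return values in [0, 1).
/// Assumes at least one sample and k >= 1.
fn init_kmeanspp(
    samples: &[f64],
    dims: usize,
    k: usize,
    mut uniform: impl FnMut() -> f64,
) -> Vec<f64> {
    let n = samples.len() / dims;

    // Pick the first centroid uniformly at random.
    let first = ((uniform() * n as f64) as usize).min(n - 1);
    let mut centroids = samples[first * dims..(first + 1) * dims].to_vec();

    // Squared distance from each sample to its nearest chosen centroid.
    let mut nearest = vec![f64::INFINITY; n];

    while centroids.len() < k * dims {
        // Only the most recently added centroid can lower a distance.
        let newest = &centroids[centroids.len() - dims..];
        for (i, sample) in samples.chunks_exact(dims).enumerate() {
            let d: f64 = sample
                .iter()
                .zip(newest)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            if d < nearest[i] {
                nearest[i] = d;
            }
        }

        // Draw the next centroid with probability proportional to `nearest`.
        let total: f64 = nearest.iter().sum();
        let mut target = uniform() * total;
        let mut chosen = n - 1; // fallback if rounding exhausts the loop
        for (i, &d) in nearest.iter().enumerate() {
            if target < d {
                chosen = i;
                break;
            }
            target -= d;
        }
        centroids.extend_from_slice(&samples[chosen * dims..(chosen + 1) * dims]);
    }
    centroids
}
```

By comparison, the random sample method would pick `k` samples uniformly as centroids, and the random partition method would assign every sample to a random cluster and use the cluster means as starting centroids.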
