kmeans
A C++ library for k-means
k-means++ initialization of Arthur and Vassilvitskii (2007).
#include <InitializeKmeanspp.hpp>
Public Member Functions
InitializeKmeanspp (InitializeKmeansppOptions options)
InitializeKmeanspp ()=default
InitializeKmeansppOptions & | get_options () |
Cluster_ | run (const Matrix_ &matrix, Cluster_ ncenters, Float_ *centers) const |
This approach involves the selection of starting points via iterations of weighted sampling, where the sampling probability for each point is proportional to the squared distance to the closest starting point that was chosen in any of the previous iterations. The aim is to obtain well-separated starting points to encourage the formation of suitable clusters.
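The weighted sampling described above can be sketched as follows. This is a minimal illustration written for this page, not the library's implementation: the function name `kmeanspp_sketch`, its parameters, and the use of a plain column-major `std::vector<double>` in place of a `MockMatrix` are all assumptions. It returns the indices of the observations chosen as starting points.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper, not part of the kmeans library.
// `data` is column-major: `ndim` rows (dimensions) by `nobs` columns (observations).
// Returns the indices of the chosen observations. Fewer than `ncenters` indices
// are returned if the data runs out of distinct locations.
inline std::vector<std::size_t> kmeanspp_sketch(
    const std::vector<double>& data,
    std::size_t ndim,
    std::size_t nobs,
    std::size_t ncenters,
    unsigned long seed)
{
    assert(data.size() == ndim * nobs);
    std::vector<std::size_t> chosen;
    if (nobs == 0 || ncenters == 0) {
        return chosen;
    }

    std::mt19937_64 rng(seed);

    // Squared distance from each observation to its closest chosen point so far.
    std::vector<double> mindist(nobs, std::numeric_limits<double>::infinity());

    // The first starting point is drawn uniformly.
    std::uniform_int_distribution<std::size_t> first(0, nobs - 1);
    std::size_t latest = first(rng);
    chosen.push_back(latest);

    while (true) {
        // Update the closest-distance table with the most recently chosen point.
        const double* c = data.data() + latest * ndim;
        double total = 0;
        for (std::size_t o = 0; o < nobs; ++o) {
            const double* x = data.data() + o * ndim;
            double d2 = 0;
            for (std::size_t d = 0; d < ndim; ++d) {
                const double diff = x[d] - c[d];
                d2 += diff * diff;
            }
            if (d2 < mindist[o]) {
                mindist[o] = d2;
            }
            total += mindist[o];
        }

        // Stop when enough points are chosen, or when every observation
        // coincides with an already chosen point (all weights are zero).
        if (chosen.size() == ncenters || total <= 0) {
            break;
        }

        // Sample with probability proportional to the squared distance.
        // Zero-weight observations are skipped, so no location is picked twice.
        std::uniform_real_distribution<double> unif(0.0, total);
        const double u = unif(rng);
        double cumulative = 0;
        std::size_t pick = nobs;
        for (std::size_t o = 0; o < nobs; ++o) {
            if (mindist[o] > 0) {
                cumulative += mindist[o];
                pick = o;
                if (cumulative > u) {
                    break;
                }
            }
        }
        latest = pick;
        chosen.push_back(latest);
    }

    return chosen;
}
```

Because already chosen observations have a squared distance of zero, they carry zero sampling weight, which is what pushes later picks toward regions far from the existing starting points.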
Template Parameters
Matrix_ | Matrix type for the input data. This should satisfy the MockMatrix contract.
Cluster_ | Integer type for the cluster assignments.
Float_ | Floating-point type for the centroids.
InitializeKmeanspp() [1/2] | inline
Parameters
options | Options for k-means++ initialization.
InitializeKmeanspp() [2/2] | default
Default constructor.
get_options() | inline
run().
run() | inline virtual
Parameters
data | A matrix-like object (see MockMatrix) containing per-observation data.
num_centers | Number of cluster centers.
[out] centers | Pointer to an array of length equal to the product of num_centers and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On output, each column will contain the coordinates of one cluster center.
Returns
centers is filled with the new cluster centers. The number of filled centers is returned; this is usually equal to num_centers, but may be smaller if, for example, num_centers is greater than the number of observations. If the returned value is less than num_centers, only that many leading columns of centers will be filled.
Implements kmeans::Initialize< Matrix_, Cluster_, Float_ >.
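As an illustration of the column-major layout and the partial-fill contract described above, the following sketch copies only the filled centers out of a `centers` buffer. The helper `unpack_filled_centers` and its parameters are hypothetical and not part of the library; `nfilled` stands in for the value returned by run(), and `double` is assumed for Float_.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the kmeans library.
// `centers` is column-major with `ndim` rows (dimensions). Only the first
// `nfilled` columns are read; any remaining columns are treated as unfilled.
inline std::vector<std::vector<double> > unpack_filled_centers(
    const double* centers,
    std::size_t ndim,
    std::size_t nfilled)
{
    assert(centers != nullptr || ndim * nfilled == 0);
    std::vector<std::vector<double> > out(nfilled);
    for (std::size_t k = 0; k < nfilled; ++k) {
        // Column k starts at offset k * ndim in a column-major buffer.
        const double* column = centers + k * ndim;
        out[k].assign(column, column + ndim);
    }
    return out;
}
```

A caller that allocated space for num_centers columns would pass the returned count rather than num_centers, so that unfilled trailing columns are never read.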