kmeans
A C++ library for k-means
Implements the Lloyd algorithm for k-means clustering.
#include <RefineLloyd.hpp>
Public Member Functions
RefineLloyd (RefineLloydOptions options)
RefineLloyd ()=default
RefineLloydOptions & get_options ()
Details< Index_ > run (const Matrix_ &data, Cluster_ ncenters, Float_ *centers, Cluster_ *clusters) const
The Lloyd algorithm is the simplest k-means clustering algorithm, involving several iterations of batch assignments and center calculations. Specifically, we assign each observation to its closest cluster, and once all points are assigned, we recompute the cluster centroids. This is repeated until there are no reassignments or the maximum number of iterations is reached.
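The iteration described above can be sketched in plain C++. This is a minimal illustration and not the library's implementation: `lloyd_sketch` and all of its parameters are hypothetical names, the data is assumed to be a dense column-major array of doubles, and squared Euclidean distance is assumed. The return value mirrors the status codes described for this class (0 on convergence, 2 when the iteration limit is hit), and empty clusters are skipped so that they keep their previous centroid.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative sketch only; lloyd_sketch() is a hypothetical helper, not part of the kmeans library.
// data:     column-major matrix, ndim rows by nobs columns (one column per observation).
// centers:  column-major matrix, ndim rows by ncenters columns; initial centroids on input,
//           final centroids on output.
// clusters: resized to nobs and filled with the assigned cluster of each observation.
// Returns 0 if an iteration produced no reassignments, or 2 if max_iter was reached first.
// Assumes ncenters >= 1.
inline int lloyd_sketch(const std::vector<double>& data, std::size_t ndim, std::size_t nobs,
                        std::size_t ncenters, std::vector<double>& centers,
                        std::vector<int>& clusters, int max_iter) {
    clusters.assign(nobs, -1); // -1 marks "not yet assigned"
    for (int iter = 0; iter < max_iter; ++iter) {
        // Batch assignment: every observation goes to its closest centroid.
        bool changed = false;
        for (std::size_t o = 0; o < nobs; ++o) {
            int best = 0;
            double best_dist = std::numeric_limits<double>::infinity();
            for (std::size_t c = 0; c < ncenters; ++c) {
                double dist = 0;
                for (std::size_t d = 0; d < ndim; ++d) {
                    const double delta = data[o * ndim + d] - centers[c * ndim + d];
                    dist += delta * delta;
                }
                if (dist < best_dist) {
                    best_dist = dist;
                    best = static_cast<int>(c);
                }
            }
            if (clusters[o] != best) {
                clusters[o] = best;
                changed = true;
            }
        }
        if (!changed) {
            return 0; // no reassignments: converged
        }

        // Center calculation: recompute each centroid as the mean of its members.
        std::vector<double> sums(ndim * ncenters, 0.0);
        std::vector<std::size_t> sizes(ncenters, 0);
        for (std::size_t o = 0; o < nobs; ++o) {
            const std::size_t c = static_cast<std::size_t>(clusters[o]);
            ++sizes[c];
            for (std::size_t d = 0; d < ndim; ++d) {
                sums[c * ndim + d] += data[o * ndim + d];
            }
        }
        for (std::size_t c = 0; c < ncenters; ++c) {
            if (sizes[c] == 0) {
                continue; // empty cluster is ignored and keeps its previous centroid
            }
            for (std::size_t d = 0; d < ndim; ++d) {
                centers[c * ndim + d] = sums[c * ndim + d] / static_cast<double>(sizes[c]);
            }
        }
    }
    return 2; // iteration limit reached without convergence
}
```

For instance, with one-dimensional observations 0, 1, 10 and 11 and initial centroids 0 and 10, the first pass assigns the observations to the two groups and moves the centroids to 0.5 and 10.5; the second pass makes no reassignments, so the sketch stops with status 0.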
In the Details::status returned by run(), the status code is either 0 (success) or 2 (maximum iterations reached without convergence). Previous versions of the library would report a status code of 1 upon encountering an empty cluster, but empty clusters are now ignored.
Template Parameters
Matrix_ | Matrix type for the input data. This should satisfy the MockMatrix contract.
Cluster_ | Integer type for the cluster assignments.
Float_ | Floating-point type for the centroids.
RefineLloyd (RefineLloydOptions options) [inline]
Parameters
options | Further options to the Lloyd algorithm.
RefineLloyd () [default]
Default constructor.
get_options () [inline]
Returns a reference to the options, which can be modified before calling run().
run () [inline, virtual]
Parameters
data | A matrix-like object (see MockMatrix) containing per-observation data.
num_centers | Number of cluster centers.
[in,out] centers | Pointer to an array of length equal to the product of num_centers and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On input, each column should contain the initial centroid location for its cluster. On output, each column will contain the final centroid location for its cluster.
[out] clusters | Pointer to an array of length equal to the number of observations (from data.num_observations()). On output, this will contain the cluster assignment for each observation.
Returns
centers and clusters are filled, and a Details object is returned containing clustering statistics. If num_centers is greater than data.num_observations(), only the first data.num_observations() columns of the centers array will be filled.
Implements kmeans::Refine< Matrix_, Cluster_, Float_ >.
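The column-major layout of the centers array can be easy to get wrong, so the following sketch shows one way to index and fill it. All three helpers (`center_offset`, `pack_centers`, `num_filled_centers`) are hypothetical names introduced for illustration and are not part of the kmeans library; `double` stands in for Float_.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; none of these are part of the kmeans library.

// Offset of dimension `dim` of center `center` in a column-major array
// with `num_dimensions` rows (one column per cluster center).
inline std::size_t center_offset(std::size_t dim, std::size_t center, std::size_t num_dimensions) {
    return center * num_dimensions + dim;
}

// Packs one coordinate vector per center into a column-major array, e.g. to
// prepare the initial centroids. Each inner vector is assumed to hold
// exactly `num_dimensions` values.
inline std::vector<double> pack_centers(const std::vector<std::vector<double>>& per_center,
                                        std::size_t num_dimensions) {
    std::vector<double> packed(per_center.size() * num_dimensions);
    for (std::size_t c = 0; c < per_center.size(); ++c) {
        for (std::size_t d = 0; d < num_dimensions; ++d) {
            packed[center_offset(d, c, num_dimensions)] = per_center[c][d];
        }
    }
    return packed;
}

// Number of leading columns that hold filled centroids on output, following
// the rule that only the first num_observations columns are filled when
// num_centers exceeds the number of observations.
inline std::size_t num_filled_centers(std::size_t num_centers, std::size_t num_observations) {
    return std::min(num_centers, num_observations);
}
```

With three dimensions and two centers, the coordinates of the second center occupy offsets 3 to 5 of the packed array, immediately after the three coordinates of the first center.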