Implements the Hartigan-Wong algorithm for k-means clustering.
The Hartigan-Wong algorithm performs several iterations of transferring points between clusters, involving a computationally expensive "optimal transfer" step that checks each observation against each cluster to determine its best assignment, followed by a cheaper "quick transfer" step, which iterates between the best and second-best cluster choices for each observation. The latter enables rapid exploration of the local solution space without the unnecessary expense of repeatedly comparing each observation to all clusters. The choice of "best" cluster for each observation considers the gain/loss in the sum of squares when an observation moves between clusters, even accounting for the shift in the cluster centers after the transfer. The algorithm terminates when no observation wishes to transfer between clusters.
This implementation is derived from the Fortran code underlying the kmeans
function in the stats R package, which in turn is derived from Hartigan and Wong (1979).
In the Details::status
returned by run()
, the status code is either 0 (success), 2 (maximum optimal transfer iterations reached without convergence) or 4 (maximum quick transfer iterations reached without convergence, if RefineHartiganWongOptions::quit_on_quick_transfer_convergence_failure = true
). Previous versions of the library would report a status code of 1 upon encountering an empty cluster, but these are now just ignored.
- Template Parameters
-
Matrix_ | Matrix type for the input data. This should satisfy the MockMatrix contract. |
Cluster_ | Integer type for the cluster assignments. |
Float_ | Floating-point type for the centroids. |
- See also
- Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 28, 100-108.