kmeans
k-means clustering in C++
Loading...
Searching...
No Matches
kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ > Class Template Referencefinal

Implements the Hartigan-Wong algorithm for k-means clustering. More...

#include <RefineHartiganWong.hpp>

Inheritance diagram for kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >:
Collaboration diagram for kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >:

Public Member Functions

 RefineHartiganWong (RefineHartiganWongOptions options)
 
 RefineHartiganWong ()=default
 
RefineHartiganWongOptionsget_options ()
 
- Public Member Functions inherited from kmeans::Refine< Index_, Data_, Cluster_, Float_, Matrix_ >
virtual Details< Index_ > run (const Matrix_ &data, Cluster_ num_centers, Float_ *centers, Cluster_ *clusters) const =0
 

Detailed Description

template<typename Index_, typename Data_, typename Cluster_, typename Float_, class Matrix_ = Matrix<Index_, Data_>>
class kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >

Implements the Hartigan-Wong algorithm for k-means clustering.

The Hartigan-Wong algorithm performs several iterations of transferring points between clusters, involving a computationally expensive "optimal transfer" step that checks each observation against each cluster to determine its best assignment, followed by a cheaper "quick transfer" step, which iterates between the best and second-best cluster choices for each observation. The latter enables rapid exploration of the local solution space without the unnecessary expense of repeatedly comparing each observation to all clusters. The choice of "best" cluster for each observation considers the gain/loss in the sum of squares when an observation moves between clusters, even accounting for the shift in the cluster centers after the transfer. The algorithm terminates when no observation wishes to transfer between clusters.

This implementation is derived from the Fortran code underlying the kmeans function in the stats R package, which in turn is taken from Hartigan and Wong (1979).

In the Details::status returned by run(), the status code is either 0 (success), 2 (maximum optimal transfer iterations reached without convergence) or 4 (maximum quick transfer iterations reached without convergence, if RefineHartiganWongOptions::quit_on_quick_transfer_convergence_failure = true). Previous versions of the library would report a status code of 1 upon encountering an empty cluster, but these are now just ignored.

In the Details::iterations returned by run(), the reported number of iterations is that of the optimal transfers.

Template Parameters
Index_Integer type of the observation indices. This should be the same as the index type of Matrix_.
Data_Numeric type of the input dataset. This should be the same as the data type of Matrix_.
Cluster_Integer type of the cluster assignments.
Float_Floating-point type of the centroids. This will also be used for the internal distance calculations.
Matrix_Class satisfying the Matrix interface.
See also
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 28, 100-108.

Constructor & Destructor Documentation

◆ RefineHartiganWong() [1/2]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>
kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >::RefineHartiganWong ( RefineHartiganWongOptions options)
inline
Parameters
optionsFurther options for the Hartigan-Wong algorithm.

◆ RefineHartiganWong() [2/2]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>
kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >::RefineHartiganWong ( )
default

Default constructor.

Member Function Documentation

◆ get_options()

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>
RefineHartiganWongOptions & kmeans::RefineHartiganWong< Index_, Data_, Cluster_, Float_, Matrix_ >::get_options ( )
inline
Returns
Options for Hartigan-Wong clustering. This can be modified prior to calling run().

The documentation for this class was generated from the following file: