kmeans
A C++ library for k-means
Loading...
Searching...
No Matches
Public Member Functions | List of all members
kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ > Class Template Reference

Implements the Hartigan-Wong algorithm for k-means clustering. More...

#include <RefineHartiganWong.hpp>

Inheritance diagram for kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >:
Inheritance graph
[legend]
Collaboration diagram for kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >:
Collaboration graph
[legend]

Public Member Functions

 RefineHartiganWong (RefineHartiganWongOptions options)
 
 RefineHartiganWong ()=default
 
RefineHartiganWongOptionsget_options ()
 
Details< Index_ > run (const Matrix_ &data, Cluster_ ncenters, Float_ *centers, Cluster_ *clusters) const
 

Detailed Description

template<typename Matrix_ = SimpleMatrix<double, int>, typename Cluster_ = int, typename Float_ = double>
class kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >

Implements the Hartigan-Wong algorithm for k-means clustering.

The Hartigan-Wong algorithm performs several iterations of transferring points between clusters, involving a computationally expensive "optimal transfer" step that checks each observation against each cluster to determine its best assignment, followed by a cheaper "quick transfer" step, which iterates between the best and second-best cluster choices for each observation. The latter enables rapid exploration of the local solution space without the unnecessary expense of repeatedly comparing each observation to all clusters. The choice of "best" cluster for each observation considers the gain/loss in the sum of squares when an observation moves between clusters, even accounting for the shift in the cluster centers after the transfer. The algorithm terminates when no observation wishes to transfer between clusters.

This implementation is derived from the Fortran code underlying the kmeans function in the stats R package, which in turn is derived from Hartigan and Wong (1979).

In the Details::status returned by run(), the status code is either 0 (success), 2 (maximum optimal transfer iterations reached without convergence) or 4 (maximum quick transfer iterations reached without convergence, if RefineHartiganWongOptions::quit_on_quick_transfer_convergence_failure = true). Previous versions of the library would report a status code of 1 upon encountering an empty cluster, but these are now just ignored.

Template Parameters
Matrix_Matrix type for the input data. This should satisfy the MockMatrix contract.
Cluster_Integer type for the cluster assignments.
Float_Floating-point type for the centroids.
See also
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 28, 100-108.

Constructor & Destructor Documentation

◆ RefineHartiganWong() [1/2]

template<typename Matrix_ = SimpleMatrix<double, int>, typename Cluster_ = int, typename Float_ = double>
kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >::RefineHartiganWong ( RefineHartiganWongOptions  options)
inline
Parameters
optionsFurther options for the Hartigan-Wong algorithm.

◆ RefineHartiganWong() [2/2]

template<typename Matrix_ = SimpleMatrix<double, int>, typename Cluster_ = int, typename Float_ = double>
kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >::RefineHartiganWong ( )
default

Default constructor.

Member Function Documentation

◆ get_options()

template<typename Matrix_ = SimpleMatrix<double, int>, typename Cluster_ = int, typename Float_ = double>
RefineHartiganWongOptions & kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >::get_options ( )
inline
Returns
Options for Hartigan-Wong clustering, to be modified prior to calling run().

◆ run()

template<typename Matrix_ = SimpleMatrix<double, int>, typename Cluster_ = int, typename Float_ = double>
Details< Index_ > kmeans::RefineHartiganWong< Matrix_, Cluster_, Float_ >::run ( const Matrix_ data,
Cluster_  num_centers,
Float_ centers,
Cluster_ clusters 
) const
inlinevirtual
Parameters
dataA matrix-like object (see MockMatrix) containing per-observation data.
num_centersNumber of cluster centers.
[in,out]centersPointer to an array of length equal to the product of num_centers and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On input, each column should contain the initial centroid location for its cluster. On output, each column will contain the final centroid locations for each cluster.
[out]clustersPointer to an array of length equal to the number of observations (from data.num_observations()). On output, this will contain the cluster assignment for each observation.
Returns
centers and clusters are filled, and a Details object is returned containing clustering statistics. If num_centers is greater than data.num_observations(), only the first data.num_observations() columns of the centers array will be filled.

Implements kmeans::Refine< Matrix_, Cluster_, Float_ >.


The documentation for this class was generated from the following file: