Implements the mini-batch algorithm for k-means clustering. More...

#include <RefineMiniBatch.hpp>

Inheritance diagram for kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >:

Collaboration diagram for kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >:

Public Member Functions
	RefineMiniBatch (RefineMiniBatchOptions options)

	RefineMiniBatch ()=default

RefineMiniBatchOptions &	get_options ()

Public Member Functions inherited from kmeans::Refine< Index_, Data_, Cluster_, Float_, Matrix_ >
virtual Details< Index_ >	run (const Matrix_ &data, Cluster_ num_centers, Float_ centers, Cluster_ clusters) const =0

Detailed Description

template<typename Index_, typename Data_, typename Cluster_, typename Float_, typename Matrix_ = Matrix<Index_, Data_>>
class kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >

Implements the mini-batch algorithm for k-means clustering.

The mini-batch approach is similar to Lloyd's algorithm in that it runs through a set of observations, assigns each to the closest centroid, updates the centroids and repeats. The key difference is that each iteration is performed with a random subset of observations (i.e., a "mini-batch"), instead of the full set of observations. This reduces computational time and memory usage at the cost of some solution quality.

The update procedure for a cluster's centroid involves adjusting the coordinates by the assigned observations in the mini-batch. The resulting vector can be interpreted as the mean of all observations that have ever been sampled (possibly multiple times) to that cluster. Thus, the magnitude of the updates will decrease in later iterations as the relative effect of newly sampled points is reduced. This ensures that the centroids will stabilize at a sufficiently large number of iterations.

We may stop the algorithm before the maximum number of iterations if only a few observations are reassigned at each iteration. Specifically, every \(h\) iterations, we compute the proportion of sampled observations for each cluster in the past \(h\) mini-batches that were reassigned to/from that cluster. If this proportion is less than some threshold \(p\) for all clusters, we consider that the algorithm has converged.

In the Details::status returned by run(), the status code is either 0 (success) or 2 (maximum iterations reached without convergence). Previous versions of the library would report a status code of 1 upon encountering an empty cluster, but these are now just ignored.

Template Parameters

Index_	Integer type for the observation indices.
Data_	Numeric type for the data.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centroids. This will also be used for any internal distance calculations.
Matrix_	Type for the input data matrix. This should satisfy the `Matrix` interface.

Constructor & Destructor Documentation

◆ RefineMiniBatch() [1/2]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , typename Matrix_ = Matrix<Index_, Data_>>

kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >::RefineMiniBatch ( RefineMiniBatchOptions options )

inline

Parameters

options Further options for the mini-batch algorithm.

◆ RefineMiniBatch() [2/2]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , typename Matrix_ = Matrix<Index_, Data_>>

kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >::RefineMiniBatch ( )

default

Default constructor.

Member Function Documentation

◆ get_options()

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , typename Matrix_ = Matrix<Index_, Data_>>

RefineMiniBatchOptions & kmeans::RefineMiniBatch< Index_, Data_, Cluster_, Float_, Matrix_ >::get_options ( )

inline

Returns: Options for mini-batch partitioning, to be modified prior to calling run().

The documentation for this class was generated from the following file:

kmeans/RefineMiniBatch.hpp

Public Member Functions

Detailed Description

Constructor & Destructor Documentation

◆ RefineMiniBatch() [1/2]

◆ RefineMiniBatch() [2/2]

Member Function Documentation

◆ get_options()