Namespace for k-means clustering.

Classes
class	ConsecutiveAccessExtractor
	Extractor for accessing consecutive observations.

struct	Details
	Additional statistics from the k-means algorithm.

class	IndexedAccessExtractor
	Extractor for accessing indexed observations.

class	Initialize
	Interface for k-means initialization algorithms.

class	InitializeKmeanspp
	k-means++ initialization of Arthur and Vassilvitskii (2007).

struct	InitializeKmeansppOptions
	Options for k-means++ initialization.

class	InitializeNone
	No-op "initialization" with existing cluster centers.

class	InitializeRandom
	Initialize by sampling random observations without replacement.

struct	InitializeRandomOptions
	Options to use for `InitializeRandom`.

class	InitializeVariancePartition
	Implements the variance partitioning method of Su and Dy (2007).

struct	InitializeVariancePartitionOptions
	Options for `InitializeVariancePartition`.

class	Matrix
	Interface for matrix data.

class	RandomAccessExtractor
	Extractor for accessing random observations.

class	Refine
	Interface for k-means refinement algorithms.

class	RefineHartiganWong
	Implements the Hartigan-Wong algorithm for k-means clustering.

struct	RefineHartiganWongOptions
	Options for `RefineHartiganWong`.

class	RefineLloyd
	Implements the Lloyd algorithm for k-means clustering.

struct	RefineLloydOptions
	Options for `RefineLloyd` construction.

class	RefineMiniBatch
	Implements the mini-batch algorithm for k-means clustering.

struct	RefineMiniBatchOptions
	Options for `RefineMiniBatch` construction.

struct	Results
	Full statistics from k-means clustering.

class	SimpleMatrix
	A simple matrix of observations.

Functions
template<class Matrix_ , typename Cluster_ , typename Float_ >
void	compute_wcss (const Matrix_ &data, Cluster_ ncenters, const Float_ *centers, const Cluster_ *clusters, Float_ *wcss)

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>
Details< Index_ >	compute (const Matrix_ &data, const Initialize< Index_, Data_, Cluster_, Float_, Matrix_ > &initialize, const Refine< Index_, Data_, Cluster_, Float_, Matrix_ > &refine, Cluster_ num_centers, Float_ *centers, Cluster_ *clusters)

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ >
Details< Index_ >	compute (const Matrix< Index_, Data_ > &data, const Initialize< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &initialize, const Refine< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &refine, Cluster_ num_centers, Float_ *centers, Cluster_ *clusters)

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>
Results< Index_, Cluster_, Float_ >	compute (const Matrix_ &data, const Initialize< Index_, Data_, Cluster_, Float_, Matrix_ > &initialize, const Refine< Index_, Data_, Cluster_, Float_, Matrix_ > &refine, Cluster_ num_centers)

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ >
Results< Index_, Cluster_, Float_ >	compute (const Matrix< Index_, Data_ > &data, const Initialize< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &initialize, const Refine< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &refine, Cluster_ num_centers)

template<typename Task_ , class Run_ >
void	parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range)

Detailed Description

Namespace for k-means clustering.

Function Documentation

◆ compute() [1/4]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ >

Results< Index_, Cluster_, Float_ > kmeans::compute	(	const Matrix< Index_, Data_ > &	data,
		const Initialize< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &	initialize,
		const Refine< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &	refine,
		Cluster_	num_centers )

Overload of compute() to assist template deduction for the default Matrix. This allocates and returns the vectors for the centroids and cluster assignments.

Template Parameters

Index_	Integer type for the observation indices in the input dataset.
Data_	Numeric type for the input dataset.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centroids. This will also be used for any internal distance calculations.

Parameters

data	A matrix-like object containing per-observation data.
initialize	Initialization method to use.
refine	Refinement method to use.
num_centers	Number of cluster centers.

Returns: Results of the clustering, including the centroid locations and cluster assignments.
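
The text above does not say why an extra overload helps template deduction, so the following is only a toy sketch of one plausible reason. All `Toy*` names are hypothetical stand-ins and not part of the library. When `Matrix_` is a deduced template parameter that appears both as the data type and inside the initializer's type, passing a derived matrix together with an initializer declared for the base interface yields two conflicting deductions. An overload that names the base interface directly sidesteps this, because derived-to-base conversion is permitted when deducing from the first argument.

```cpp
#include <cassert>

// Hypothetical stand-in for an abstract matrix interface.
template<typename Index_, typename Data_>
struct ToyMatrix {
    virtual ~ToyMatrix() = default;
    virtual Index_ num_observations() const = 0;
};

// Hypothetical concrete matrix deriving from the interface.
template<typename Index_, typename Data_>
struct ToySimpleMatrix final : ToyMatrix<Index_, Data_> {
    explicit ToySimpleMatrix(Index_ nobs) : nobs(nobs) {}
    Index_ num_observations() const override { return nobs; }
    Index_ nobs;
};

// Hypothetical initializer that is templated on the matrix class,
// defaulting to the interface type.
template<typename Index_, typename Data_, class Matrix_ = ToyMatrix<Index_, Data_> >
struct ToyInitialize {};

// Generic form: Matrix_ must be deduced identically from both arguments.
template<typename Index_, typename Data_, class Matrix_ = ToyMatrix<Index_, Data_> >
int toy_compute(const Matrix_&, const ToyInitialize<Index_, Data_, Matrix_>&) {
    return 1; // reports that the generic overload was selected
}

// Helper overload: fixes the matrix class to the interface type, so a
// derived matrix can bind to the first argument without a deduction conflict.
template<typename Index_, typename Data_>
int toy_compute(const ToyMatrix<Index_, Data_>&,
                const ToyInitialize<Index_, Data_, ToyMatrix<Index_, Data_> >&) {
    return 2; // reports that the helper overload was selected
}
```

In this sketch, calling `toy_compute()` with a `ToySimpleMatrix<int, double>` and a default `ToyInitialize<int, double>` compiles only because the helper overload exists; the generic overload is selected when the initializer is declared for the concrete matrix class.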

◆ compute() [2/4]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ >

Details< Index_ > kmeans::compute	(	const Matrix< Index_, Data_ > &	data,
		const Initialize< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &	initialize,
		const Refine< Index_, Data_, Cluster_, Float_, Matrix< Index_, Data_ > > &	refine,
		Cluster_	num_centers,
		Float_ *	centers,
		Cluster_ *	clusters )

Overload of compute() to assist template deduction for the default Matrix.

Template Parameters

Index_	Integer type for the observation indices in the input dataset.
Data_	Numeric type for the input dataset.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centroids. This will also be used for any internal distance calculations.

Parameters

	data	A matrix-like object containing per-observation data.
	initialize	Initialization method to use.
	refine	Refinement method to use.
	num_centers	Number of cluster centers.
[out]	centers	Pointer to an array of length equal to the product of `num_centers` and `data.num_dimensions()`. This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On output, each column will contain the final centroid location for its cluster.
[out]	clusters	Pointer to an array of length equal to the number of observations (from `data.num_observations()`). On output, this will contain the 0-based cluster assignment for each observation.

Returns: Additional statistics from the k-means algorithm.

◆ compute() [3/4]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>

Results< Index_, Cluster_, Float_ > kmeans::compute	(	const Matrix_ &	data,
		const Initialize< Index_, Data_, Cluster_, Float_, Matrix_ > &	initialize,
		const Refine< Index_, Data_, Cluster_, Float_, Matrix_ > &	refine,
		Cluster_	num_centers )

Overload of compute() that allocates and returns the vectors for the centroids and cluster assignments.

Template Parameters

Index_	Integer type for the observation indices in the input dataset.
Data_	Numeric type for the input dataset.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centroids. This will also be used for any internal distance calculations.
Matrix_	Class of the input data matrix. This should satisfy the `Matrix` interface.

Parameters

data	A matrix-like object containing per-observation data.
initialize	Initialization method to use.
refine	Refinement method to use.
num_centers	Number of cluster centers.

Returns: Results of the clustering, including the centroid locations and cluster assignments.

◆ compute() [4/4]

template<typename Index_ , typename Data_ , typename Cluster_ , typename Float_ , class Matrix_ = Matrix<Index_, Data_>>

Details< Index_ > kmeans::compute	(	const Matrix_ &	data,
		const Initialize< Index_, Data_, Cluster_, Float_, Matrix_ > &	initialize,
		const Refine< Index_, Data_, Cluster_, Float_, Matrix_ > &	refine,
		Cluster_	num_centers,
		Float_ *	centers,
		Cluster_ *	clusters )

Runs k-means clustering with the supplied initialization and refinement methods, writing the centroids and cluster assignments into caller-supplied arrays.

Template Parameters

Index_	Integer type for the observation indices in the input dataset.
Data_	Numeric type for the input dataset.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centroids. This will also be used for any internal distance calculations.
Matrix_	Class of the input data matrix. This should satisfy the `Matrix` interface.

Parameters

	data	A matrix-like object containing per-observation data.
	initialize	Initialization method to use.
	refine	Refinement method to use.
	num_centers	Number of cluster centers.
[out]	centers	Pointer to an array of length equal to the product of `num_centers` and `data.num_dimensions()`. This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On output, each column will contain the final centroid location for its cluster.
[out]	clusters	Pointer to an array of length equal to the number of observations (from `data.num_observations()`). On output, this will contain the 0-based cluster assignment for each observation.

Returns: Additional statistics from the k-means algorithm.
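
The buffer sizes and the column-major layout described for `centers` and `clusters` can be sketched as follows. This is a minimal illustration that does not call the library; `CallerBuffers`, `make_buffers` and `center_coordinate` are hypothetical names introduced here, and the dimensions are passed as plain integers rather than read from a matrix object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for caller-allocated output buffers (not a library type).
template<typename Cluster_, typename Float_>
struct CallerBuffers {
    std::vector<Float_> centers;    // num_dimensions * num_centers values, column-major
    std::vector<Cluster_> clusters; // one 0-based assignment per observation
};

// Allocate buffers with the lengths stated in the parameter descriptions.
template<typename Cluster_, typename Float_>
CallerBuffers<Cluster_, Float_> make_buffers(std::size_t num_dimensions,
                                             std::size_t num_observations,
                                             Cluster_ num_centers) {
    CallerBuffers<Cluster_, Float_> out;
    out.centers.resize(num_dimensions * static_cast<std::size_t>(num_centers));
    out.clusters.resize(num_observations);
    return out;
}

// Column-major lookup: each column is one center, each row is one dimension,
// so the coordinates of a single center are contiguous in memory.
template<typename Float_>
Float_ center_coordinate(const Float_* centers, std::size_t num_dimensions,
                         std::size_t center, std::size_t dimension) {
    return centers[center * num_dimensions + dimension];
}
```

Under these assumptions, the pointers from `centers.data()` and `clusters.data()` are what a caller would pass as the `centers` and `clusters` arguments.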

◆ compute_wcss()

template<class Matrix_ , typename Cluster_ , typename Float_ >

void kmeans::compute_wcss	(	const Matrix_ &	data,
		Cluster_	ncenters,
		const Float_ *	centers,
		const Cluster_ *	clusters,
		Float_ *	wcss )

Template Parameters

Matrix_	Matrix type for the input data, satisfying the `Matrix` interface.
Cluster_	Integer type for the cluster assignments.
Float_	Floating-point type for the centers and output.

Parameters

	data	A matrix-like object (see `Matrix`) containing per-observation data.
	ncenters	Number of cluster centers.
[in]	centers	Pointer to an array of length equal to the product of `ncenters` and `data.num_dimensions()`. This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. Each column should contain the centroid location for its cluster.
[in]	clusters	Pointer to an array of length equal to the number of observations (from `data.num_observations()`). This should contain the 0-based cluster assignment for each observation.
[out]	wcss	Pointer to an array of length equal to the number of cluster centers. On output, this will contain the within-cluster sum of squares.
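
The quantity written to `wcss` can be illustrated with a small standalone sketch. This is not the library's implementation: `toy_compute_wcss` is a hypothetical name, and the input is assumed to be a dense column-major array (rows are dimensions, columns are observations) instead of a `Matrix_` object. For each observation, the squared Euclidean distance to its assigned center is added to that cluster's total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only: 'data' is assumed to be a dense column-major array with
// num_dimensions rows and num_observations columns.
template<typename Data_, typename Cluster_, typename Float_>
void toy_compute_wcss(const Data_* data,
                      std::size_t num_dimensions,
                      std::size_t num_observations,
                      Cluster_ ncenters,
                      const Float_* centers,
                      const Cluster_* clusters,
                      Float_* wcss) {
    // Start every cluster's sum of squares at zero.
    for (Cluster_ k = 0; k < ncenters; ++k) {
        wcss[k] = 0;
    }

    for (std::size_t o = 0; o < num_observations; ++o) {
        const Data_* obs = data + o * num_dimensions;
        const Float_* cen = centers + static_cast<std::size_t>(clusters[o]) * num_dimensions;

        // Squared Euclidean distance between the observation and its center.
        Float_ sum = 0;
        for (std::size_t d = 0; d < num_dimensions; ++d) {
            Float_ diff = static_cast<Float_>(obs[d]) - cen[d];
            sum += diff * diff;
        }
        wcss[clusters[o]] += sum;
    }
}
```

For example, with one-dimensional observations 0, 2, 10 and 14, centers at 1 and 12, and assignments 0, 0, 1, 1, the per-cluster sums are 2 and 8.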

◆ parallelize()

template<typename Task_ , class Run_ >

void kmeans::parallelize	(	int	num_workers,
		Task_	num_tasks,
		Run_	run_task_range )

Template Parameters

Task_	Integer type for the number of tasks.
Run_	Function to execute a range of tasks.

Parameters

num_workers	Number of workers.
num_tasks	Number of tasks.
run_task_range	Function to iterate over a range of tasks within a worker.

By default, this is an alias for subpar::parallelize_range(). If the function-like macro KMEANS_CUSTOM_PARALLEL is defined, that macro is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
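
A replacement with the same argument order might look like the serial sketch below. `toy_parallelize_range` is a hypothetical name, and the sketch assumes that `run_task_range` is invoked as `run_task_range(worker, start, length)` on contiguous, non-overlapping ranges; check the subpar documentation before relying on that calling convention. The ranges are processed one after another on the calling thread, which may be useful for debugging or for environments without threading support.

```cpp
#include <cassert>
#include <vector>

// Hypothetical serial stand-in with the same argument order as
// kmeans::parallelize(). Assumption: run_task_range(worker, start, length).
//
// Assumed usage (not verified here): define the macro before including the
// library headers, for example
//   #define KMEANS_CUSTOM_PARALLEL(nw, nt, fn) toy_parallelize_range(nw, nt, fn)
template<typename Task_, class Run_>
void toy_parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_tasks <= 0) {
        return; // nothing to do
    }
    if (num_workers < 1) {
        num_workers = 1;
    }

    // Split the tasks into contiguous blocks, giving the first 'remainder'
    // workers one extra task each so that block sizes differ by at most one.
    Task_ nw = static_cast<Task_>(num_workers);
    Task_ per_worker = num_tasks / nw;
    Task_ remainder = num_tasks % nw;

    Task_ start = 0;
    for (int w = 0; w < num_workers && start < num_tasks; ++w) {
        Task_ length = per_worker + (static_cast<Task_>(w) < remainder ? 1 : 0);
        run_task_range(w, start, length);
        start += length;
    }
}
```

With 3 workers and 10 tasks, this sketch calls the function on the ranges starting at 0, 4 and 7, with lengths 4, 3 and 3.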
