kmeans
A C++ library for k-means
kmeans Namespace Reference

Namespace for k-means clustering.

Classes

struct  Details
 Additional statistics from the k-means algorithm.

class  Initialize
 Base class for initialization algorithms.

class  InitializeKmeanspp
 k-means++ initialization of Arthur and Vassilvitskii (2007).

struct  InitializeKmeansppOptions
 Options for k-means++ initialization.

class  InitializeNone
 No-op "initialization" with existing cluster centers.

class  InitializeRandom
 Initialize by sampling random observations without replacement.

struct  InitializeRandomOptions
 Options to use for InitializeRandom.

class  InitializeVariancePartition
 Implements the variance partitioning method of Su and Dy (2007).

struct  InitializeVariancePartitionOptions
 Options for InitializeVariancePartition.

class  MockMatrix
 Compile-time interface for matrix data.

class  Refine
 Interface for all k-means refinement algorithms.

class  RefineHartiganWong
 Implements the Hartigan-Wong algorithm for k-means clustering.

struct  RefineHartiganWongOptions
 Options for RefineHartiganWong.

class  RefineLloyd
 Implements the Lloyd algorithm for k-means clustering.

struct  RefineLloydOptions
 Options for RefineLloyd construction.

class  RefineMiniBatch
 Implements the mini-batch algorithm for k-means clustering.

struct  RefineMiniBatchOptions
 Options for RefineMiniBatch construction.

struct  Results
 Full statistics from k-means clustering.

class  SimpleMatrix
 A simple matrix of observations.
 

Functions

template<class Matrix_ , typename Cluster_ , typename Float_ >
void compute_wcss (const Matrix_ &data, Cluster_ ncenters, const Float_ *centers, const Cluster_ *clusters, Float_ *wcss)
 
template<class Matrix_ , typename Cluster_ , typename Float_ >
Details< typename Matrix_::index_type > compute (const Matrix_ &data, const Initialize< Matrix_, Cluster_, Float_ > &initialize, const Refine< Matrix_, Cluster_, Float_ > &refine, Cluster_ num_centers, Float_ *centers, Cluster_ *clusters)
 
template<class Matrix_ , typename Cluster_ , typename Float_ >
Results< Cluster_, Float_, typename Matrix_::index_type > compute (const Matrix_ &data, const Initialize< Matrix_, Cluster_, Float_ > &initialize, const Refine< Matrix_, Cluster_, Float_ > &refine, Cluster_ num_centers)
 
template<typename Task_ , class Run_ >
void parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range)
 

Detailed Description

Namespace for k-means clustering.

Function Documentation

◆ compute() [1/2]

template<class Matrix_ , typename Cluster_ , typename Float_ >
Results< Cluster_, Float_, typename Matrix_::index_type > kmeans::compute ( const Matrix_ &  data,
const Initialize< Matrix_, Cluster_, Float_ > &  initialize,
const Refine< Matrix_, Cluster_, Float_ > &  refine,
Cluster_  num_centers 
)

Overload that allocates the output vectors.

Template Parameters
Matrix_  Matrix type for the input data. This should satisfy the MockMatrix contract.
Cluster_  Integer type for the cluster assignments.
Float_  Floating-point type for the centroids.
Parameters
data  A matrix-like object (see MockMatrix) containing per-observation data.
initialize  Initialization method to use.
refine  Refinement method to use.
num_centers  Number of cluster centers.
Returns
Results of the clustering, including the centroid locations and cluster assignments.

◆ compute() [2/2]

template<class Matrix_ , typename Cluster_ , typename Float_ >
Details< typename Matrix_::index_type > kmeans::compute ( const Matrix_ &  data,
const Initialize< Matrix_, Cluster_, Float_ > &  initialize,
const Refine< Matrix_, Cluster_, Float_ > &  refine,
Cluster_  num_centers,
Float_ *  centers,
Cluster_ *  clusters 
)
Template Parameters
Matrix_  Matrix type for the input data. This should satisfy the MockMatrix contract.
Cluster_  Integer type for the cluster assignments.
Float_  Floating-point type for the centroids.
Parameters
data  A matrix-like object (see MockMatrix) containing per-observation data.
initialize  Initialization method to use.
refine  Refinement method to use.
num_centers  Number of cluster centers.
[out] centers  Pointer to an array of length equal to the product of num_centers and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On output, each column will contain the final centroid location for its cluster.
[out] clusters  Pointer to an array of length equal to the number of observations (from data.num_observations()). On output, this will contain the 0-based cluster assignment for each observation.
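
The buffer layouts described above can be illustrated with a small standalone sketch. It does not call the library. The helper names extract_center and count_cluster_sizes are hypothetical and exist only to show how a column-major centers array and a 0-based clusters array might be read once they have been filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the kmeans library): copies out the
// coordinates of one center from a column-major buffer in which rows are
// dimensions and columns are cluster centers.
template<typename Float_>
std::vector<Float_> extract_center(const Float_* centers, std::size_t num_dimensions, std::size_t num_centers, std::size_t center) {
    assert(center < num_centers);
    // Column 'center' starts at an offset of center * num_dimensions.
    const Float_* column = centers + center * num_dimensions;
    return std::vector<Float_>(column, column + num_dimensions);
}

// Hypothetical helper: counts the observations assigned to each cluster,
// given one 0-based assignment per observation.
template<typename Cluster_>
std::vector<std::size_t> count_cluster_sizes(const Cluster_* clusters, std::size_t num_observations, Cluster_ num_centers) {
    std::vector<std::size_t> sizes(static_cast<std::size_t>(num_centers), 0);
    for (std::size_t o = 0; o < num_observations; ++o) {
        const std::size_t assigned = static_cast<std::size_t>(clusters[o]);
        assert(assigned < sizes.size());
        ++sizes[assigned];
    }
    return sizes;
}
```

With this layout, a caller would allocate num_centers * data.num_dimensions() elements for centers and data.num_observations() elements for clusters before passing both pointers to compute().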

◆ compute_wcss()

template<class Matrix_ , typename Cluster_ , typename Float_ >
void kmeans::compute_wcss ( const Matrix_ &  data,
Cluster_  ncenters,
const Float_ *  centers,
const Cluster_ *  clusters,
Float_ *  wcss 
)
Template Parameters
Matrix_  Matrix type for the input data, satisfying the MockMatrix contract.
Cluster_  Integer type for the cluster assignments.
Float_  Floating-point type for the centers and output.
Parameters
data  A matrix-like object (see MockMatrix) containing per-observation data.
ncenters  Number of cluster centers.
[in] centers  Pointer to an array of length equal to the product of ncenters and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. Each column should contain the centroid location for its cluster.
[in] clusters  Pointer to an array of length equal to the number of observations (from data.num_observations()). This should contain the 0-based cluster assignment for each observation.
[out] wcss  Pointer to an array of length equal to the number of cluster centers. On output, this will contain the within-cluster sum of squares for each cluster.
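
As a rough illustration of the quantity written to wcss, the sketch below computes a within-cluster sum of squares from plain arrays. It is a standalone approximation, not the library's implementation: the name sketch_wcss is hypothetical, and the observations are assumed to sit in a column-major array (rows are dimensions, columns are observations) rather than behind the MockMatrix interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone illustration only. 'data' is assumed to be column-major with
// num_dimensions rows and num_observations columns; 'centers' uses the same
// layout with one column per cluster center.
template<typename Cluster_, typename Float_>
void sketch_wcss(const Float_* data, std::size_t num_dimensions, std::size_t num_observations,
                 Cluster_ ncenters, const Float_* centers, const Cluster_* clusters, Float_* wcss) {
    const std::size_t nc = static_cast<std::size_t>(ncenters);
    for (std::size_t c = 0; c < nc; ++c) {
        wcss[c] = 0;
    }
    for (std::size_t o = 0; o < num_observations; ++o) {
        const std::size_t c = static_cast<std::size_t>(clusters[o]);
        assert(c < nc);
        const Float_* observation = data + o * num_dimensions;
        const Float_* center = centers + c * num_dimensions;
        // Accumulate the squared Euclidean distance to the assigned center.
        for (std::size_t d = 0; d < num_dimensions; ++d) {
            const Float_ delta = observation[d] - center[d];
            wcss[c] += delta * delta;
        }
    }
}
```

Each entry of wcss is the sum, over the observations assigned to that cluster, of the squared Euclidean distance to the cluster's center.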

◆ parallelize()

template<typename Task_ , class Run_ >
void kmeans::parallelize ( int  num_workers,
Task_  num_tasks,
Run_  run_task_range 
)
Template Parameters
Task_  Integer type for the number of tasks.
Run_  Function to execute a range of tasks.
Parameters
num_workers  Number of workers.
num_tasks  Number of tasks.
run_task_range  Function to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the KMEANS_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
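
To make the argument list concrete, here is a minimal serial sketch of a function with the same three parameters. It is not subpar::parallelize_range(). The name sketch_serial_range is hypothetical, and the callback is assumed to take a worker index, the start of its task range and the range length, which should be checked against the subpar documentation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in with the same parameter list as described above.
// Assumption: run_task_range is callable as run_task_range(worker, start, length).
// Tasks are split into contiguous ranges, one per worker, and each range is
// run in turn on the calling thread (no actual parallelism).
template<typename Task_, class Run_>
void sketch_serial_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    assert(num_workers > 0);
    const Task_ workers = static_cast<Task_>(num_workers);
    const Task_ per_worker = num_tasks / workers;
    const Task_ remainder = num_tasks % workers;
    Task_ start = 0;
    for (int w = 0; w < num_workers; ++w) {
        // The first 'remainder' workers each take one extra task.
        const Task_ length = per_worker + (static_cast<Task_>(w) < remainder ? 1 : 0);
        if (length == 0) {
            break; // fewer tasks than workers, nothing left to hand out
        }
        run_task_range(w, start, length);
        start += length;
    }
}

// Assumption: a replacement like this would be supplied by defining the macro,
// presumably before the library headers are included, for example
//   #define KMEANS_CUSTOM_PARALLEL sketch_serial_range
```

A serial replacement of this kind could be useful when threading is unavailable or when deterministic execution order is wanted for debugging.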