kmeans
A C++ library for k-means
Namespace for k-means clustering.
Classes

| Kind | Name | Description |
|---|---|---|
| struct | Details | Additional statistics from the k-means algorithm. |
| class | Initialize | Base class for initialization algorithms. |
| class | InitializeKmeanspp | k-means++ initialization of Arthur and Vassilvitskii (2007). |
| struct | InitializeKmeansppOptions | Options for k-means++ initialization. |
| class | InitializeNone | No-op "initialization" with existing cluster centers. |
| class | InitializeRandom | Initialize by sampling random observations without replacement. |
| struct | InitializeRandomOptions | Options to use for InitializeRandom. |
| class | InitializeVariancePartition | Implements the variance partitioning method of Su and Dy (2007). |
| struct | InitializeVariancePartitionOptions | Options for InitializeVariancePartition. |
| class | MockMatrix | Compile-time interface for matrix data. |
| class | Refine | Interface for all k-means refinement algorithms. |
| class | RefineHartiganWong | Implements the Hartigan-Wong algorithm for k-means clustering. |
| struct | RefineHartiganWongOptions | Options for RefineHartiganWong. |
| class | RefineLloyd | Implements the Lloyd algorithm for k-means clustering. |
| struct | RefineLloydOptions | Options for RefineLloyd construction. |
| class | RefineMiniBatch | Implements the mini-batch algorithm for k-means clustering. |
| struct | RefineMiniBatchOptions | Options for RefineMiniBatch construction. |
| struct | Results | Full statistics from k-means clustering. |
| class | SimpleMatrix | A simple matrix of observations. |
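MockMatrix is described here only as a compile-time interface. The signatures on this page rely on at least an `index_type` member type and the `num_dimensions()` and `num_observations()` methods, so the following sketch shows a minimal column-major type exposing those members. `ToyMatrix`, its `observation()` accessor and `grand_total()` are hypothetical names for illustration; the real MockMatrix contract may require other members (for example, a different way of fetching observation data), so check its own documentation before writing a custom matrix class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type satisfying a MockMatrix-like contract.
// Dimensions are rows and observations are columns, stored column-major,
// so the values of each observation are contiguous.
template<typename Index_, typename Data_>
class ToyMatrix {
public:
    typedef Index_ index_type; // compute() refers to Matrix_::index_type

    ToyMatrix(std::size_t ndim, Index_ nobs, std::vector<Data_> values)
        : my_ndim(ndim), my_nobs(nobs), my_values(std::move(values)) {
        assert(my_values.size() == my_ndim * static_cast<std::size_t>(my_nobs));
    }

    std::size_t num_dimensions() const { return my_ndim; }
    Index_ num_observations() const { return my_nobs; }

    // Hypothetical accessor, NOT part of the documented contract:
    // pointer to the ndim contiguous values of observation i.
    const Data_* observation(Index_ i) const {
        return my_values.data() + static_cast<std::size_t>(i) * my_ndim;
    }

private:
    std::size_t my_ndim;
    Index_ my_nobs;
    std::vector<Data_> my_values;
};

// Generic code uses only the members it needs; a type lacking them fails to
// compile when this template is instantiated.
template<class Matrix_>
double grand_total(const Matrix_& mat) {
    double total = 0;
    for (typename Matrix_::index_type i = 0; i < mat.num_observations(); ++i) {
        const auto* obs = mat.observation(i);
        for (std::size_t d = 0; d < mat.num_dimensions(); ++d) {
            total += obs[d];
        }
    }
    return total;
}
```

Because the contract is checked only when a template is instantiated, no base class is involved: passing a type without these members simply fails to compile.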
Functions

| Template | Return type | Function |
|---|---|---|
| template<class Matrix_, typename Cluster_, typename Float_> | void | compute_wcss(const Matrix_& data, Cluster_ ncenters, const Float_* centers, const Cluster_* clusters, Float_* wcss) |
| template<class Matrix_, typename Cluster_, typename Float_> | Details<typename Matrix_::index_type> | compute(const Matrix_& data, const Initialize<Matrix_, Cluster_, Float_>& initialize, const Refine<Matrix_, Cluster_, Float_>& refine, Cluster_ num_centers, Float_* centers, Cluster_* clusters) |
| template<class Matrix_, typename Cluster_, typename Float_> | Results<Cluster_, Float_, typename Matrix_::index_type> | compute(const Matrix_& data, const Initialize<Matrix_, Cluster_, Float_>& initialize, const Refine<Matrix_, Cluster_, Float_>& refine, Cluster_ num_centers) |
| template<typename Task_, class Run_> | void | parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) |
template<class Matrix_, typename Cluster_, typename Float_>
Results<Cluster_, Float_, typename Matrix_::index_type> kmeans::compute(const Matrix_& data, const Initialize<Matrix_, Cluster_, Float_>& initialize, const Refine<Matrix_, Cluster_, Float_>& refine, Cluster_ num_centers)
Overload that allocates the output vectors itself and returns them, together with the clustering statistics, in a Results object.
Template parameters

| Name | Description |
|---|---|
| Matrix_ | Matrix type for the input data. This should satisfy the MockMatrix contract. |
| Cluster_ | Integer type for the cluster assignments. |
| Float_ | Floating-point type for the centroids. |

Parameters

| Name | Description |
|---|---|
| data | A matrix-like object (see MockMatrix) containing per-observation data. |
| initialize | Initialization method to use. |
| refine | Refinement method to use. |
| num_centers | Number of cluster centers. |
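One natural way to write an allocating overload is to size the output buffers from `num_centers`, the number of dimensions and the number of observations, and then delegate to the pointer-based overload. The sketch below illustrates that wrapper pattern only. `ToyResults` and `toy_compute` are hypothetical names, the data are assumed to be a plain column-major array rather than a MockMatrix, and the stand-in pointer-based routine just uses the first `num_centers` observations as centroids with no refinement, so it is not how `kmeans::compute()` clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical results holder, loosely mirroring the idea of kmeans::Results.
template<typename Cluster_, typename Float_>
struct ToyResults {
    std::vector<Float_> centers;    // column-major: ndim rows x num_centers columns
    std::vector<Cluster_> clusters; // 0-based assignment for each observation
};

// Hypothetical stand-in for the pointer-based overload. Assumes `data` is a
// column-major ndim x nobs array. It copies the first num_centers observations
// as centroids and assigns each observation to the nearest one; there is no
// refinement, so this is NOT the library's algorithm.
template<typename Cluster_, typename Float_>
void toy_compute(std::size_t ndim, std::size_t nobs, const Float_* data,
                 Cluster_ num_centers, Float_* centers, Cluster_* clusters) {
    assert(num_centers > 0 && static_cast<std::size_t>(num_centers) <= nobs);
    std::size_t ncenters = static_cast<std::size_t>(num_centers);
    for (std::size_t i = 0; i < ndim * ncenters; ++i) {
        centers[i] = data[i]; // first num_centers columns of the data
    }
    for (std::size_t o = 0; o < nobs; ++o) {
        Float_ best = std::numeric_limits<Float_>::max();
        for (std::size_t c = 0; c < ncenters; ++c) {
            Float_ dist = 0;
            for (std::size_t d = 0; d < ndim; ++d) {
                Float_ delta = data[o * ndim + d] - centers[c * ndim + d];
                dist += delta * delta;
            }
            if (dist < best) {
                best = dist;
                clusters[o] = static_cast<Cluster_>(c);
            }
        }
    }
}

// Allocating overload: size the outputs, then delegate to the pointer version.
template<typename Cluster_, typename Float_>
ToyResults<Cluster_, Float_> toy_compute(std::size_t ndim, std::size_t nobs,
                                         const Float_* data, Cluster_ num_centers) {
    ToyResults<Cluster_, Float_> out;
    out.centers.resize(ndim * static_cast<std::size_t>(num_centers));
    out.clusters.resize(nobs);
    toy_compute(ndim, nobs, data, num_centers, out.centers.data(), out.clusters.data());
    return out;
}
```

The pointer-based form lets callers reuse their own buffers across runs, while the allocating form trades that control for convenience.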
template<class Matrix_, typename Cluster_, typename Float_>
Details<typename Matrix_::index_type> kmeans::compute(const Matrix_& data, const Initialize<Matrix_, Cluster_, Float_>& initialize, const Refine<Matrix_, Cluster_, Float_>& refine, Cluster_ num_centers, Float_* centers, Cluster_* clusters)
Template parameters

| Name | Description |
|---|---|
| Matrix_ | Matrix type for the input data. This should satisfy the MockMatrix contract. |
| Cluster_ | Integer type for the cluster assignments. |
| Float_ | Floating-point type for the centroids. |

Parameters

| Direction | Name | Description |
|---|---|---|
| | data | A matrix-like object (see MockMatrix) containing per-observation data. |
| | initialize | Initialization method to use. |
| | refine | Refinement method to use. |
| | num_centers | Number of cluster centers. |
| [out] | centers | Pointer to an array of length equal to the product of num_centers and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. On output, each column contains the final centroid location for its cluster. |
| [out] | clusters | Pointer to an array of length equal to the number of observations (from data.num_observations()). On output, this will contain the 0-based cluster assignment for each observation. |
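Under the column-major layout described for `centers`, coordinate `d` of cluster `c` lives at `centers[c * ndim + d]`, where `ndim` is `data.num_dimensions()`. To make the roles of `centers` and `clusters` concrete, here is a sketch of one Lloyd iteration (assign each observation to its nearest center, then move each center to the mean of its members) that writes to arrays in that layout. It is an illustration, not RefineLloyd's implementation: `lloyd_iteration` is a hypothetical name, the data are assumed to be a plain column-major array with observations as columns, and leaving an empty cluster's center unchanged is this sketch's own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One Lloyd iteration over plain column-major arrays (assumed layout):
//   data:     ndim x nobs, observation o at data[o * ndim + d]
//   centers:  ndim x ncenters, center c at centers[c * ndim + d] (updated in place)
//   clusters: nobs previous assignments on input (must be initialized),
//             0-based nearest-center assignments on output
// Returns the number of observations whose assignment changed.
template<typename Cluster_, typename Float_>
std::size_t lloyd_iteration(std::size_t ndim, std::size_t nobs, const Float_* data,
                            Cluster_ ncenters, Float_* centers, Cluster_* clusters) {
    assert(ncenters > 0);
    const std::size_t nc = static_cast<std::size_t>(ncenters);
    std::size_t changed = 0;

    // Assignment step: move each observation to its nearest center.
    for (std::size_t o = 0; o < nobs; ++o) {
        const Float_* obs = data + o * ndim;
        std::size_t best_c = 0;
        Float_ best_d = std::numeric_limits<Float_>::max();
        for (std::size_t c = 0; c < nc; ++c) {
            const Float_* ctr = centers + c * ndim;
            Float_ dist = 0;
            for (std::size_t d = 0; d < ndim; ++d) {
                Float_ delta = obs[d] - ctr[d];
                dist += delta * delta;
            }
            if (dist < best_d) {
                best_d = dist;
                best_c = c;
            }
        }
        if (clusters[o] != static_cast<Cluster_>(best_c)) {
            ++changed;
            clusters[o] = static_cast<Cluster_>(best_c);
        }
    }

    // Update step: each center becomes the mean of its members.
    std::vector<std::size_t> sizes(nc);
    std::vector<Float_> sums(nc * ndim);
    for (std::size_t o = 0; o < nobs; ++o) {
        std::size_t c = static_cast<std::size_t>(clusters[o]);
        ++sizes[c];
        for (std::size_t d = 0; d < ndim; ++d) {
            sums[c * ndim + d] += data[o * ndim + d];
        }
    }
    for (std::size_t c = 0; c < nc; ++c) {
        if (sizes[c] == 0) {
            continue; // assumption of this sketch: empty clusters keep their old center
        }
        for (std::size_t d = 0; d < ndim; ++d) {
            centers[c * ndim + d] = sums[c * ndim + d] / sizes[c];
        }
    }
    return changed;
}
```

Calling it repeatedly until it returns 0 gives a basic Lloyd loop; a practical refiner would also cap the number of iterations.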
template<class Matrix_, typename Cluster_, typename Float_>
void kmeans::compute_wcss(const Matrix_& data, Cluster_ ncenters, const Float_* centers, const Cluster_* clusters, Float_* wcss)
Template parameters

| Name | Description |
|---|---|
| Matrix_ | Matrix type for the input data, satisfying the MockMatrix contract. |
| Cluster_ | Integer type for the cluster assignments. |
| Float_ | Floating-point type for the centers and output. |

Parameters

| Direction | Name | Description |
|---|---|---|
| | data | A matrix-like object (see MockMatrix) containing per-observation data. |
| | ncenters | Number of cluster centers. |
| [in] | centers | Pointer to an array of length equal to the product of ncenters and data.num_dimensions(). This contains a column-major matrix where rows correspond to dimensions and columns correspond to cluster centers. Each column should contain the centroid location for its cluster. |
| [in] | clusters | Pointer to an array of length equal to the number of observations (from data.num_observations()). This should contain the 0-based cluster assignment for each observation. |
| [out] | wcss | Pointer to an array of length equal to the number of cluster centers. On output, this will contain the within-cluster sum of squares for each cluster. |
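For each cluster `c`, the within-cluster sum of squares is the sum, over all observations `o` with `clusters[o] == c`, of the squared Euclidean distance between observation `o` and column `c` of `centers`. A minimal sketch of that computation follows, assuming the data are a plain column-major array instead of a MockMatrix; `toy_compute_wcss` is a hypothetical name, not the library function.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a within-cluster sum of squares computation.
// Assumed layouts (column-major):
//   data:    ndim x nobs, observation o at data[o * ndim + d]
//   centers: ndim x ncenters, center c at centers[c * ndim + d]
template<typename Cluster_, typename Float_>
void toy_compute_wcss(std::size_t ndim, std::size_t nobs, const Float_* data,
                      Cluster_ ncenters, const Float_* centers, const Cluster_* clusters,
                      Float_* wcss) {
    for (Cluster_ c = 0; c < ncenters; ++c) {
        wcss[c] = 0; // zero the output before accumulating
    }
    for (std::size_t o = 0; o < nobs; ++o) {
        Cluster_ c = clusters[o];
        assert(c >= 0 && c < ncenters); // assignments are 0-based
        const Float_* obs = data + o * ndim;
        const Float_* ctr = centers + static_cast<std::size_t>(c) * ndim;
        for (std::size_t d = 0; d < ndim; ++d) {
            Float_ delta = obs[d] - ctr[d];
            wcss[c] += delta * delta;
        }
    }
}
```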
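template<typename Task_, class Run_>
void kmeans::parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range)

Template parameters

| Name | Description |
|---|---|
| Task_ | Integer type for the number of tasks. |
| Run_ | Function to execute a range of tasks. |

Parameters

| Name | Description |
|---|---|
| num_workers | Number of workers. |
| num_tasks | Number of tasks. |
| run_task_range | Function to iterate over a range of tasks within a worker. |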
By default, kmeans::parallelize() is an alias to subpar::parallelize_range(). However, if the KMEANS_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
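Presumably the macro check happens at preprocessing time, in which case a custom macro would need to be defined before the kmeans headers are included. The sketch below reproduces that dispatch pattern with hypothetical names (`TOY_CUSTOM_PARALLEL`, `toy_parallelize`, `toy_default_parallelize`). Its default is a simple `std::thread` range splitter standing in for `subpar::parallelize_range()`, not subpar's actual implementation, and the callback signature `run_task_range(worker, start, length)` is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Stand-in default (NOT subpar::parallelize_range itself): split the tasks
// [0, num_tasks) into contiguous ranges and run each range on its own thread.
// Assumed callback signature: run_task_range(worker_index, start, length).
template<typename Task_, class Run_>
void toy_default_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    assert(num_workers > 0);
    if (num_tasks <= 0) {
        return;
    }
    Task_ workers = static_cast<Task_>(num_workers);
    Task_ chunk = (num_tasks + workers - 1) / workers; // ceiling division
    std::vector<std::thread> threads;
    int worker = 0;
    for (Task_ start = 0; start < num_tasks; start += chunk, ++worker) {
        Task_ length = std::min(chunk, static_cast<Task_>(num_tasks - start));
        threads.emplace_back(run_task_range, worker, start, length);
    }
    for (auto& t : threads) {
        t.join();
    }
}

// Same dispatch pattern as described for KMEANS_CUSTOM_PARALLEL, but with a
// hypothetical macro name. To override, define it before this point, e.g.
//   #define TOY_CUSTOM_PARALLEL my_pool_parallelize  // hypothetical function
template<typename Task_, class Run_>
void toy_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
#ifdef TOY_CUSTOM_PARALLEL
    TOY_CUSTOM_PARALLEL(num_workers, num_tasks, std::move(run_task_range));
#else
    toy_default_parallelize(num_workers, num_tasks, std::move(run_task_range));
#endif
}
```

Because each worker receives a disjoint range, the callback can write to distinct elements of a shared output without locking.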