HTKmeans {clusterHD}    R Documentation

HTK-Means Clustering

Description

Perform HTK-means clustering (Raymaekers and Zamar, 2022) on a data matrix.

Usage

HTKmeans(X, k, lambdas = NULL,
         standardize = TRUE,
         iter.max = 100, nstart = 100,
         nlambdas = 50,
         lambda_max = 1,
         verbose = FALSE)

Arguments

X

a matrix containing the data.

k

the number of clusters.

lambdas

a vector of values for the regularization parameter lambda. Defaults to NULL, which generates a sequence of values automatically.

standardize

logical flag for standardization to mean 0 and variance 1 of the data in X. This is recommended, unless the variance of the variables is known to quantify relevant information.

iter.max

the maximum number of iterations allowed.

nstart

number of starts used when k-means is applied to generate the starting values for HTK-means. See Details.

nlambdas

number of lambda values to generate automatically when lambdas = NULL.

lambda_max

maximum value for the regularization parameter lambda. If standardize = TRUE, the default of 1 works well.

verbose

logical flag for printing progress. Defaults to FALSE.
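
The transformation that standardize = TRUE describes (mean 0 and variance 1 per variable) can be reproduced by hand with base R's scale(). The following is a minimal sketch of that transformation only, not the package's internal code; the object name Xs is illustrative.

```r
# Standardize each column to mean 0 and variance 1 using base R.
X <- as.matrix(iris[, 1:4])
Xs <- scale(X, center = TRUE, scale = TRUE)

# Column means are numerically zero and column variances are one.
round(colMeans(Xs), 10)
apply(Xs, 2, var)
```

Data standardized this way could then be passed with standardize = FALSE, which should avoid rescaling twice.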

Details

The algorithm starts by generating a number of sparse starting values. This is done using k-means on subsets of variables. See Raymaekers and Zamar (2022) for details.
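
To illustrate the idea of a sparse starting value, the sketch below runs stats::kmeans on a subset of standardized variables and sets the centers of all other variables to zero. This is an assumption-laden illustration, not the package's procedure: the helper sparse_start is hypothetical, the subset of variables is chosen arbitrarily, and how subsets are selected and combined is described in Raymaekers and Zamar (2022) and is not reproduced here.

```r
# Hypothetical helper (not part of clusterHD): one sparse start from a variable subset.
sparse_start <- function(X, k, vars, nstart = 10) {
  Xs <- scale(as.matrix(X))  # mean 0, variance 1 per column
  km <- stats::kmeans(Xs[, vars, drop = FALSE], centers = k, nstart = nstart)
  # Centers are zero (the overall mean after standardization) outside the subset.
  centers <- matrix(0, nrow = k, ncol = ncol(Xs),
                    dimnames = list(NULL, colnames(Xs)))
  centers[, vars] <- km$centers
  list(centers = centers, cluster = km$cluster)
}

set.seed(1)
start <- sparse_start(iris[, 1:4], k = 3,
                      vars = c("Petal.Length", "Petal.Width"))
start$centers
```

In this sketch the two sepal columns of start$centers are exactly zero, so only the petal variables distinguish the clusters in the starting value.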

Value

A list with components:

Author(s)

J. Raymaekers and R.H. Zamar

References

Raymaekers, Jakob, and Ruben H. Zamar. "Regularized K-means through hard-thresholding." arXiv preprint arXiv:2010.00950 (2020).

See Also

kmeans

Examples

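# Cluster the four numeric iris measurements into k = 3 groups
X <- iris[, 1:4]
HTKmeans.out <- HTKmeans(X, k = 3, lambdas = 0.8)
# Cluster centers of the fit for the supplied lambda value
HTKmeans.out[[1]]$centers
# Scatterplot matrix colored by cluster assignment
pairs(X, col = HTKmeans.out[[1]]$cluster)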
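
Because lambdas accepts a vector, a grid of values can also be supplied explicitly instead of relying on the automatic sequence. The sketch below builds such a grid with base R; the grid size of 10 and the name lambda_grid are illustrative choices, and the call is guarded so that it runs only when clusterHD is installed.

```r
# An explicit grid of 10 lambda values between 0 and 1 (illustrative choice).
lambda_grid <- seq(0, 1, length.out = 10)

# Run only if clusterHD is installed; the call uses the documented arguments.
if (requireNamespace("clusterHD", quietly = TRUE)) {
  fit <- clusterHD::HTKmeans(iris[, 1:4], k = 3, lambdas = lambda_grid)
  str(fit, max.level = 1)  # inspect the top-level structure of the result
}
```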

[Package clusterHD version 1.0.2 Index]