R: K-means algorithm for the clustering of variables

CLV_kmeans {ClustVarLV}

R Documentation

K-means algorithm for the clustering of variables

Description

K-means algorithm for the clustering of variables. Directional or local groups may be defined. Each group of variables is associated with a latent component. Moreover external information collected on the observations or on the variables may be introduced.

Usage

CLV_kmeans(
  X,
  Xu = NULL,
  Xr = NULL,
  method,
  sX = TRUE,
  sXr = FALSE,
  sXu = FALSE,
  clust,
  iter.max = 20,
  nstart = 100,
  strategy = "none",
  rho = 0.3
)

Arguments

`X`	The matrix of the variables to be clustered
`Xu`	The external variables associated with the columns of X
`Xr`	The external variables associated with the rows of X
`method`	The criterion used in the cluster analysis. 1 or "directional": the squared covariance is used as the measure of proximity (directional groups). 2 or "local": the covariance is used as the measure of proximity (local groups)
`sX`	TRUE/FALSE: whether the columns of X are standardized (TRUE by default). The columns of X are always centered (predefined: cX = TRUE)
`sXr`	TRUE/FALSE: whether the columns of Xr are standardized (FALSE by default). The columns of Xr are always centered (predefined: cXr = TRUE)
`sXu`	TRUE/FALSE: whether the columns of Xu are standardized (FALSE by default). The columns of Xu are not centered (predefined: cXu = FALSE), Xu being considered as a weight matrix
`clust`	Either a number, i.e. the size K of the partition, or a vector of integers, i.e. the group membership of each variable in the initial partition (integers between 1 and K)
`iter.max`	Maximal number of iterations for the consolidation (20 by default)
`nstart`	Number of random initializations in the case where clust is a number (100 by default)
`strategy`	"none" (by default), or "kplusone" (an additional cluster for the noise variables), or "sparselv" (zero loadings for the noise variables)
`rho`	a threshold of correlation between 0 and 1 (0.3 by default)
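
The difference between the two values of `method` can be illustrated outside the package. The following is a minimal base-R sketch on simulated data, not the package's implementation. It assumes that the latent component of a directional group is the first standardized principal component of the group and that the latent component of a local group is the standardized mean of its variables. These choices follow the usual CLV formulation but are not stated on this page, and all object names are illustrative.

```r
set.seed(1)
n <- 50

# Two simulated variables that are strongly negatively correlated
z <- rnorm(n)
X <- cbind(x1 =  z + rnorm(n, sd = 0.3),
           x2 = -z + rnorm(n, sd = 0.3))
X <- scale(X)  # centre and standardize the columns, as with sX = TRUE

# Directional group (assumption): latent component = first principal component
pc <- prcomp(X)$x[, 1]
lv_dir <- pc / sd(pc)
crit_dir <- sum(cov(X, lv_dir)^2)  # squared covariances ignore the sign

# Local group (assumption): latent component = standardized mean of the variables
m <- rowMeans(X)
lv_loc <- m / sd(m)
crit_loc <- sum(cov(X, lv_loc))    # signed covariances: opposite variables cancel out

c(directional = crit_dir, local = crit_loc)
```

In this sketch the directional criterion is large because the sign of the relationship is ignored, whereas the local criterion stays small because the two variables point in opposite directions.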

Details

The initialization can be random and repeated several times (see nstart), or it can be defined by the user (see clust).

The parameter "strategy" makes it possible to choose a strategy for setting aside variables that do not fit into the pattern of any cluster.
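
The initialization modes and the "strategy" parameter could be exercised as in the sketch below. It assumes that the ClustVarLV package is installed and reuses the apples_sh data from the Examples section. The number of groups (3), the number of random starts (20), the regular pattern used for the initial partition and the value rho = 0.5 are arbitrary choices made for illustration only.

```r
library(ClustVarLV)
data(apples_sh)
X <- apples_sh$senso

# Random initialization: K = 3 groups, 20 random starts
set.seed(123)
res_random <- CLV_kmeans(X = X, method = "directional", sX = TRUE,
                         clust = 3, nstart = 20)

# User-defined initialization: one integer label (between 1 and K) per column of X
init <- rep(1:3, length.out = ncol(X))
res_user <- CLV_kmeans(X = X, method = "directional", sX = TRUE, clust = init)

# Setting aside noise variables in an additional cluster
res_noise <- CLV_kmeans(X = X, method = "directional", sX = TRUE,
                        clust = 3, nstart = 20,
                        strategy = "kplusone", rho = 0.5)
```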

Value

`tabres`	The value of the clustering criterion at convergence, the percentage of the initial criterion value that is explained, and the number of iterations of the partitioning algorithm
`clusters`	The group membership of the variables
`comp`	The latent components of the clusters
`loading`	If there are external variables Xr or Xu: the loadings of the external variables
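
The returned components can be inspected by name. The sketch below assumes that the ClustVarLV package is installed and that the result can be accessed as a list with the element names given above. The data set and the settings are illustrative only.

```r
library(ClustVarLV)
data(apples_sh)

res <- CLV_kmeans(X = apples_sh$senso, method = "directional", sX = TRUE,
                  clust = 3, nstart = 20)

names(res)      # components available in the result
res$tabres      # criterion value, percentage explained, number of iterations
res$clusters    # group membership of the variables
dim(res$comp)   # latent components of the clusters
```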

References

Vigneau E., Qannari E.M. (2003). Clustering of variables around latent components. Comm. Stat, 32(4), 1131-1150.

Vigneau E., Chen M., Qannari E.M. (2015). ClustVarLV: An R Package for the Clustering of Variables around Latent Variables. The R Journal, 7(2), 134-148.

Vigneau E., Chen M. (2016). Dimensionality reduction by clustering of variables while setting aside atypical variables. Electronic Journal of Applied Statistical Analysis, 9(1), 134-153.

Examples

library(ClustVarLV)
data(apples_sh)
# local groups with external variables Xr
resclvkmYX <- CLV_kmeans(X = apples_sh$pref, Xr = apples_sh$senso, method = "local",
                         sX = FALSE, sXr = TRUE, clust = 2, nstart = 20)

[Package ClustVarLV version 2.1.1 Index]