rlg {tclust}    R Documentation

Robust Linear Grouping

Description

The function rlg() searches for clusters around affine subspaces whose dimensions are given by the vector d; the length of d is the number of clusters. For instance, d = c(1, 2) means clustering around a line and a plane. To make the estimation robust, a proportion alpha of the observations is trimmed. In particular, rlg() reduces to the trimmed k-means method when d = c(0, 0, ..., 0) (a vector of k zeros).

Usage

rlg(
  x,
  d,
  alpha = 0.05,
  nstart = 500,
  niter1 = 3,
  niter2 = 20,
  nkeep = 5,
  scale = FALSE,
  parallel = FALSE,
  n.cores = -1,
  trace = FALSE
)

Arguments

x

A matrix or data.frame of dimension n x p, containing the observations (rowwise).

d

A numeric vector whose length equals the number of clusters to be detected. Each component of d gives the intrinsic dimension of the affine subspace around which the observations of that cluster are grouped. All elements of d should be smaller than the problem dimension minus 1.

alpha

The proportion of observations to be trimmed.

nstart

The number of random initializations to be performed.

niter1

The number of concentration steps to be performed for the nstart initializations.

niter2

The maximum number of concentration steps to be performed for the nkeep solutions kept for further iteration. The concentration steps stop as soon as two consecutive steps lead to the same data partition.

nkeep

The number of iterated initializations (after niter1 concentration steps) with the best values of the target function that are kept for further iterations.

scale

A robust centering and scaling (using the median and MAD) is done if TRUE.

parallel

A logical value, specifying whether the nstart initializations should be done in parallel.

n.cores

The number of cores to use when parallelizing; only taken into account if parallel = TRUE.

trace

Defines the tracing level, which is set to 0 by default (trace = FALSE in the usage above). Tracing level 1 gives additional information on the stage of the iterative process.
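
The robust centering and scaling described for the scale argument can be illustrated with base R alone. The following is a minimal sketch, not the code used inside rlg(): the helper name robust_scale is hypothetical, and it assumes the MAD is computed with stats::mad and its default consistency constant, which the documentation does not specify.

```r
# Hypothetical helper (not part of tclust): column-wise robust standardization.
# Each column is centred at its median and divided by its MAD.
robust_scale <- function(x) {
  x <- as.matrix(x)
  med <- apply(x, 2, median)
  s <- apply(x, 2, mad)  # stats::mad with its default constant (an assumption)
  sweep(sweep(x, 2, med, "-"), 2, s, "/")
}

set.seed(1)
x <- cbind(rnorm(100, mean = 10, sd = 2), rnorm(100, mean = -5, sd = 0.5))
z <- robust_scale(x)

col_medians <- apply(z, 2, median)  # numerically zero for every column
col_mads <- apply(z, 2, mad)        # numerically one for every column
```

Unlike scaling by the mean and standard deviation, these location and scale estimates are not dominated by a few outlying observations, which matches the purpose of a trimmed procedure.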

Details

The procedure performs robust clustering around affine subspaces, trimming a proportion alpha of the observations, by minimizing the trimmed sum of squared orthogonal residuals. Each component of d gives the intrinsic dimension of the affine subspace around which the observations of that cluster are grouped. A component equal to 0 therefore implies clustering around centres, 1 around lines, 2 around planes, and so on. The procedure thus allows simultaneous clustering and dimensionality reduction.
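
To make the notion of a squared orthogonal residual concrete, here is a small base R sketch. The helper sq_orth_dist is hypothetical and is not a tclust function; it assumes the subspace is described by a point on it (center) and a matrix U whose orthonormal columns span its directions.

```r
# Hypothetical helper (not part of tclust): squared orthogonal distance from
# each row of x to the affine subspace through `center` spanned by the
# orthonormal columns of U (a p x d matrix; d = 0 means a single point).
sq_orth_dist <- function(x, center, U) {
  xc <- sweep(as.matrix(x), 2, center, "-")
  total <- rowSums(xc^2)
  if (ncol(U) == 0) return(total)    # dimension 0: plain squared distance to the centre
  proj <- xc %*% U                   # coordinates inside the subspace
  pmax(total - rowSums(proj^2), 0)   # Pythagoras; guard against rounding error
}

pt <- matrix(c(3, 4, 0), nrow = 1)
origin <- c(0, 0, 0)

dist_point <- sq_orth_dist(pt, origin, matrix(0, 3, 0))           # d = 0: 25
dist_line  <- sq_orth_dist(pt, origin, matrix(c(1, 0, 0), 3, 1))  # d = 1, x-axis: 16
dist_plane <- sq_orth_dist(pt, origin, diag(3)[, 1:2])            # d = 2, xy-plane: 0
```

The d = 0 case reproduces the squared Euclidean distance to a centre, which is why a vector d of zeros corresponds to trimmed k-means.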

The iterative algorithm performs "concentration steps" to improve the current cluster assignments. To approximate the global optimum, the procedure is randomly initialized nstart times and niter1 concentration steps are performed for each initialization. The nkeep most “promising” solutions, i.e. the nkeep iterated solutions with the best values of the target function after these initial steps, are then iterated until convergence or until niter2 concentration steps have been done.
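
One concentration step can be sketched in base R as follows. This is an illustration under stated assumptions, not the tclust implementation: the function names, the representation of a fit as list(center, U), the use of ceiling(n * alpha) for the number of trimmed observations, the refit by principal components, and the label 0 for trimmed observations are all choices made for this sketch.

```r
# Squared orthogonal distance of the rows of x to an affine subspace
# (hypothetical helper, orthonormal columns of U span the subspace).
sq_dist <- function(x, center, U) {
  xc <- sweep(x, 2, center, "-")
  d2 <- rowSums(xc^2)
  if (ncol(U) > 0) d2 <- d2 - rowSums((xc %*% U)^2)
  pmax(d2, 0)
}

# One concentration step: assign, trim, refit (a sketch, not tclust code).
concentration_step <- function(x, fits, d, alpha) {
  n <- nrow(x)
  # 1. Distance of every observation to every subspace; assign to the closest.
  D <- sapply(fits, function(f) sq_dist(x, f$center, f$U))
  cluster <- max.col(-D, ties.method = "first")
  dmin <- D[cbind(seq_len(n), cluster)]
  # 2. Trim the observations farthest from their closest subspace (label 0).
  n_trim <- ceiling(n * alpha)  # rounding rule is an assumption
  if (n_trim > 0) cluster[order(dmin, decreasing = TRUE)[seq_len(n_trim)]] <- 0L
  # 3. Refit each subspace from its non-trimmed members: mean plus the
  #    leading d[j] principal directions.
  for (j in seq_along(fits)) {
    xj <- x[cluster == j, , drop = FALSE]
    if (nrow(xj) <= d[j]) next  # too few points: keep the previous fit
    center <- colMeans(xj)
    U <- if (d[j] > 0) {
      svd(sweep(xj, 2, center, "-"))$v[, seq_len(d[j]), drop = FALSE]
    } else {
      matrix(0, ncol(x), 0)
    }
    fits[[j]] <- list(center = center, U = U)
  }
  list(fits = fits, cluster = cluster, obj = sum(dmin[cluster > 0]))
}

# Toy data: two noisy lines in the plane plus 10 scattered outliers.
set.seed(42)
t1 <- runif(100, -5, 5)
t2 <- runif(100, -5, 5)
x <- rbind(cbind(t1, t1), cbind(t2, 10 - t2)) + matrix(rnorm(400, sd = 0.1), 200, 2)
x <- rbind(x, cbind(runif(10, -20, 20), runif(10, -20, 20)))

# Random start: a random observation as centre and a random direction per cluster.
d <- c(1, 1)
fits <- lapply(d, function(dj) list(
  center = x[sample(nrow(x), 1), ],
  U = qr.Q(qr(matrix(rnorm(ncol(x) * dj), ncol(x), dj)))
))

objs <- numeric(10)
for (i in seq_along(objs)) {
  step <- concentration_step(x, fits, d, alpha = 0.05)
  fits <- step$fits
  objs[i] <- step$obj
}
```

In this sketch the values stored in objs never increase from one step to the next, which is what makes repeating the step a sensible local search. A single random start can still end in a poor local optimum, hence the nstart restarts and the nkeep candidates described above.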

Value

Returns an object of class rlg which is basically a list with the following elements:

Author(s)

Javier Crespo Guerrero, Jesús Fernández Iglesias, Luis Angel Garcia Escudero, Agustin Mayo Iscar.

References

García‐Escudero, L. A., Gordaliza, A., San Martin, R., Van Aelst, S., & Zamar, R. (2009). Robust linear clustering. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 301-318.

Examples

##--- EXAMPLE 1 ------------------------------------------
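data(LG5data)
x <- LG5data[, 1:10]
# Three clusters, each around a 2-dimensional affine subspace; trim 10% of the observations
clus <- rlg(x, d = c(2, 2, 2), alpha = 0.1)
# Colour the observations by their cluster label
plot(x, col = clus$cluster + 1)
plot(clus, which = "eigenvalues")
plot(clus, which = "scores")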

##--- EXAMPLE 2 ------------------------------------------
data(pine)
# Three clusters, each around a line; trim 3.5% of the observations
clus <- rlg(pine, d = c(1, 1, 1), alpha = 0.035)
plot(pine, col = clus$cluster + 1)

[Package tclust version 2.0-4]