Clest {RSKC}    R Documentation

An implementation of Clest with robust sparse K-means. CER is used as a similarity measure.

Description

The function Clest performs Clest (Dudoit and Fridlyand (2002)) with CER as the measure of agreement between two partitions (in each testing set). The following clustering algorithms can be used: K-means, trimmed K-means, sparse K-means and robust sparse K-means.
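To make the agreement measure concrete, here is a minimal base-R sketch of the classification error rate (CER) between two partitions of the same cases: the proportion of case pairs that are grouped together in one partition but separated in the other. The helper name `cer` is hypothetical and is written only for illustration; it is not claimed to be the package's own implementation.

```r
# Illustrative helper (hypothetical name): CER between two label vectors.
# CER = fraction of unordered case pairs on which the partitions disagree,
# so 0 means identical partitions (up to relabelling).
cer <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  same_p <- outer(p, p, "==")   # TRUE where a pair shares a cluster in p
  same_q <- outer(q, q, "==")   # TRUE where a pair shares a cluster in q
  pairs <- upper.tri(same_p)    # count each unordered pair once
  mean(same_p[pairs] != same_q[pairs])
}

p <- c(1, 1, 2, 2)
cer(p, c(2, 2, 1, 1))  # 0: same partition with the labels swapped
cer(p, c(1, 2, 1, 2))  # 4 of the 6 pairs disagree
```

Because CER is an error rate, smaller values indicate stronger agreement between the two partitions.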

Usage

Clest(d, maxK, alpha, B = 15, B0 = 5, nstart = 1000,
      L1 = 6, beta = 0.1, pca = TRUE, silent = FALSE)

Arguments

d

A numerical data matrix (N by p) where N is the number of cases and p is the number of features. The cases are clustered.

maxK

The maximum number of clusters to consider. Clest evaluates K = 2, ..., maxK.

alpha

See RSKC.

B

The number of times that the observed dataset d is randomly partitioned into a learning set and a testing set. Note that each generated reference dataset is partitioned into a learning set and a testing set only once, to reduce the computational cost.

B0

The number of times that the reference dataset is generated.

nstart

The number of random initial sets of cluster centers at Step (a) of robust sparse K-means clustering.

L1

See RSKC.

beta

0 <= beta <= 1: significance level. Clest chooses the number of clusters that returns the strongest significant evidence against the hypothesis H0 : K = 1.

pca

Logical. If TRUE, the reference datasets are generated from a PCA reference distribution. If FALSE, the reference datasets are generated from a simple reference distribution.

silent

Logical. If TRUE, the number of the iteration in progress is not printed.
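The two reference distributions selected by pca can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: it assumes that the simple reference draws each feature uniformly over its observed range, and that the PCA reference draws uniformly over the ranges of the principal component scores and then rotates back to the original coordinates. The function names `ref_simple` and `ref_pca` are hypothetical.

```r
# Assumed "simple" reference: each feature uniform over its observed range.
ref_simple <- function(d) {
  apply(d, 2, function(x) runif(length(x), min(x), max(x)))
}

# Assumed "PCA" reference: uniform over the ranges of the principal
# component scores, mapped back to the original feature space.
ref_pca <- function(d) {
  center <- colMeans(d)
  dc <- sweep(d, 2, center)        # column-centered data
  v <- svd(dc)$v                   # principal axes (columns)
  scores <- dc %*% v               # data in principal coordinates
  z <- apply(scores, 2, function(x) runif(length(x), min(x), max(x)))
  sweep(z %*% t(v), 2, center, "+")
}

set.seed(1)
d <- matrix(rnorm(60 * 5), 60, 5)
dim(ref_simple(d))  # same dimensions as d
dim(ref_pca(d))     # same dimensions as d
```

Under these assumptions, the PCA variant follows the overall shape of the data cloud, while the simple variant ignores correlation between features.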

Value

K

The solution of Clest; the estimated number of clusters.

result.table

A real matrix (maxK-1 by 4). The rows correspond to K=2,...,maxK and the columns contain the test statistic (= observed CER - reference CER), the observed CER, the reference CER and the P-value.

referenceCERs

A matrix (B0 by maxK-1), containing the CERs of the testing sets of the generated reference datasets for each K=2,...,maxK.

observedCERs

A matrix (B by maxK-1), containing CERs of B testing sets for each K=2,...,maxK.

call

The matched call.
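As a rough guide to how K relates to result.table, the sketch below applies one plausible reading of the selection rule: among the values of K whose P-value is at most beta, take the one with the smallest (most negative) test statistic, and fall back to K = 1 when none is significant. The exact rule used by the package is an assumption here, the helper name `choose_k` is hypothetical, and the table values are made up purely for illustration.

```r
# Hypothetical helper: pick K from a table shaped like result.table
# (column 1 = test statistic, column 4 = P-value; rows are K = 2, ..., maxK).
choose_k <- function(result_table, beta = 0.1) {
  stat <- result_table[, 1]
  pval <- result_table[, 4]
  ks <- seq_len(nrow(result_table)) + 1  # row i corresponds to K = i + 1
  sig <- pval <= beta
  if (!any(sig)) return(1)               # no evidence against H0: K = 1
  ks[sig][which.min(stat[sig])]          # strongest significant evidence
}

# Made-up table for maxK = 5 (not output from a real run)
tab <- cbind(test.stat = c(-0.05, -0.30, -0.10, -0.02),
             obsCER    = c( 0.20,  0.05,  0.15,  0.25),
             refCER    = c( 0.25,  0.35,  0.25,  0.27),
             pvalue    = c( 0.30,  0.00,  0.10,  0.60))
choose_k(tab, beta = 0.1)  # 3
```

The test statistic is negative when the observed CER is smaller than the reference CER, so a more negative value means the observed partitions agree better than those from structureless reference data.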

Author(s)

Yumi Kondo <y.kondo@stat.ubc.ca>

References

Yumi Kondo (2011), Robustification of the sparse K-means clustering algorithm, MSc. Thesis, University of British Columbia http://hdl.handle.net/2429/37093

S. Dudoit and J. Fridlyand. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 3(7), 2002.

Examples

## Not run: 
# Small simulation function: 60 cases and f features (f >= 50);
# the first 50 features shift cases 1-20 by +mu and cases 21-40 by -mu
sim <- function(mu, f) {
  D <- matrix(rnorm(60 * f), 60, f)
  D[1:20, 1:50] <- D[1:20, 1:50] + mu
  D[21:40, 1:50] <- D[21:40, 1:50] - mu
  return(D)
}

set.seed(1)
d <- sim(1.5, 100)  # non-contaminated dataset with noise variables

# Clest with robust sparse K-means
rsk <- Clest(d, 5, alpha = 1/20, B = 3, B0 = 10, beta = 0.05,
             nstart = 100, pca = TRUE, L1 = 3, silent = TRUE)
# Clest with K-means
k <- Clest(d, 5, alpha = 0, B = 3, B0 = 10, beta = 0.05,
           nstart = 100, pca = TRUE, L1 = NULL, silent = TRUE)

## End(Not run)

[Package RSKC version 2.4.2 Index]