
Clest {RSKC}

R Documentation

An implementation of Clest with robust sparse K-means. CER is used as a similarity measure.

Description

The function Clest performs Clest (Dudoit and Fridlyand (2002)) with CER as the measure of agreement between two partitions (in each testing set). The following clustering algorithms can be used: K-means, trimmed K-means, sparse K-means and robust sparse K-means.
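
To make the agreement measure concrete, here is a minimal base-R sketch of CER, assuming the usual definition as the proportion of pairs of cases on which two partitions disagree (one minus the Rand index). The helper name `cer_sketch` is hypothetical and is not a function of this package.

```r
# Hypothetical helper: pairwise-disagreement CER between two label vectors.
cer_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  same_p <- outer(p, p, "==")   # TRUE where two cases share a cluster in p
  same_q <- outer(q, q, "==")   # TRUE where two cases share a cluster in q
  pairs <- upper.tri(same_p)    # count each unordered pair once
  mean(same_p[pairs] != same_q[pairs])
}

cer_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 0: same partition, labels swapped
cer_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 4/6: four of six pairs disagree
```

A CER of 0 means the two partitions agree on every pair, so the measure does not depend on how the clusters are labelled.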

Usage

Clest(d, maxK, alpha, B = 15, B0 = 5, nstart = 1000,
      L1 = 6, beta = 0.1, pca = TRUE, silent = FALSE)

Arguments

`d`	A numerical data matrix (`N` by `p`) where `N` is the number of cases and `p` is the number of features. The cases are clustered.
`maxK`	The maximum number of clusters that you suspect.
`alpha`	See `RSKC`.
`B`	The number of times that the observed dataset `d` is randomly partitioned into a learning set and a testing set. Note that each generated reference dataset is partitioned into a learning set and a testing set only once, to reduce the computational cost.
`B0`	The number of times that the reference dataset is generated.
`nstart`	The number of random initial sets of cluster centers used at Step (a) of robust sparse K-means clustering.
`L1`	See `RSKC`.
`beta`	0 <= `beta` <= 1: significance level. Clest chooses the number of clusters that returns the strongest significant evidence against the hypothesis H0 : K = 1.
`pca`	Logical. If `TRUE`, the reference datasets are generated from a PCA reference distribution. If `FALSE`, the reference datasets are generated from a simple reference distribution.
`silent`	Logical. If `TRUE`, the number of the iteration in progress is not printed.
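
The two reference distributions selected by `pca` can be sketched as follows. This is an illustration under an assumption, not the package's code: it takes "simple" to mean uniform sampling over each feature's observed range, and "PCA" to mean uniform sampling over the ranges of the principal-component scores followed by back-transformation. The helper names `simple_reference` and `pca_reference` are hypothetical.

```r
# Assumed "simple" reference: each feature uniform over its observed range.
simple_reference <- function(d) {
  apply(d, 2, function(x) runif(length(x), min = min(x), max = max(x)))
}

# Assumed "PCA" reference: uniform over the ranges of the principal-component
# scores, then rotated back to the original feature space.
pca_reference <- function(d) {
  centers <- colMeans(d)
  dc <- sweep(d, 2, centers)               # center each column
  V <- svd(dc)$v                           # principal directions
  scores <- dc %*% V
  z <- apply(scores, 2, function(x) runif(length(x), min = min(x), max = max(x)))
  sweep(z %*% t(V), 2, centers, "+")       # back-transform and restore means
}

set.seed(1)
d0 <- matrix(rnorm(60 * 10), 60, 10)       # toy data, for illustration only
dim(simple_reference(d0))                  # same dimensions as d0
dim(pca_reference(d0))                     # same dimensions as d0
```

Under this reading, the PCA variant follows the orientation of the data cloud, while the simple variant ignores correlation between features.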

Value

`K`	The solution of Clest; the estimated number of clusters.
`result.table`	A real matrix (`maxK-1` by 4). The rows correspond to `K=2`,...,`maxK`, and the columns hold the test statistic (observed CER minus reference CER), the observed CER, the reference CER and the P-value.
`referenceCERs`	A matrix (`B0` by `maxK-1`) containing the CERs of the testing sets from the generated reference datasets for each `K=2,...,maxK`.
`observedCERs`	A matrix (`B` by `maxK-1`), containing CERs of `B` testing sets for each `K=2,...,maxK`.
`call`	The matched call.
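
As an illustration of how `K` might relate to `result.table`, the sketch below applies one plausible reading of the selection rule to made-up numbers. It assumes that the strongest significant evidence is the most negative test statistic among the rows whose P-value is at most `beta`, and that `K = 1` is returned when no row is significant. The values and the column names are hypothetical and are not output of this package.

```r
# Hypothetical table laid out like `result.table` for maxK = 5.
tab <- matrix(
  c(-0.20, 0.05, 0.25, 0.00,
    -0.30, 0.02, 0.32, 0.00,
    -0.05, 0.28, 0.33, 0.20,
     0.01, 0.35, 0.34, 0.60),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("K=", 2:5),
                  c("test.stat", "obsCER", "refCER", "p.value"))
)
beta <- 0.1

# Keep the K values whose P-value is at most beta (assumed rule).
significant <- which(tab[, "p.value"] <= beta)
if (length(significant) > 0) {
  # Lower CER means better agreement, so take the most negative statistic.
  best <- significant[which.min(tab[significant, "test.stat"])]
  K_hat <- (2:5)[best]
} else {
  K_hat <- 1
}
K_hat  # 3 for these made-up numbers
```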

Author(s)

Yumi Kondo <y.kondo@stat.ubc.ca>

References

Yumi Kondo (2011), Robustification of the sparse K-means clustering algorithm, MSc. Thesis, University of British Columbia http://hdl.handle.net/2429/37093

S. Dudoit and J. Fridlyand (2002), A prediction-based resampling method for estimating the number of clusters in a dataset, Genome Biology, 3(7).

Examples

## Not run: 
# Small simulation function: 60 cases in 3 groups of 20, with f features (f >= 50).
# Only the first 50 features carry cluster signal; the rest are noise.
sim <- function(mu, f) {
  D <- matrix(rnorm(60 * f), 60, f)
  D[1:20, 1:50] <- D[1:20, 1:50] + mu
  D[21:40, 1:50] <- D[21:40, 1:50] - mu
  return(D)
}

set.seed(1)
d <- sim(1.5, 100)  # uncontaminated dataset with noise variables

# Clest with robust sparse K-means
rsk <- Clest(d, 5, alpha = 1/20, B = 3, B0 = 10, beta = 0.05,
             nstart = 100, pca = TRUE, L1 = 3, silent = TRUE)
# Clest with K-means
k <- Clest(d, 5, alpha = 0, B = 3, B0 = 10, beta = 0.05,
           nstart = 100, pca = TRUE, L1 = NULL, silent = TRUE)

## End(Not run)

[Package RSKC version 2.4.2 Index]