brik {briKmeans}    R Documentation

Computation of Initial Seeds and Kmeans Results

Description

brik computes suitable initial seeds for k-means, based on the bootstrap and the modified band depth (MBD), and then runs k-means from those seeds.

Usage

brik(x, k, method="Ward", nstart=1, B=10, J = 2, ...)

Arguments

x

a data matrix with N observations (individuals) in rows and d variables (features) in columns

k

number of clusters

method

algorithm used to cluster the cluster centres obtained from the bootstrap replicates; Ward by default. The only alternatives currently implemented are pam and randomly initialised kmeans

nstart

number of random initialisations when using the kmeans method to cluster the cluster centres

B

number of bootstrap replicates to be generated

J

number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used

...

additional arguments to be passed to the kmeans function for the final clustering; at this stage nstart is set to 1, as the initial seeds are fixed
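The remark about nstart can be illustrated with base R alone. The following is a minimal sketch, not code from the package: the data and the seeds matrix are made up for illustration. It shows that when kmeans receives a matrix of starting centres, repeated runs give the same partition, so extra random restarts add nothing at that stage.

```r
## Hypothetical data: two well-separated groups of 20 points in 2 dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, 0), 20, 2),
           matrix(rnorm(40, 5), 20, 2))

## Hypothetical fixed seeds: one observation taken from each group
seeds <- x[c(1, 21), ]

## With a matrix of centres, kmeans starts from exactly these points
fit1 <- kmeans(x, centers = seeds)
fit2 <- kmeans(x, centers = seeds)

## Both runs start from the same seeds, so the partitions coincide
identical(fit1$cluster, fit2$cluster)
```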

Details

The brik algorithm is a simple, computationally feasible method that provides k-means with a set of initial seeds for clustering datasets of arbitrary dimension. It consists of two stages. First, k-means is applied to B bootstrap replicates of the original data, which yields a set of cluster centres. Second, these centres are themselves clustered into k groups, and the deepest point of each group, according to the MBD, is returned as an initial seed for k-means.
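The two stages described above can be sketched in base R. This is an illustrative approximation under stated assumptions, not the package's implementation: the names mbd2 and brik_sketch are hypothetical, the Ward step is assumed to be hclust with method "ward.D2", and the depth uses a rank-based shortcut for the MBD with J = 2 that is exact only when there are no ties within a coordinate.

```r
## Hypothetical helper: modified band depth with J = 2 for the rows of m.
## Assumes no ties within a column; with ties the ranks are only approximate.
mbd2 <- function(m) {
  n <- nrow(m)
  if (n < 2) return(rep(1, n))          # a single point is trivially deepest
  r <- apply(m, 2, rank)                # column-wise ranks, n x d
  ## Pairs whose band covers point i in a coordinate:
  ## (r - 1) * (n - r) pairs straddle it, (n - 1) pairs contain it
  covered <- (r - 1) * (n - r) + (n - 1)
  rowMeans(covered) / choose(n, 2)
}

## Hypothetical sketch of the two-stage seeding followed by k-means
brik_sketch <- function(x, k, B = 10) {
  n <- nrow(x)
  ## Stage 1: k-means on B bootstrap replicates; stack the B * k centres
  centres <- do.call(rbind, lapply(seq_len(B), function(b) {
    xb <- x[sample.int(n, n, replace = TRUE), , drop = FALSE]
    kmeans(xb, k)$centers
  }))
  ## Stage 2: cluster the centres (Ward linkage assumed) into k groups
  groups <- cutree(hclust(dist(centres), method = "ward.D2"), k)
  ## Keep the deepest centre of each group as a seed
  seeds <- do.call(rbind, lapply(seq_len(k), function(g) {
    members <- centres[groups == g, , drop = FALSE]
    members[which.max(mbd2(members)), ]
  }))
  list(seeds = seeds, km = kmeans(x, centers = seeds))
}

## Usage on made-up, well-separated data
set.seed(0)
x <- rbind(matrix(rnorm(100, 0), 50, 2),
           matrix(rnorm(100, 6), 50, 2),
           matrix(rnorm(100, 12), 50, 2))
res <- brik_sketch(x, k = 3, B = 25)
dim(res$seeds)
table(res$km$cluster)
```

The returned list mirrors the seeds and km components documented under Value, so the sketch can be compared side by side with the output of brik on the same data.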

Value

seeds

a matrix of size k x d containing the initial seeds obtained with the BRIk algorithm

km

an object of class kmeans corresponding to the run of kmeans on x with starting points seeds

Author(s)

Javier Albert Smet <javas@kth.se> and Aurora Torrente <etorrent@est-econ.uc3m.es>

References

Torrente, A. and Romo, J. (2020). Initializing k-means Clustering by Bootstrap and Data Depth. Journal of Classification. https://doi.org/10.1007/s00357-020-09372-3

Examples

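## brik algorithm
    ## simulated data: three groups of 25 observations in 8 dimensions
    set.seed(0)
    g1 <- matrix(rnorm(200, 0, 3), 25, 8); g1[, 1] <- g1[, 1] + 4
    g2 <- matrix(rnorm(200, 0, 3), 25, 8); g2[, 1] <- g2[, 1] + 4; g2[, 3] <- g2[, 3] - 4
    g3 <- matrix(rnorm(200, 0, 3), 25, 8); g3[, 1] <- g3[, 1] + 4; g3[, 3] <- g3[, 3] + 4

    x <- rbind(g1, g2, g3)
    labels <- c(rep(1, 25), rep(2, 25), rep(3, 25))

    ## k-means with random initialisation vs. k-means seeded by brik
    C1 <- kmeans(x, 3)
    C2 <- brik(x, 3, B = 25)

    ## cross-tabulate cluster assignments against the true group labels
    table(C1$cluster, labels)
    table(C2$km$cluster, labels)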


[Package briKmeans version 1.0 Index]