R: Computation of Initial Seeds and Kmeans Results

brik {briKmeans}

R Documentation

Computation of Initial Seeds and Kmeans Results

Description

brik computes appropriate seeds, based on the bootstrap and the modified band depth (MBD), to initialise k-means, and then runs k-means from those seeds.

Usage

brik(x, k, method="Ward", nstart=1, B=10, J = 2, ...)

Arguments

`x`	a data matrix containing `N` observations (individuals) by rows and `d` variables (features) by columns
`k`	number of clusters
`method`	clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; `Ward`, by default. Besides `Ward`, only `pam` and randomly initialised `kmeans` are currently implemented
`nstart`	number of random initialisations when using the `kmeans` method to cluster the cluster centres
`B`	number of bootstrap replicates to be generated
`J`	number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used
`...`	additional arguments to be passed to the `kmeans` function for the final clustering; at this stage `nstart` is set to 1, as the initial seeds are fixed
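
To make the role of `J` concrete, the following is a minimal brute-force sketch of the modified band depth with J = 2, written in base R. The function name `mbd2` is hypothetical and this is not the package's internal code; it only illustrates the definition: for each observation, average over all pairs of observations the proportion of coordinates in which the observation lies inside the band delimited by the pair.

```r
# Hypothetical helper, not part of briKmeans: brute-force MBD with J = 2.
# Rows of `pts` are observations, columns are coordinates.
mbd2 <- function(pts) {
  n <- nrow(pts)
  if (n < 2) return(rep(1, n))      # no pair can be formed
  pairs <- combn(n, 2)              # every band is built from 2 observations
  depth <- numeric(n)
  for (p in seq_len(ncol(pairs))) {
    a <- pts[pairs[1, p], ]
    b <- pts[pairs[2, p], ]
    lo <- pmin(a, b)                # lower edge of the band, per coordinate
    hi <- pmax(a, b)                # upper edge of the band, per coordinate
    inside <- sweep(pts, 2, lo, ">=") & sweep(pts, 2, hi, "<=")
    depth <- depth + rowMeans(inside)  # proportion of coordinates in the band
  }
  depth / ncol(pairs)
}

m <- rbind(c(1, 1), c(2, 2), c(3, 3))
mbd2(m)  # the middle row has depth 1, the two outer rows have depth 2/3
```

The deepest observation is the one with the largest value, here the second row.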

Details

The brik algorithm is a simple, computationally feasible method that provides k-means with a set of initial seeds to cluster datasets of arbitrary dimensions. It consists of two stages. First, k-means is applied to bootstrap replicates of the original data to obtain a set of cluster centres. Next, these centres are themselves clustered, and the deepest point of each resulting cluster, according to the MBD, is returned as an initial seed for k-means.
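
The two stages can be sketched in base R as follows. This is an illustration under stated assumptions, not the package implementation: `brik_sketch` and `mbd_rank` are hypothetical names, the Ward step is assumed to be `hclust` with `method = "ward.D2"`, and the depth uses a rank-based shortcut for the MBD with J = 2 that assumes no ties within a coordinate.

```r
# Hypothetical helper: MBD with J = 2 from column-wise ranks (assumes no ties).
# A point of rank r among n lies in (r - 1) * (n - r) + (n - 1) of the bands.
mbd_rank <- function(pts) {
  n <- nrow(pts)
  if (n < 3) return(rep(1, n))
  r <- apply(pts, 2, rank)
  rowMeans((r - 1) * (n - r) + (n - 1)) / choose(n, 2)
}

# Hypothetical sketch of the two-stage procedure described above.
brik_sketch <- function(x, k, B = 10) {
  n <- nrow(x)
  # Stage 1: run k-means on B bootstrap replicates and pool the B * k centres
  centres <- do.call(rbind, lapply(seq_len(B), function(b) {
    boot <- x[sample(n, replace = TRUE), , drop = FALSE]
    kmeans(boot, centers = k)$centers
  }))
  # Stage 2: cluster the pooled centres (assumed Ward linkage) into k groups
  groups <- cutree(hclust(dist(centres), method = "ward.D2"), k = k)
  # The deepest centre of each group becomes a seed
  seeds <- do.call(rbind, lapply(seq_len(k), function(g) {
    members <- centres[groups == g, , drop = FALSE]
    members[which.max(mbd_rank(members)), ]
  }))
  # Final k-means run started from the fixed seeds
  list(seeds = seeds, km = kmeans(x, centers = seeds))
}

# Usage on three well-separated simulated groups in two dimensions
set.seed(1)
demo <- rbind(cbind(rnorm(30, 0),  rnorm(30, 0)),
              cbind(rnorm(30, 10), rnorm(30, 0)),
              cbind(rnorm(30, 0),  rnorm(30, 10)))
fit <- brik_sketch(demo, k = 3, B = 10)
dim(fit$seeds)  # k rows, d columns
```

The returned list mirrors the `seeds` and `km` components documented for brik, which keeps the sketch easy to compare with the real function.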

Value

`seeds`	a matrix of size `k x d` containing the initial seeds obtained with the BRIk algorithm
`km`	an object of class `kmeans` corresponding to the run of kmeans on `x` with starting points `seeds`
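
Because the seeds are handed to `kmeans` as a matrix of starting centres, the final run involves no random initialisation, which is why repeating it would add nothing. The base-R sketch below illustrates this with hand-picked, hypothetical seeds (they are not output of brik): two runs from the same seed matrix give the same partition.

```r
# Simulated data: two groups of 20 observations in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), 20, 2),
           matrix(rnorm(40, mean = 6), 20, 2))

# Hypothetical seeds, chosen by hand for illustration only
seeds <- rbind(c(0, 0), c(6, 6))

# With a matrix of centres, kmeans starts from exactly those points
km1 <- kmeans(x, centers = seeds)
km2 <- kmeans(x, centers = seeds)

identical(km1$cluster, km2$cluster)  # same partition on both runs
km1$centers                          # final centres, one row per seed
```

With the real function, the same components would be read from the `km` element of the returned list, for example the cluster assignments in `km$cluster`.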

Author(s)

Javier Albert Smet <javas@kth.se> and Aurora Torrente <etorrent@est-econ.uc3m.es>

References

Torrente, A. and Romo, J. (2020). Initializing k-means Clustering by Bootstrap and Data Depth. Journal of Classification. https://doi.org/10.1007/s00357-020-09372-3

Examples

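## brik algorithm
    library(briKmeans)

    ## simulated data: three groups of 25 observations on 8 variables;
    ## the groups differ only in the mean of the third variable
    set.seed(0)
    g1 <- matrix(rnorm(200, 0, 3), 25, 8); g1[, 1] <- g1[, 1] + 4
    g2 <- matrix(rnorm(200, 0, 3), 25, 8); g2[, 1] <- g2[, 1] + 4; g2[, 3] <- g2[, 3] - 4
    g3 <- matrix(rnorm(200, 0, 3), 25, 8); g3[, 1] <- g3[, 1] + 4; g3[, 3] <- g3[, 3] + 4

    x <- rbind(g1, g2, g3)
    labels <- c(rep(1, 25), rep(2, 25), rep(3, 25))

    ## randomly initialised k-means versus k-means seeded by brik
    C1 <- kmeans(x, 3)
    C2 <- brik(x, 3, B = 25)

    ## cross-tabulate each partition against the true group labels
    table(C1$cluster, labels)
    table(C2$km$cluster, labels)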

[Package briKmeans version 1.0 Index]