elbowRule {briKmeans}    R Documentation

Selection of Appropriate DF Parameter Based on an Elbow Rule for the Distortion

Description

elbowRule runs the FABRIk algorithm for several values of the degrees of freedom (DF) and suggests the value that yields the minimum distortion. An optional plot of the computed distortions allows alternative suitable DF values to be chosen with an elbow-like rule.

Usage

elbowRule(x, k, method="Ward", nstart=1, B = 10, J = 2, x.coord = NULL, OSF = 1, 
    vect = NULL, intercept = TRUE, degPolyn = 3, degFr = 4:20, knots = NULL, 
    plot = FALSE, ...)

Arguments

x

a data matrix containing N observations (individuals) by rows and d variables (features) by columns

k

number of clusters

method

clustering algorithm used to cluster the cluster centres from the bootstrapped replicates; Ward, by default. Besides Ward, pam and randomly initialised kmeans are implemented

nstart

number of random initialisations when using the kmeans method to cluster the cluster centres

B

number of bootstrap replicates to be generated

J

number of observations used to build the bands for the MBD computation. Currently, only the value J=2 can be used

x.coord

initial x coordinates (time points) where the functional data is observed; if not provided, it is assumed to be 1:d

OSF

oversampling factor for the smoothed data; an OSF of m means that the number of (equally spaced) time points at which the approximated function is evaluated is m times the original number of features, d

vect

optional collection of x coordinates (time points) where to assess the smoothed data; if provided, it ignores the OSF

intercept

if TRUE, an intercept is included in the basis; default is TRUE

degPolyn

degree of the piecewise polynomial; 3 by default (cubic splines)

degFr

a vector containing tentative values of the degrees of freedom, to be tested

knots

the internal breakpoints that define the spline

plot

a Boolean parameter; it allows plotting the distortion against the degrees of freedom. Set to FALSE by default

...

additional arguments to be passed to the kmeans function for the final clustering; at this stage nstart is set to 1, as the initial seeds are fixed
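
As a rough illustration of how the smoothing arguments interact, the sketch below builds a B-spline basis with splines::bs and evaluates a fitted curve on an oversampled grid. It assumes that degFr, degPolyn, intercept and knots play the role of the df, degree, intercept and knots arguments of bs, and that OSF yields OSF * d equally spaced points over the range of x.coord; the package's internal code may differ.

```r
library(splines)

set.seed(1)
x.coord <- seq(0, 1, 0.01)                     # original time points (d = 101)
d <- length(x.coord)
y <- sin(2 * pi * x.coord) + rnorm(d, 0, 0.3)  # one noisy curve

OSF <- 2        # oversampling factor
degFr <- 8      # one tentative DF value
degPolyn <- 3   # cubic splines

# Basis on the original grid (assumed mapping of the arguments onto bs)
basis <- bs(x.coord, df = degFr, degree = degPolyn, intercept = TRUE)
fit <- lm(y ~ basis - 1)   # the basis already contains the intercept

# Assumed oversampled grid: OSF * d equally spaced points
vect <- seq(min(x.coord), max(x.coord), length.out = OSF * d)
smoothed <- as.vector(predict(basis, newx = vect) %*% coef(fit))
```

Supplying vect directly would replace the grid derived from OSF, in line with the description of vect above.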

Details

The function implements a simple elbow-like rule for selecting an appropriate value of the DF parameter among the tested ones. It computes the distortion obtained for each of these values and returns the one yielding the smallest distortion. When the parameter plot is set to TRUE, the distortion is plotted against the degrees of freedom, so that elbows or minima can be detected visually.
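
The selection rule can be sketched in base R as follows. This is a simplified stand-in, not the package's code: plain kmeans replaces the FABRIk algorithm, the distortion is assumed to be measured on the smoothed curves, and distortion_for_df is a hypothetical helper introduced only for this illustration.

```r
library(splines)

# Hypothetical helper: distortion after smoothing with a given DF.
# Plain kmeans is used here as a stand-in for the FABRIk algorithm.
distortion_for_df <- function(x, k, df, x.coord = seq_len(ncol(x))) {
  basis <- bs(x.coord, df = df, degree = 3, intercept = TRUE)
  # Least-squares projection matrix onto the spline basis
  hat <- basis %*% solve(crossprod(basis), t(basis))
  smoothed <- x %*% hat   # hat is symmetric, so this smooths each row (curve)
  kmeans(smoothed, centers = k, nstart = 5)$tot.withinss
}

set.seed(1)
x.coord <- seq(0, 1, 0.01)
# Two groups of 20 noisy curves each
x <- rbind(
  t(replicate(20, sin(2 * pi * x.coord) + rnorm(length(x.coord), 0, 0.5))),
  t(replicate(20, cos(2 * pi * x.coord) + rnorm(length(x.coord), 0, 0.5)))
)

degFr <- 4:12
tot.withinss <- sapply(degFr, function(df) distortion_for_df(x, 2, df, x.coord))
optimal <- degFr[which.min(tot.withinss)]   # DF with the smallest distortion
```

Because the minimum may sit at the boundary of the tested range, plotting tot.withinss against degFr and looking for an elbow can be more informative than the minimum alone.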

Value

df

the original vector of DF values to be tested

tot.withinss

a vector containing the distortion obtained for each tested DF value

optimal

DF value producing the smallest distortion among the tested df
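
A minimal sketch of how these components might be used, assuming the result is a list with the three elements above. The ER object below is a hypothetical stand-in with made-up numbers, not output from the package.

```r
# Hypothetical result with the same components as the returned value;
# the numbers are made up for illustration only.
ER <- list(
  df = 5:9,
  tot.withinss = c(120.4, 98.7, 91.2, 93.5, 97.1),
  optimal = 7
)

# The optimum is the tested DF with the smallest distortion
best <- ER$df[which.min(ER$tot.withinss)]

# Manual version of the diagnostic plot (distortion against DF)
plot(ER$df, ER$tot.withinss, type = "b",
     xlab = "degrees of freedom", ylab = "distortion")
abline(v = ER$optimal, lty = 2)
```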

Author(s)

Javier Albert Smet javas@kth.se and Aurora Torrente etorrent@est-econ.uc3m.es

References

Torrente, A. and Romo, J. (2020). Initializing Kmeans Clustering by Bootstrap and Data Depth. J Classif. https://doi.org/10.1007/s00357-020-09372-3.

Albert-Smet, J., Torrente, A. and Romo, J. (2021). Modified Band Depth Based Initialization of Kmeans for Functional Data Clustering. Submitted to Computational Statistics and Data Analysis.

Examples

    ## simulated data
    set.seed(1)
    x.coord = seq(0,1,0.01)
    x <- matrix(ncol = length(x.coord), nrow = 80)
    labels <- matrix(ncol = nrow(x), nrow = 1)  # one label per observation
  
    centers <-  matrix(ncol = length(x.coord), nrow = 4)
    centers[1, ] <- abs(x.coord)-0.5
    centers[2, ] <- (abs(x.coord-0.5))^2 - 0.8
    centers[3, ] <- -(abs(x.coord-0.5))^2 + 0.7
    centers[4, ] <- 0.75*sin(8*pi*abs(x.coord))
  
    for(i in 1:4){
        for(j in 1:20){
            idx <- 20*(i-1) + j
            labels[idx] <- i
            # noisy copy of the i-th centre
            x[idx, ] <- centers[i, ] + rnorm(length(x.coord), 0, 1.5)
        }
    }

    # ER <- elbowRule(x, 4, B=25, degFr = 5:12, plot=FALSE)
    ER <- elbowRule(x, 4, B=25, degFr = 5:12, plot=TRUE)
  

[Package briKmeans version 1.0 Index]