kma {briKmeans}    R Documentation

Clustering and alignment of functional data

Description

kma jointly performs clustering and alignment of a functional dataset (multidimensional or unidimensional functions).

Usage

kma(x, y0 = NULL, y1 = NULL, n.clust = 1, warping.method = "affine",
similarity.method = "d1.pearson", center.method = "k-means", seeds = NULL,
optim.method = "L-BFGS-B", span = 0.15, t.max = 0.1, m.max = 0.1, n.out = NULL,
tol = 0.01, fence = TRUE, iter.max = 100, show.iter = 0, nstart=2, return.all=FALSE,
check.total.similarity=FALSE)

Arguments

x

matrix n.func X grid.size or vector grid.size: the abscissa values where each function is evaluated. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values where each function is evaluated. The abscissa points may be unevenly spaced and they may differ from function to function. x can also be a vector of length grid.size. In this case, x will be used as abscissa grid for all functions.

y0

matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the set of original functions on the abscissa grid x. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values where each function is evaluated. d: (only if the sample is multidimensional) number of function components, i.e. each function is a d-dimensional curve. Default value of y0 is NULL. The parameter y0 must be provided if the chosen similarity.method concerns original functions.

y1

matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the first derivatives of the original functions on the abscissa grid x. Default value of y1 is NULL. The parameter y1 must be provided if the chosen similarity.method concerns the first derivatives of the original functions.
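
The layouts expected for x, y0 and y1 can be sketched in base R as follows. The sine curves, the phase values and the object name y0.2d are illustrative assumptions, not part of the package.

```r
# Illustrative only: 5 curves observed on a common grid of 101 points.
n.func    <- 5
grid.size <- 101
x <- seq(0, 1, length.out = grid.size)      # vector: one grid shared by all curves

phase <- seq(0, 0.2, length.out = n.func)   # hypothetical per-curve phase shifts

# y0: one row per function, one column per abscissa value
y0 <- t(sapply(phase, function(p) sin(2 * pi * (x - p))))
# y1: analytical first derivatives, same layout as y0
y1 <- t(sapply(phase, function(p) 2 * pi * cos(2 * pi * (x - p))))

dim(y0)   # n.func x grid.size
dim(y1)

# Multidimensional case (d = 2): array n.func x grid.size x d
y0.2d <- array(NA_real_, dim = c(n.func, grid.size, 2))
y0.2d[, , 1] <- y0
y0.2d[, , 2] <- y1 / (2 * pi)
dim(y0.2d)
```

When the derivatives are not known analytically they have to be estimated from the data before being passed as y1; how to do so is left open here.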

n.clust

scalar: required number of clusters. Default value is 1. Note that if n.clust=1 kma performs only alignment without clustering.

warping.method

character: type of alignment required. If warping.method='NOalignment' kma performs only k-mean clustering (without alignment). If warping.method='affine' kma performs alignment (and possibly clustering) of functions using linear affine transformation as warping functions, i.e., x.final = dilation*x + shift. If warping.method='shift' kma allows only shift, i.e., x.final = x + shift. If warping.method='dilation' kma allows only dilation, i.e., x.final = dilation*x. Default value is 'affine'.
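
The warping families above differ only in which parameters are free. A minimal sketch, where warp is a hypothetical helper written for illustration and not a function of briKmeans:

```r
# Hypothetical helper (not part of briKmeans): apply one warping to a grid.
warp <- function(x, dilation = 1, shift = 0) dilation * x + shift

x <- seq(0, 1, by = 0.25)
warp(x, dilation = 1.1, shift = -0.05)   # 'affine': both parameters free
warp(x, shift = -0.05)                   # 'shift': dilation fixed at 1
warp(x, dilation = 1.1)                  # 'dilation': shift fixed at 0
warp(x)                                  # 'NOalignment': abscissas unchanged
```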

similarity.method

character: required similarity measure. Possible choices are: 'd0.pearson', 'd1.pearson', 'd0.L2', 'd1.L2', 'd0.L2.centered', 'd1.L2.centered'. Default value is 'd1.pearson'. See kma.similarity for details.

center.method

character: type of clustering method to be used. Possible choices are: 'k-means' and 'k-medoids'. Default value is 'k-means'.

seeds

vector max(n.clust) or matrix nstart X n.clust: indexes of the functions to be used as initial centers. If it is a matrix, each row contains the indexes of the initial centers of one of the nstart initializations. In the case where not all the values of seeds are provided, those not provided are randomly chosen among the n.func original functions. If seeds=NULL all the centers are randomly chosen. Default value of seeds is NULL.
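
A seeds matrix with one row per initialization could be built as sketched below. The sizes are illustrative, and drawing the indexes at random is only one possible choice.

```r
# Illustrative values: 100 functions, 4 clusters, 3 initializations.
n.func  <- 100
n.clust <- 4
nstart  <- 3

set.seed(1)
# Each row holds n.clust distinct function indexes for one initialization.
seeds <- t(replicate(nstart, sample.int(n.func, n.clust)))
dim(seeds)   # nstart x n.clust
```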

optim.method

character: optimization method chosen to find the best warping functions at each iteration. Possible choices are: 'L-BFGS-B' and 'SANN'. See optim function for details. Default method is 'L-BFGS-B'.

span

scalar: the span to be used for the loess procedure in the center estimation step when center.method='k-means'. Default value is 0.15. If center.method='k-medoids' value of span is ignored.

t.max

scalar: t.max controls the maximal allowed shift, at each iteration, in the alignment procedure with respect to the range of curve domains. t.max must be such that 0<t.max<1 (e.g., t.max=0.1 means that shift is bounded, at each iteration, between -0.1*range(x) and +0.1*range(x)). Default value is 0.1. If warping.method='dilation' value of t.max is ignored.

m.max

scalar: m.max controls the maximal allowed dilation, at each iteration, in the alignment procedure. m.max must be such that 0<m.max<1 (e.g., m.max=0.1 means that dilation is bounded, at each iteration, between 1-0.1 and 1+0.1). Default value is 0.1. If warping.method='shift' value of m.max is ignored.
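
The per-iteration bounds implied by t.max and m.max can be worked out as below. This sketch assumes that "range(x)" in the description of t.max means the width of the curve domain, max(x) - min(x); the grid is illustrative.

```r
x     <- seq(0, 2, by = 0.01)   # illustrative abscissa grid
t.max <- 0.1
m.max <- 0.1

domain.width    <- diff(range(x))   # assumed meaning of "range(x)"
shift.bounds    <- c(-t.max, t.max) * domain.width
dilation.bounds <- c(1 - m.max, 1 + m.max)

shift.bounds      # -0.2  0.2
dilation.bounds   #  0.9  1.1
```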

n.out

scalar: the desired length of the abscissa for computation of the similarity indexes and the centers. Default value is round(1.1*grid.size).
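
As a quick check of the default, for an illustrative grid of 101 points:

```r
grid.size <- 101
n.out <- round(1.1 * grid.size)
n.out   # 111
```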

tol

scalar: the algorithm stops when the increment of similarity of each function with respect to the corresponding center is lower than tol. Default value is 0.01.
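
The stopping rule can be illustrated with a small sketch. The similarity values are made up, and the test below is a literal reading of the description above rather than the package's internal code.

```r
# Hypothetical similarities of 4 functions to their centers at two
# consecutive iterations (values are made up for illustration).
sim.old <- c(0.90, 0.85, 0.95, 0.80)
sim.new <- c(0.905, 0.852, 0.951, 0.808)
tol <- 0.01

increment <- sim.new - sim.old
# Stop only when every function improved by less than tol.
converged <- all(increment < tol)
converged
```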

fence

boolean: if fence=TRUE a control is activated at the end of each iteration. The aim of the control is to avoid shift/dilation outliers with respect to their computed distributions. If fence=TRUE the running time can increase considerably. Default value of fence is TRUE.

iter.max

scalar: maximum number of iterations in the k-mean alignment cycle. Default value is 100.

show.iter

boolean: if show.iter=TRUE kma shows the current iteration of the algorithm. Default value is 0 (i.e., FALSE).

nstart

scalar: number of initializations with different seeds. Default value is 2. This parameter is used only if center.method is 'k-medoids'. When center.method = 'k-means' one initialization is performed.

return.all

boolean: if return.all=TRUE the results of all the nstart initializations are returned; the output is a list of length nstart. If return.all=FALSE only the best result is provided (the one with the highest mean similarity if similarity.method is 'd0.pearson' or 'd1.pearson', or the one with the lowest distance if similarity.method is 'd0.L2', 'd1.L2', 'd0.L2.centered' or 'd1.L2.centered'). Default value is FALSE.
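
When return.all=TRUE, the best run can be selected by hand following the rule above. In this sketch the list results is mock data carrying only a similarity.field-like entry named similarity.final, and summarising the L2 case by the mean distance is an assumption.

```r
# Mock results from nstart = 3 initializations; only the field used here
# (similarity.final) is filled in, with made-up values.
results <- list(
  list(similarity.final = c(0.91, 0.88, 0.93)),
  list(similarity.final = c(0.95, 0.90, 0.97)),
  list(similarity.final = c(0.80, 0.85, 0.82))
)
similarity.method <- "d0.pearson"

mean.sim <- sapply(results, function(r) mean(r$similarity.final))
best <- if (grepl("pearson", similarity.method, fixed = TRUE)) {
  which.max(mean.sim)   # Pearson: higher similarity is better
} else {
  which.min(mean.sim)   # L2: lower distance is better (mean is an assumption)
}
best
```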

check.total.similarity

boolean: if check.total.similarity=TRUE, at each iteration the algorithm checks whether the total similarity has decreased. If it has, the algorithm stops and the result obtained in the penultimate iteration is returned. Default value is FALSE.

Value

The function output is a list containing the following elements:

iterations

scalar: total number of iterations performed by kma function.

x

as input.

y0

as input.

y1

as input.

n.clust

as input.

warping.method

as input.

similarity.method

as input.

center.method

as input.

x.center.orig

vector n.out: abscissa of the original center.

y0.center.orig

matrix 1 X n.out: the unique row contains the evaluations of the original function center. If center.method='k-means' there are two scenarios: if similarity.method is 'd0.pearson', 'd0.L2' or 'd0.L2.centered', the original function center is computed via the loess procedure applied to the original data; if similarity.method is 'd1.pearson', 'd1.L2' or 'd1.L2.centered', it is computed by integrating the first derivatives center y1.center.orig (the integration constant is chosen to minimize the sum of the weighted L2 distances between the center and the original functions). If center.method='k-medoids' the original function center is the medoid of the original functions.
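
The integration step for the 'd1' similarities can be sketched as follows. This is an illustration under stated assumptions, not the package's code: all functions share one grid, the weights are taken as equal, the integral is a cumulative trapezoidal rule, and the data are synthetic.

```r
x <- seq(0, 1, length.out = 101)
# Illustrative data: three vertically offset copies of sin(2*pi*x).
offsets <- c(0.9, 1.0, 1.1)
y0 <- t(sapply(offsets, function(a) sin(2 * pi * x) + a))
y1.center <- 2 * pi * cos(2 * pi * x)   # derivative center, taken as known

# Cumulative trapezoidal integral of y1.center, equal to 0 at x[1].
primitive <- c(0, cumsum(diff(x) * (head(y1.center, -1) + tail(y1.center, -1)) / 2))

# With equal weights on a common grid, the constant c minimizing
# sum_i ||y0_i - (primitive + c)||^2 is the mean residual.
const <- mean(sweep(y0, 2, primitive))
y0.center <- primitive + const
```

Here const comes out close to 1, the average vertical offset of the three curves, so y0.center approximates sin(2*pi*x) + 1.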

y1.center.orig

matrix 1 X n.out: the unique row contains the evaluations of the original function first derivatives center. If center.method='k-means' the original center is computed via the loess procedure applied to the first derivatives of the original functions. If center.method='k-medoids' the original center is the medoid of original functions.

similarity.orig

vector: original similarities between the original functions and the original center.

x.final

matrix n.func X grid.size: aligned abscissas.

n.clust.final

scalar: final number of clusters. Note that, when center.method='k-means', the parameter n.clust.final may differ from the initial number of clusters (i.e., from n.clust) if some clusters are found to be empty. In this case a warning message is issued.

x.centers.final

vector n.out: abscissas of the final function centers and/or of the final function first derivatives centers.

y0.centers.final

matrix n.clust.final X n.out: rows contain the evaluations of the final functions centers. y0.centers.final is NULL if y0 is not given as input.

y1.centers.final

matrix n.clust.final X n.out: rows contain the evaluations of the final derivatives centers. y1.centers.final is NULL if the chosen similarity measure does not concern function first derivatives.

labels

vector: cluster assignments.

similarity.final

vector: similarities between each function and the center of the cluster the function is assigned to.

dilation.list

list: dilations obtained at each iteration of kma function.

shift.list

list: shifts obtained at each iteration of kma function.

dilation

vector: dilation applied to the original abscissas x to obtain the aligned abscissas x.final.

shift

vector: shift applied to the original abscissas x to obtain the aligned abscissas x.final.
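
Assuming the returned dilation and shift follow the affine relation given under warping.method (x.final = dilation*x + shift), the aligned abscissas for a common grid can be rebuilt as sketched here. The parameter values are hypothetical.

```r
x <- seq(0, 1, by = 0.25)        # common original grid
dilation <- c(1.0, 1.1, 0.9)     # hypothetical per-function values
shift    <- c(0.0, -0.05, 0.02)

# Row i is dilation[i] * x + shift[i]; shift is recycled down the rows.
x.final <- outer(dilation, x) + shift
x.final
```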

Author(s)

Alice Parodi, Mirco Patriarca, Laura Sangalli, Piercesare Secchi, Simone Vantini, Valeria Vitelli.

References

Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V., 2010. "K-mean alignment for curve clustering". Computational Statistics and Data Analysis, 54, 1219-1233.

Sangalli, L.M., Secchi, P., Vantini, S., 2014. "Analysis of AneuRisk65 data: K-mean Alignment". Electronic Journal of Statistics, Special Section on "Statistics of Time Warpings and Phase Variations", Vol. 8, No. 2, 1891-1904.

See Also

kma.similarity

Examples

    ## simulated data: 4 groups of 25 noisy curves on a common grid
    set.seed(1)
    x.coord <- seq(0, 1, 0.01)
    x <- matrix(ncol = length(x.coord), nrow = 100)
    labels <- integer(100)

    ## true cluster centers, one per row
    centers <- matrix(ncol = length(x.coord), nrow = 4)
    centers[1, ] <- abs(x.coord) - 0.5
    centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
    centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
    centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))

    for (i in 1:4) {
        for (j in 1:25) {
            idx <- 25 * (i - 1) + j
            labels[idx] <- i
            ## center of group i plus Gaussian noise (sd = 0.1)
            x[idx, ] <- centers[i, ] + rnorm(length(x.coord), 0, 0.1)
        }
    }

    ## k-means clustering without alignment, based on the original functions
    C <- kma(x.coord, x, n.clust = 4,
             warping.method = "NOalignment", similarity.method = "d0.pearson")
    ## cross-tabulate estimated and true cluster labels
    table(C$labels, labels)
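
A cross-tabulation like the one above can be condensed into a single purity score. This sketch uses made-up label vectors in place of C$labels and the true labels, so the number it produces says nothing about kma itself.

```r
# Made-up assignments standing in for C$labels and the true labels.
true.labels <- rep(1:4, each = 5)
est.labels  <- c(2, 2, 2, 2, 2,  1, 1, 1, 1, 3,  3, 3, 3, 3, 3,  4, 4, 4, 4, 4)

tab <- table(est.labels, true.labels)
# Purity: each estimated cluster is credited with its most frequent true class.
purity <- sum(apply(tab, 1, max)) / length(true.labels)
purity   # 0.95
```

Purity ignores how cluster numbers are permuted, which matters because the numbering of estimated clusters is arbitrary.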


[Package briKmeans version 1.0 Index]