
kma {briKmeans}

R Documentation

Clustering and alignment of functional data

Description

kma jointly performs clustering and alignment of a functional dataset (multidimensional or unidimensional functions).

Usage

kma(x, y0 = NULL, y1 = NULL, n.clust = 1, warping.method = "affine",
    similarity.method = "d1.pearson", center.method = "k-means", seeds = NULL,
    optim.method = "L-BFGS-B", span = 0.15, t.max = 0.1, m.max = 0.1, n.out = NULL,
    tol = 0.01, fence = TRUE, iter.max = 100, show.iter = 0, nstart = 2, return.all = FALSE,
    check.total.similarity = FALSE)

Arguments

`x`	matrix n.func X grid.size or vector grid.size: the abscissa values at which each function is evaluated. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values at which each function is evaluated. The abscissa points may be unevenly spaced and may differ from function to function. If `x` is a vector of length grid.size, it is used as the common abscissa grid for all functions.
`y0`	matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the original functions on the abscissa grid `x`. n.func: number of functions in the dataset. grid.size: maximal number of abscissa values at which each function is evaluated. d: (only if the sample is multidimensional) number of function components, i.e. each function is a d-dimensional curve. Default value of `y0` is `NULL`. `y0` must be provided if the chosen `similarity.method` is computed on the original functions (`'d0.pearson'`, `'d0.L2'` or `'d0.L2.centered'`).
`y1`	matrix n.func X grid.size or array n.func X grid.size X d: evaluations of the first derivatives of the original functions on the abscissa grid `x`. Default value of `y1` is `NULL`. `y1` must be provided if the chosen `similarity.method` is computed on the first derivatives (`'d1.pearson'`, `'d1.L2'` or `'d1.L2.centered'`).
`n.clust`	scalar: required number of clusters. Default value is `1`. Note that if `n.clust=1` kma performs only alignment without clustering.
`warping.method`	character: type of alignment required. If `warping.method='NOalignment'` kma performs only clustering, without alignment. If `warping.method='affine'` kma performs alignment (and possibly clustering) of the functions using affine transformations as warping functions, i.e., x.final = dilation*x + shift. If `warping.method='shift'` kma allows only shifts, i.e., x.final = x + shift. If `warping.method='dilation'` kma allows only dilations, i.e., x.final = dilation*x. Default value is `'affine'`.
`similarity.method`	character: required similarity measure. Possible choices are: `'d0.pearson'`, `'d1.pearson'`, `'d0.L2'`, `'d1.L2'`, `'d0.L2.centered'`, `'d1.L2.centered'`. Default value is `'d1.pearson'`. See kma.similarity for details.
`center.method`	character: type of clustering method to be used. Possible choices are: `'k-means'` and `'k-medoids'`. Default value is `'k-means'`.
`seeds`	vector n.clust or matrix nstart X n.clust: indexes of the functions to be used as initial centers. If `seeds` is a matrix, each row contains the indexes of the initial centers for one of the `nstart` initializations. If only some of the values of `seeds` are provided, the missing ones are chosen at random among the `n.func` original functions. If `seeds=NULL` all the centers are chosen at random. Default value of `seeds` is `NULL`.
`optim.method`	character: optimization method used to find the best warping functions at each iteration. Possible choices are: `'L-BFGS-B'` and `'SANN'`. See `optim` for details. Default method is `'L-BFGS-B'`.
`span`	scalar: the span to be used for the loess procedure in the center estimation step when `center.method='k-means'`. Default value is 0.15. If `center.method='k-medoids'` value of `span` is ignored.
`t.max`	scalar: controls the maximal shift allowed at each iteration of the alignment procedure, expressed as a fraction of the length of the curve domain. `t.max` must satisfy 0 < t.max < 1 (e.g., `t.max=0.1` means that, at each iteration, the shift is bounded between `-0.1*diff(range(x))` and `+0.1*diff(range(x))`). Default value is `0.1`. If `warping.method='dilation'` the value of `t.max` is ignored.
`m.max`	scalar: controls the maximal dilation allowed at each iteration of the alignment procedure. `m.max` must satisfy 0 < m.max < 1 (e.g., `m.max=0.1` means that, at each iteration, the dilation is bounded between 1-0.1 and 1+0.1). Default value is `0.1`. If `warping.method='shift'` the value of `m.max` is ignored.
`n.out`	scalar: the desired length of the abscissa for computation of the similarity indexes and the centers. Default value is `round(1.1*grid.size)`.
`tol`	scalar: the algorithm stops when the increment of the similarity of each function with respect to the corresponding center is lower than `tol`. Default value is `0.01`.
`fence`	boolean: if `fence=TRUE`, a control is activated at the end of each iteration to avoid shift/dilation outliers with respect to their computed distributions. Setting `fence=TRUE` can increase the running time considerably. Default value of `fence` is `TRUE`.
`iter.max`	scalar: maximum number of iterations in the k-mean alignment cycle. Default value is `100`.
`show.iter`	boolean: if `show.iter=TRUE` kma shows the current iteration of the algorithm. Default value is `FALSE`.
`nstart`	scalar: number of initializations with different seeds. Default value is `2`. This parameter is used only if `center.method` is `'k-medoids'`. When `center.method = 'k-means'` one initialization is performed.
`return.all`	boolean: if `return.all=TRUE` the results of all the `nstart` initializations are returned, and the output is a list of length `nstart`. If `return.all=FALSE` only the best result is returned: the one with the highest mean similarity if `similarity.method` is `'d0.pearson'` or `'d1.pearson'`, or the one with the lowest distance if `similarity.method` is `'d0.L2'`, `'d1.L2'`, `'d0.L2.centered'` or `'d1.L2.centered'`. Default value is `FALSE`.
`check.total.similarity`	boolean: if `check.total.similarity=TRUE`, at each iteration the algorithm checks whether the total similarity has decreased; if it has, the algorithm stops and returns the result obtained at the penultimate iteration. Default value is `FALSE`.
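The warping families selected by `warping.method`, together with the per-iteration bounds set by `t.max` and `m.max`, can be illustrated with a small base-R sketch. The helpers `warp_abscissa()` and `warping_bounds()` are hypothetical names, not briKmeans functions. The sketch assumes that "length of the curve domain" means `diff(range(x))` and that the bounds act as a box constraint on each iteration's shift and dilation (presumably what `'L-BFGS-B'` receives, but this is an assumption).

```r
## Hypothetical helper (not part of briKmeans): apply one of the warping
## families described for `warping.method` to an abscissa vector.
warp_abscissa <- function(x, dilation = 1, shift = 0,
                          method = c("affine", "shift", "dilation", "NOalignment")) {
  method <- match.arg(method)
  switch(method,
         affine      = dilation * x + shift,  # x.final = dilation*x + shift
         shift       = x + shift,             # x.final = x + shift
         dilation    = dilation * x,          # x.final = dilation*x
         NOalignment = x)                     # abscissa left unchanged
}

## Hypothetical helper: per-iteration box constraints implied by t.max and m.max,
## assuming the shift bound is a fraction of the length of the domain of x.
warping_bounds <- function(x, t.max = 0.1, m.max = 0.1) {
  stopifnot(t.max > 0, t.max < 1, m.max > 0, m.max < 1)
  width <- diff(range(x))
  list(shift    = c(lower = -t.max * width, upper = t.max * width),
       dilation = c(lower = 1 - m.max, upper = 1 + m.max))
}

x.coord <- seq(0, 2, by = 0.5)
warp_abscissa(x.coord, dilation = 1.1, shift = -0.2)
warping_bounds(x.coord)
```

With the default `t.max = 0.1` on a domain of length 2, each iteration's shift is confined to [-0.2, 0.2], and with `m.max = 0.1` the dilation stays in [0.9, 1.1].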

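The similarity measures, the `tol` stopping rule and the `return.all=FALSE` selection rule can be sketched in base R. This is a sketch under stated assumptions, not the package's code: it assumes `'d0.pearson'` is the cosine of the angle between two curves in L2 (the index used in Sangalli et al., 2010, applied there to first derivatives, i.e. the `d1` variant), `'d0.L2'` is the L2 distance, and `'d0.L2.centered'` is the L2 distance after subtracting each curve's mean. `kma.similarity` is the authoritative definition and may normalize differently; the `d1.*` variants would apply the same formulas to `y1`. All helper names are hypothetical, and taking the absolute change in `has_converged()` is also an assumption.

```r
## Trapezoidal approximation of the integral of v over the grid x.
trapz <- function(x, v) sum(diff(x) * (head(v, -1) + tail(v, -1)) / 2)

## Hypothetical re-implementations (not the briKmeans code) of three measures.
sim_pearson <- function(x, f, g) {
  trapz(x, f * g) / sqrt(trapz(x, f^2) * trapz(x, g^2))  # cosine of the angle in L2
}
dist_L2 <- function(x, f, g) sqrt(trapz(x, (f - g)^2))
dist_L2_centered <- function(x, f, g) {
  len <- diff(range(x))
  ## subtract each curve's mean value over the domain before measuring distance
  dist_L2(x, f - trapz(x, f) / len, g - trapz(x, g) / len)
}

## Assumed stopping rule for `tol`: every function's similarity changed by less than tol.
has_converged <- function(sim.old, sim.new, tol = 0.01) all(abs(sim.new - sim.old) < tol)

## Selection rule for return.all = FALSE: highest mean similarity for the Pearson
## measures, lowest distance for the L2 measures.
best_run <- function(values, similarity.method) {
  if (grepl("pearson", similarity.method)) which.max(values) else which.min(values)
}

x <- seq(0, 1, by = 0.01)
f <- sin(2 * pi * x)
sim_pearson(x, f, 3 * f)        # scale-invariant: 1 up to rounding
dist_L2_centered(x, f, f + 5)   # vertical offset removed: 0 up to rounding
```

The two calls at the end show why the choice of measure matters: the Pearson-type index ignores amplitude scaling, while the centered L2 distance ignores vertical offsets.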
Value

The function output is a list containing the following elements:

`iterations`	scalar: total number of iterations performed by kma function.
`x`	as input.
`y0`	as input.
`y1`	as input.
`n.clust`	as input.
`warping.method`	as input.
`similarity.method`	as input.
`center.method`	as input.
`x.center.orig`	vector n.out: abscissa of the original center.
`y0.center.orig`	matrix 1 X n.out: the unique row contains the evaluations of the original function center. If `center.method='k-means'` there are two scenarios: if `similarity.method` is `'d0.pearson'`, `'d0.L2'` or `'d0.L2.centered'`, the original function center is computed via a loess procedure applied to the original data; if `similarity.method` is `'d1.pearson'`, `'d1.L2'` or `'d1.L2.centered'`, it is computed by integrating the first derivatives center `y1.center.orig` (the integration constant is chosen to minimize the sum of the weighted L2 distances between the center and the original functions). If `center.method='k-medoids'` the original function center is the medoid of the original functions.
`y1.center.orig`	matrix 1 X n.out: the unique row contains the evaluations of the center of the first derivatives of the original functions. If `center.method='k-means'` this center is computed via a loess procedure applied to the first derivatives of the original functions. If `center.method='k-medoids'` it is given by the medoid of the original functions.
`similarity.orig`	vector: original similarities between the original functions and the original center.
`x.final`	matrix n.func X grid.size: aligned abscissas.
`n.clust.final`	scalar: final number of clusters. Note that, when `center.method='k-means'`, `n.clust.final` may differ from the initial number of clusters (i.e., from `n.clust`) if some clusters are found to be empty. In this case a warning message is issued.
`x.centers.final`	vector n.out: abscissas of the final function centers and/or of the final function first derivatives centers.
`y0.centers.final`	matrix n.clust.final X n.out: rows contain the evaluations of the final function centers. `y0.centers.final` is `NULL` if `y0` is not given as input.
`y1.centers.final`	matrix n.clust.final X n.out: rows contain the evaluations of the final first derivatives centers. `y1.centers.final` is `NULL` if the chosen similarity measure does not concern function first derivatives.
`labels`	vector: cluster assignments.
`similarity.final`	vector: similarities between each function and the center of the cluster the function is assigned to.
`dilation.list`	list: dilations obtained at each iteration of kma function.
`shift.list`	list: shifts obtained at each iteration of kma function.
`dilation`	vector: dilation applied to the original abscissas `x` to obtain the aligned abscissas `x.final`.
`shift`	vector: shift applied to the original abscissas `x` to obtain the aligned abscissas `x.final`.
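The relation between `x`, `dilation`, `shift` and `x.final`, and a per-cluster summary of `similarity.final`, can be illustrated with toy numbers. The list `res` below is a hand-made stand-in for part of a `kma()` result (its values are invented for illustration, not real output); the sketch assumes `x` is stored as an n.func X grid.size matrix and that the affine warping is applied function by function, i.e. row by row.

```r
## Toy values standing in for part of a kma() result (hypothetical, not real output).
x <- matrix(rep(seq(0, 1, by = 0.25), 3), nrow = 3, byrow = TRUE)  # n.func = 3, grid.size = 5
res <- list(dilation         = c(1.00, 1.05, 0.95),
            shift            = c(0.00, -0.02, 0.03),
            labels           = c(1, 2, 1),
            similarity.final = c(0.97, 0.91, 0.88))

## x.final = dilation * x + shift: R recycles a length-n.func vector down the
## columns of the matrix, so row i is scaled by dilation[i] and shifted by shift[i].
x.final <- res$dilation * x + res$shift

## Mean similarity of the functions assigned to each cluster.
cluster.means <- tapply(res$similarity.final, res$labels, mean)
cluster.means
```

Relying on recycling keeps the computation vectorized; an explicit `sweep()` or loop would be equivalent but longer.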

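For derivative-based similarities with `center.method='k-means'`, `y0.center.orig` is described as the integral of the derivative center plus a constant fitted to the original curves. A minimal sketch of that step follows, assuming trapezoidal integration and user-supplied weights (this page does not state how `kma` weights the L2 distances). `integrate_center()` is a hypothetical helper. Setting the derivative of the weighted objective with respect to the constant to zero gives the closed form used in the code: the constant is the weighted average of the integrals of `f_i - F`, divided by the domain length.

```r
## Trapezoidal integral and cumulative integral (starting at 0) of v over x.
trapz    <- function(x, v) sum(diff(x) * (head(v, -1) + tail(v, -1)) / 2)
cumtrapz <- function(x, v) c(0, cumsum(diff(x) * (head(v, -1) + tail(v, -1)) / 2))

## Hypothetical helper: integrate a derivative center y1.center and choose the
## constant c minimizing sum_i w_i * integral of (F + c - f_i)^2 over the domain.
integrate_center <- function(x, y1.center, y0, w = rep(1, nrow(y0))) {
  F0  <- cumtrapz(x, y1.center)               # antiderivative with F0(x[1]) = 0
  len <- trapz(x, rep(1, length(x)))          # length of the domain
  num <- sum(w * apply(y0, 1, function(f) trapz(x, f - F0)))
  F0 + num / (len * sum(w))                   # closed-form optimal constant
}

x  <- seq(0, 1, by = 0.01)
y0 <- rbind(x^2 + 1, x^2 + 3)                 # two curves sharing the derivative 2x
center <- integrate_center(x, 2 * x, y0)
center[1]                                     # about 2, midway between offsets 1 and 3
```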
Author(s)

Alice Parodi, Mirco Patriarca, Laura Sangalli, Piercesare Secchi, Simone Vantini, Valeria Vitelli.

References

Sangalli, L.M., Secchi, P., Vantini, S., Vitelli, V., 2010. "K-mean alignment for curve clustering". Computational Statistics and Data Analysis, 54, 1219-1233.

Sangalli, L.M., Secchi, P., Vantini, S., 2014. "Analysis of AneuRisk65 data: K-mean Alignment". Electronic Journal of Statistics, Special Section on "Statistics of Time Warpings and Phase Variations", Vol. 8, No. 2, 1891-1904.

Examples

    ## simulated data: 4 groups of 25 noisy curves on a common grid
    library(briKmeans)
    set.seed(1)
    x.coord <- seq(0, 1, 0.01)
    n.per.group <- 25

    ## true group centers evaluated on x.coord
    centers <- matrix(ncol = length(x.coord), nrow = 4)
    centers[1, ] <- abs(x.coord) - 0.5
    centers[2, ] <- (abs(x.coord - 0.5))^2 - 0.8
    centers[3, ] <- -(abs(x.coord - 0.5))^2 + 0.7
    centers[4, ] <- 0.75 * sin(8 * pi * abs(x.coord))

    ## each curve is its group center plus Gaussian noise (sd = 0.1)
    y0 <- matrix(ncol = length(x.coord), nrow = 4 * n.per.group)
    labels <- integer(4 * n.per.group)
    for (i in 1:4) {
        for (j in 1:n.per.group) {
            k <- n.per.group * (i - 1) + j
            labels[k] <- i
            y0[k, ] <- centers[i, ] + rnorm(length(x.coord), 0, 0.1)
        }
    }

    ## clustering only (no alignment), Pearson similarity on the curves themselves
    C <- kma(x = x.coord, y0 = y0, n.clust = 4,
             warping.method = "NOalignment", similarity.method = "d0.pearson")
    table(C$labels, labels)
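The example above uses `warping.method = "NOalignment"`. A complementary sketch exercises the alignment side: it simulates randomly shifted copies of a single sine curve and asks `kma` for alignment only (`n.clust = 1`, `warping.method = "shift"`). The data-generating choices (20 curves, shifts in [-0.05, 0.05]) are illustrative assumptions, no particular output is claimed, and the call is guarded so the data-simulation part runs even where briKmeans is not installed.

```r
## Alignment only (n.clust = 1) of randomly shifted copies of one curve.
set.seed(2)
x.coord <- seq(0, 1, by = 0.01)
true.shift <- runif(20, -0.05, 0.05)
## each row is sin(2*pi*(x + s)) for one simulated shift s: a 20 x 101 matrix
y0 <- t(sapply(true.shift, function(s) sin(2 * pi * (x.coord + s))))
dim(y0)

if (requireNamespace("briKmeans", quietly = TRUE)) {
  A <- briKmeans::kma(x.coord, y0, n.clust = 1,
                      warping.method = "shift", similarity.method = "d0.pearson")
  ## compare similarity to the center before and after alignment
  summary(A$similarity.orig)
  summary(A$similarity.final)
  ## estimated vs. simulated shifts (related up to a common offset and sign convention)
  cor(true.shift, A$shift)
}
```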

[Package briKmeans version 1.0 Index]