R: Functional Principal Components Analysis Clustering

fpcac {clustEff}

R Documentation

Functional Principal Components Analysis Clustering

Description

This function implements the FPCAC algorithm for curve clustering, a variant of the k-means algorithm based on the principal component rotation of the data.

Usage

fpcac(X, K = 2, fd = NULL, nbasis = 5, norder = 3, nharmonics = 3,
      alpha = 0, niter = 30, Ksteps = 25, conf.level = 0.9, seed, disp = FALSE)

Arguments

`X`	Matrix of ‘curves’ of dimension `n` x `q`.
`K`	the number of clusters.
`fd`	If not NULL it overrides X and must be an object of class fd.
`nbasis`	an integer variable specifying the number of basis functions. The default value is 5.
`norder`	an integer specifying the order of the B-splines, which is one higher than their degree. The default value is 3.
`nharmonics`	the number of harmonics or principal components to use. The default value is 3.
`alpha`	trimming size, that is, the proportion of observations to be discarded.
`niter`	the number of random restarts (larger values provide more accurate solutions).
`Ksteps`	the number of k-means steps (only a few steps are usually needed).
`conf.level`	the confidence level required.
`seed`	the seed used for reproducibility.
`disp`	if TRUE, progress information is printed while the algorithm runs.

Details

FPCAC is a functional PCA-based clustering approach that provides a variation of the algorithm for curves clustering proposed by Garcia-Escudero and Gordaliza (2005).

The starting point of FPCAC is a linear approximation of each curve by a finite $p$-dimensional vector of coefficients, given by its FPCA scores.
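
As a rough illustration of this step, the sketch below uses ordinary PCA on discretized curves (base R `prcomp`) as a stand-in for FPCA. This is an assumption made to keep the example self-contained: `fpcac` itself works on a B-spline `fd` representation, and the names `curves`, `p`, `scores` and `approx_curves` are hypothetical.

```r
# Illustrative stand-in for FPCA: ordinary PCA on curves sampled on a common grid.
# This is NOT the code used inside fpcac(); it only shows the idea of replacing
# each curve by a short vector of scores.
set.seed(1)
grid <- seq(0, 1, length.out = 50)

# 20 curves stored as rows: 10 noisy sines followed by 10 noisy cosines
curves <- rbind(
  t(replicate(10, sin(2 * pi * grid) + rnorm(50, sd = 0.1))),
  t(replicate(10, cos(2 * pi * grid) + rnorm(50, sd = 0.1)))
)

p <- 3                                    # number of components kept (cf. nharmonics)
pca <- prcomp(curves, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:p]                    # one p-dimensional score vector per curve

# Rank-p linear approximation of each curve, rebuilt from its scores
approx_curves <- sweep(scores %*% t(pca$rotation[, 1:p]), 2, pca$center, "+")
dim(scores)
```

Clustering then operates on the rows of `scores` rather than on the full curves.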

The initial number of clusters k is obtained from the volume of the scores: events are assigned to the clusters defined by those events whose distance in the space of PCA scores is below a fixed threshold (e.g. the 90th percentile). Once k is obtained, a modified version of the trimmed k-means algorithm is applied to the matrix of FPCA scores rather than to the coefficients of a linear fit to B-spline bases.

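The trimmed k-means clustering algorithm looks for the k centers C_1, ..., C_k that solve the minimization problem:

O_k(\alpha)=\min_Y \min_{C_1, \cdots, C_k} \frac{1}{[n(1-\alpha)]} \sum_{X_i \in Y} \inf_{1\leq j \leq k} || X_i- C_j||^2

where Y ranges over the subsets of the sample X_1, ..., X_n containing [n(1-\alpha)] observations, [.] denotes the integer part, and \alpha is the trimming proportion.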
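
For fixed centers, the inner part of this objective can be evaluated directly, which may help in reading the formula. The sketch below is a minimal base R illustration, not the code used by `fpcac`; the function name `trimmed_objective` and the toy data are hypothetical. For given centers, the best subset Y consists of the [n(1-\alpha)] observations closest to their nearest center.

```r
# Trimmed k-means objective for FIXED centers (hypothetical helper, not part of
# clustEff): average squared distance to the nearest center over the
# floor(n * (1 - alpha)) best-fitting observations.
trimmed_objective <- function(X, centers, alpha = 0) {
  n <- nrow(X)
  # n x k matrix of squared Euclidean distances from each row of X to each center
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ], "-")^2)
  })
  d2 <- matrix(d2, nrow = n)
  nearest <- apply(d2, 1, min)              # inf over j of ||X_i - C_j||^2
  keep <- floor(n * (1 - alpha))            # size of the retained subset Y
  mean(sort(nearest)[seq_len(keep)])        # drop the observations farthest from any center
}

# Toy data: two tight groups plus one outlying point
X <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
centers <- rbind(c(0, 0.5), c(10, 10.5))
trimmed_objective(X, centers, alpha = 0)    # outlier included
trimmed_objective(X, centers, alpha = 0.2)  # outlier trimmed
```

With `alpha = 0.2` the outlying point is discarded, so the objective is much smaller than with `alpha = 0`. The full algorithm additionally minimizes over the centers, which this sketch does not attempt.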

The proposed approach has the advantage of applying PCA for functional data directly, avoiding some of the choices related to spline fitting that are required in robust curve clustering (RCC). Simulations and applications also suggest that the FPCAC algorithm behaves well, giving stable and easily interpretable results.

Value

An object of class “fpcac”, a list containing the following items:

`call`	the matched call.
`obj.function`	The percentiles used in the quantile regression coefficient modeling or objective function `O_k(\alpha)`.
`centers`	The curves matrix.
`radius`	The vector of clusters.
`clusters`	The mean curves matrix of dimension `n` x `k`.
`Xorig`	The matrix of ‘curves’ of dimension `n` x `q`.
`fd`	The object of class ‘fd’ obtained by the call of FPCA.
`X`	The matrix of ‘curves’ transformed through FPCA of dimension `p` x `nharmonics`.
`X.mean`	The mean curves matrix of dimension `n` x `k`.
`diss.matrix`	The Euclidean distance matrix of the transformed curves.
`oggSilhouette`	An object of class ‘silhouette’.

Author(s)

Gianluca Sottile gianluca.sottile@unipa.it

References

Adelfio, G., Chiodi, M., D'Alessandro, A. and Luzio, D. (2011) FPCA algorithm for waveform clustering. Journal of Communication and Computer, 8(6), 494-502.

Adelfio, G., Chiodi, M., D'Alessandro, A., Luzio, D., D'Anna, G. and Mangano, G. (2012) Simultaneous seismic wave clustering and registration. Computers & Geosciences, 44, 60-69.

Garcia-Escudero, L. A. and Gordaliza, A. (2005) A proposal for robust curve clustering. Journal of Classification, 22, 185-201.

Examples

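library(clustEff)

set.seed(1234)
n <- 300
x <- 1:n/n

# 30 simulated curves, one per column, each observed at n points
Y <- matrix(0, n, 30)

# piecewise-linear noise scale (passed to rnorm as the standard deviation)
sigma2 <- 4*pmax(x-.2, 0) - 8*pmax(x-.5, 0) + 4*pmax(x-.8, 0)

# group 1: curves 1-10, sine mean
mu <- sin(3*pi*x)
for(i in 1:10) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

# group 2: curves 11-23, cosine mean
mu <- cos(3*pi*x)
for(i in 11:23) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

# group 3: curves 24-28, product of sine and cosine
mu <- sin(3*pi*x)*cos(pi*x)
for(i in 24:28) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

# group 4: curves 29-30, zero mean
mu <- 0 #sin(1/3*pi*x)*cos(2*pi*x)
for(i in 29:30) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

# cluster the curves into K = 4 groups and print the fitted object
obj <- fpcac(Y, K = 4, disp = FALSE)
obj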

[Package clustEff version 0.3.1 Index]