opt.fpcac {clustEff} R Documentation

## Optimal cluster selection in Functional Principal Components Analysis Clustering

### Description

This function provides the optimal selection of clusters for the algorithm FPCAC, as a variant of a k-means algorithm based on the principal component rotation of data

### Usage

opt.fpcac(X, k.max = 5, method = c("silhouette", "wss"),
fd = NULL, nbasis = 5, norder = 3, nharmonics = 3,
alpha = 0, niter = 30, Ksteps = 10, seed,
diss = NULL, trace=FALSE)


### Arguments

 X Matrix of ‘curves’ of dimension n x q. k.max the number of cluster used in the optimization step to select the optimal one. method the method used to select the optimal number of clusters, "silhouette" or "wss" (whithin sum of squares. fd If not NULL it overrides X and must be an object of class fd. nbasis an integer variable specifying the number of basis functions. The default value is 5. norder an integer specifying the order of b-splines, which is one higher than their degree. The default value is 3. nharmonics the number of harmonics or principal components to use. The default value is 3. alpha trimming size, that is the given proportion of observations to be discarded. niter the number or random restarting (larger values provide more accurate solutions. Ksteps the number of k-mean steps (not too many ksteps are needed). seed the seed used for reproducibility. diss the dissimilarity matrix used to compute measures "silhouette" or "wss". trace if TRUE, it is used to print some information across the algorithm.

### Details

Silhouette is a method for validate the consistency within clusters, providing a measure of how similar an object is to its own cluster compared to other clusters. The silhouette score S belongs to the interval [-1,1]. S close to one means that the data is appropriately clustered. If S is close to negative one, datum should be clustered in its neighbouring cluster. S near zero means that the datum is on the border of two natural clusters.

The wss is obtained as the classical sum of the squared deviations from each observation and the cluster centroid, providing a measure of the variability of the observations within each cluster. Clusters with higher values exhibit greater variability of the observations within the cluster.

### Value

a list containing the following items:

 obj.function the sequence of objective functions. clusters the matrix in which each columns identify clusters for each fixed K. K the sequence of K used. K.opt the optimal number of clusters plot a ggplot object to plot the curve of silhouette or whithin sum of squares.

### Author(s)

Gianluca Sottile gianluca.sottile@unipa.it

### References

Peter J. Rousseeuw (1987). Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Computational and Applied Mathematics. 20, 53-65

K. V. Mardia, J. T. Kent and J. M. Bibby (1979). Multivariate Analysis. Academic Press.

fpcac.

### Examples


set.seed(1234)
n <- 300
x <- 1:n/n

Y <- matrix(0, n, 30)

sigma2 <- 4*pmax(x-.2, 0) - 8*pmax(x-.5, 0) + 4*pmax(x-.8, 0)

mu <- sin(3*pi*x)
for(i in 1:10) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

mu <- cos(3*pi*x)
for(i in 11:23) Y[,i] <- mu + rnorm(length(x), 0, pmax(sigma2,0))

mu <- sin(3*pi*x)*cos(pi*x)
for(i in 24:28) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

mu <- 0 #sin(1/3*pi*x)*cos(2*pi*x)
for(i in 29:30) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

num.clust <- opt.fpcac(Y)
obj2 <- fpcac(Y, K = num.clust\$K.opt, disp = FALSE)
obj2



[Package clustEff version 0.2.0 Index]