R: Soft clustering of covariance operators.

wassersteinCluster {fdWasserstein}

R Documentation

Soft clustering of covariance operators.

Description

Computes the soft cluster solutions for different values of the number of clusters K.

Usage

wassersteinCluster(data, grp, 
                   kmin = 2, kmax = 10, 
                   E = -0.75 * (0.95 * log(0.95) + 
                        0.05 * log(0.05)) + 0.25 * log(2), 
                   nstart = 5, nrefine = 5, ntry = 0, 
                   max.iter = 20, tol = 0.001, 
                   nreduced = length(unique(grp)), 
                   nperm = 0, 
                   add.sigma = FALSE, 
                   use.future = FALSE, verbose = TRUE)

trimmedAverageSilhouette(a, plot = TRUE)

Arguments

`data`	An N times M matrix containing the N sample curves; M denotes the number of points of the grid on which the curves are available.
`grp`	A vector or factor of length N; a covariance operator is estimated for each level of grp.
`kmin`, `kmax`	A pair of integers defining the range of the number of clusters. A solution is computed for each K=kmin,...,kmax.
`E`	The desired average entropy.
`nstart`, `nrefine`, `ntry`	The integers used during the initialization search. If ntry=0, then 'ntry' is set to 'round(1+N/K)'.
`max.iter`	Maximum number of block descent iterations.
`tol`	Iterations stop when the relative decrease of the objective function in two consecutive iterations is less than 'tol'.
`nreduced`	The number of covariances used to estimate the cluster barycenters.
`nperm`	The number of permutations used to approximate the reference distribution of the maximum trimmed average silhouette width (max TASW).
`add.sigma`	Should the sample covariances be returned?
`use.future`	Should the package 'future' be used to parallelize the computation? See Note.
`verbose`	If 'verbose==TRUE', information on the progress of the optimization is shown.
`a`	A list returned by 'wassersteinCluster'.
`plot`	If 'plot==TRUE', the TASW profile is plotted.
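
The default value of E may be easier to read as a mixture of two entropies. The sketch below is an interpretation, not taken from the package source: it assumes that "average entropy" means the mean Shannon entropy (natural logarithm) of the rows of the soft partition matrix, and the helper names entropy and average_entropy are hypothetical. Under that reading, the default matches a partition in which three quarters of the operators are split 0.95/0.05 between two clusters and one quarter is split evenly.

```r
# Shannon entropy (natural log) of a probability vector; zero weights contribute 0.
# Hypothetical helper, not part of fdWasserstein.
entropy <- function(p) -sum(ifelse(p > 0, p * log(p), 0))

# Default value of E, copied from the Usage section.
E_default <- -0.75 * (0.95 * log(0.95) + 0.05 * log(0.05)) + 0.25 * log(2)

# The same number written as a 75%/25% mixture of two row entropies.
E_mixture <- 0.75 * entropy(c(0.95, 0.05)) + 0.25 * entropy(c(0.5, 0.5))
all.equal(E_default, E_mixture)

# Assumed definition: mean row entropy of a soft partition matrix.
average_entropy <- function(w) mean(apply(w, 1, entropy))

# Toy partition: three nearly crisp rows and one evenly split row.
w <- rbind(c(0.95, 0.05), c(0.05, 0.95), c(0.95, 0.05), c(0.5, 0.5))
average_entropy(w)
```

Smaller values of E would then ask for crisper memberships, and larger values for fuzzier ones.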

Details

See Masarotto & Masarotto (2023) for the algorithm details.
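
As a rough orientation, the Wasserstein-Procrustes (Bures-Wasserstein) distance between two covariance matrices A and B satisfies d(A,B)^2 = tr(A) + tr(B) - 2 tr((A^(1/2) B A^(1/2))^(1/2)). The base R sketch below evaluates this formula for discretized operators. The function names sqrtm_psd and wasserstein_dist are hypothetical, and the package's own implementation may differ, for example in how the grid is scaled.

```r
# Symmetric square root of a positive semi-definite matrix via its eigendecomposition.
# Hypothetical helper, not part of fdWasserstein.
sqrtm_psd <- function(S) {
  e <- eigen((S + t(S)) / 2, symmetric = TRUE)
  # V %*% diag(sqrt(lambda)) %*% t(V); tiny negative eigenvalues are clipped to 0.
  e$vectors %*% (sqrt(pmax(e$values, 0)) * t(e$vectors))
}

# Wasserstein-Procrustes distance between two covariance matrices.
wasserstein_dist <- function(A, B) {
  rA <- sqrtm_psd(A)
  cross <- sqrtm_psd(rA %*% B %*% rA)
  d2 <- sum(diag(A)) + sum(diag(B)) - 2 * sum(diag(cross))
  sqrt(max(d2, 0))  # guard against small negative rounding errors
}

# For commuting (here diagonal) matrices the distance reduces to the
# Euclidean distance between the square roots of the eigenvalues.
A <- diag(c(4, 1))
B <- diag(c(1, 9))
wasserstein_dist(A, B)  # equals sqrt((2 - 1)^2 + (1 - 3)^2) = sqrt(5)
```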

Value

'wassersteinCluster' returns a list of length kmax-kmin+1. The i-th element is a list describing the cluster solution obtained for K=kmin+i-1, and containing:

`K`, `E`, `eta`	the number of clusters, the average entropy and the corresponding value of 'eta';
`w`	the N times K soft partition matrix;
`g`	an M times M times K array with the cluster barycenters;
`d`	an N times K matrix containing the distances between the N sample covariances and the K cluster barycenters;
`obj`	the minimum value of the objective function.

The list may have the following attributes:

`df`	the degrees of freedom of the sample operators (a vector); always present;
`sample.covariances`	a list containing the sample operators (as a 3-dimensional array); only present if add.sigma=TRUE;
`tasw.test`	a list containing the value of maxTASW computed from the data (a scalar), the nperm values of maxTASW obtained by permutation (a vector), and the corresponding p-value (a scalar); only present if nperm>0.

'trimmedAverageSilhouette' returns a numeric vector with the TASW values.
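
The structure described above can be navigated with ordinary list operations. The sketch below builds a toy stand-in with the documented components (random weights, placeholder values) so that it runs without the package; mock_solution and every number in it are hypothetical and carry no statistical meaning.

```r
# Toy stand-in mimicking the documented return value for kmin = 2, kmax = 3.
# Not real output of wassersteinCluster.
set.seed(1)
N <- 6  # number of sample covariances
M <- 4  # grid size
mock_solution <- function(K) {
  w <- matrix(runif(N * K), N, K)
  w <- w / rowSums(w)  # each row of a soft partition sums to 1
  list(K = K, E = NA_real_, eta = NA_real_, w = w,
       g = array(0, c(M, M, K)), d = matrix(runif(N * K), N, K),
       obj = NA_real_)
}
a <- lapply(2:3, mock_solution)
attr(a, "df") <- rep(39, N)  # placeholder degrees of freedom

# Select a solution by its number of clusters rather than by position.
Ks <- sapply(a, function(s) s$K)
sol <- a[[which(Ks == 3)]]

# Hard assignment: the cluster with the largest membership weight in each row.
hard <- max.col(sol$w, ties.method = "first")
table(hard)

# Attributes are read with attr().
attr(a, "df")
```

The same accessors should apply to a real result, where a[[i]] holds the solution for K=kmin+i-1.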

Note

To distribute the computation over more than one CPU:

install the package 'future';
execute in the R session
- library(future)
- plan(multisession)

For more options, see the documentation of the 'future' package.
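
A minimal sketch of a guarded parallel setup follows. It assumes that 'wassersteinCluster' picks up whatever plan is active when called with use.future = TRUE, as this note suggests; the choice of two workers is arbitrary, and the clustering call is left commented out because it needs the data from the Examples.

```r
# Only set up parallel workers when 'future' is installed.
has_future <- requireNamespace("future", quietly = TRUE)
if (has_future) {
  # Two background R sessions; 'workers = 2' is an arbitrary illustrative choice.
  old_plan <- future::plan(future::multisession, workers = 2)
  # a <- wassersteinCluster(X, gr, use.future = TRUE)  # X, gr as in the Examples
  future::plan(old_plan)  # restore the previous plan when done
}
```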

Author(s)

Valentina Masarotto, Guido Masarotto

References

Masarotto, V. & Masarotto, G. (2023) "Covariance-based soft clustering of functional data based on the Wasserstein-Procrustes metric", Scandinavian Journal of Statistics, doi:10.1111/sjos.12692.

Examples


# Example phoneme.R (simplified) from https://doi.org/10.1111/sjos.12692. 
data(phoneme)
# resampling the log-periodograms
# 15 sample covariances for each phoneme
set.seed(12345)
nsubsamples <- 15
n <- 40
gg <- unique(Phoneme)
nphonemes <- length(gg)
N <- n*nsubsamples*nphonemes
M <- NCOL(logPeriodogram)
X <- matrix(NA, N, M)
gr <- integer(N)
r <- 1
first <- 1
last <- n
for (l in gg) {
  for (i in 1:nsubsamples) {
    X[first:last, ] <- logPeriodogram[sample(which(Phoneme==l),n), ]
    gr[first:last] <- r
    r <- r+1
    first <- first+n
    last <- last+n
  }
}
# soft clustering
a <- wassersteinCluster(X, gr)
# how many clusters?
trimmedAverageSilhouette(a)
# the membership weights (solution with K = 5) show that the
# algorithm reconstructed the five phonemes
w <- ts(a[[4]]$w)
colnames(w) <- paste("Cluster", 1:5)
plot(w, xlab="Sample covariances", main="")

[Package fdWasserstein version 1.0 Index]