R: Intrinsic wavelet HPD spectral matrix clustering

pdSpecClust1D {pdSpecEst}

R Documentation

Intrinsic wavelet HPD spectral matrix clustering

Description

pdSpecClust1D performs clustering of HPD spectral matrices corrupted by noise (e.g. HPD periodograms) by combining wavelet thresholding and fuzzy clustering in the intrinsic wavelet coefficient domain according to the following steps:

Transform a collection of noisy HPD spectral matrices to the intrinsic wavelet domain and denoise the HPD matrix curves by (tree-structured) thresholding of wavelet coefficients with pdSpecEst1D.
Apply an intrinsic fuzzy c-means algorithm to the coarsest midpoints at scale j = 0 across subjects.
Taking into account the fuzzy cluster assignments in the previous step, apply a weighted fuzzy c-means algorithm to the nonzero thresholded wavelet coefficients across subjects from scale j = 1 up to j = jmax.

More details can be found in Chapter 3 of (Chau 2018) and the accompanying vignettes.

Usage

pdSpecClust1D(P, K, jmax, metric = "Riemannian", m = 2, d.jmax = 0.1,
  eps = c(1e-04, 1e-04), tau = 0.5, max_iter = 50,
  return.centers = FALSE, ...)

Arguments

`P`	a (`d,d,n,S`)-dimensional array of HPD matrices, corresponding to a collection of sequences of `(d,d)`-dimensional HPD matrices of length `n`, with `n = 2^J` for some `J > 0`, for `S` different subjects.
`K`	the number of clusters, a positive integer larger than 1.
`jmax`	an upper bound on the maximum wavelet scale to be considered in the clustering procedure. If `jmax` is not specified, it is set equal to the maximum (i.e., finest) wavelet scale minus 2.
`metric`	the metric that the space of HPD matrices is equipped with. The default choice is `"Riemannian"`, but this can also be one of: `"logEuclidean"`, `"Cholesky"`, `"rootEuclidean"` or `"Euclidean"`. Additional details are given below.
`m`	the fuzziness parameter for both fuzzy c-means algorithms. `m` should be larger or equal to `1`. If `m = 1` the cluster assignments are no longer fuzzy, i.e., the procedure performs hard clustering.
`d.jmax`	a proportion that is used to determine the maximum wavelet scale to be considered in the clustering procedure. A larger value `d.jmax` leads to less wavelet coefficients being taken into account, and therefore lower computational effort in the procedure. If `d.jmax` is not specified, by default `d.jmax = 0.1`.
`eps`	an optional vector with two components determining the stopping criterion. The first step in the cluster procedure terminates if the (integrated) intrinsic distance between cluster centers is smaller than `eps[1]`. The second step in the cluster procedure terminates if the (integrated) Euclidean distance between cluster centers is smaller than `eps[2]`. By default `eps = c(1e-04, 1e-04)`.
`tau`	an optional argument tuning the weight given to the cluster assignments obtained in the first step of the clustering algorithm. If `tau` is not specified, by default `tau = 0.5`.
`max_iter`	an optional argument tuning the maximum number of iterations in both the first and second step of the clustering algorithm, defaults to `max_iter = 50`.
`return.centers`	should the cluster centers transformed back the space of HPD matrices also be returned? Defaults to `return.centers = FALSE`.
`...`	additional arguments passed on to `pdSpecEst1D`.

Details

The input array P corresponds to a collection of initial noisy HPD spectral estimates of the (d,d)-dimensional spectral matrix at n different frequencies, with n = 2^J for some J > 0, for S different subjects. These can be e.g., multitaper HPD periodograms given as output by the function pdPgram.
First, for each subject s = 1,\ldots,S, thresholded wavelet coefficients in the intrinsic wavelet domain are calculated by pdSpecEst1D, see the function documentation for additional details on the wavelet thresholding procedure.
The maximum wavelet scale taken into account in the clustering procedure is determined by the arguments jmax and d.jmax. The maximum scale is set to the minimum of jmax and the wavelet scale j for which the proportion of nonzero thresholded wavelet coefficients (averaged across subjects) is smaller than d.jmax.
The S subjects are assigned to K different clusters in a probabilistic fashion according to a two-step procedure:

In the first step, an intrinsic fuzzy c-means algorithm, with fuzziness parameter m is applied to the S coarsest midpoints at scale j = 0 in the subject-specific midpoint pyramids. Note that the distance function in the intrinsic c-means algorithm relies on the chosen metric on the space of HPD matrices.
In the second step, a weighted fuzzy c-means algorithm based on the Euclidean distance function, also with fuzziness parameter m, is applied to the nonzero thresholded wavelet coefficients of the S different subjects. The tuning parameter tau controls the weight given to the cluster assignments obtained in the first step of the clustering algorithm.

The function computes the forward and inverse intrinsic AI wavelet transform in the space of HPD matrices equipped with one of the following metrics: (i) the affine-invariant Riemannian metric (default) as detailed in e.g., (Bhatia 2009)[Chapter 6] or (Pennec et al. 2006); (ii) the log-Euclidean metric, the Euclidean inner product between matrix logarithms; (iii) the Cholesky metric, the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric; or (v) the root-Euclidean metric. The default choice of metric (affine-invariant Riemannian) satisfies several useful properties not shared by the other metrics, see (Chau and von Sachs 2019) or (Chau 2018) for more details. Note that this comes at the cost of increased computation time in comparison to one of the other metrics.
If return.centers = TRUE, the function also returns the K HPD spectral matrix curves corresponding to the cluster centers based on the given metric by applying the intrinsic inverse AI wavelet transform ( InvWavTransf1D) to the cluster centers in the wavelet domain.

Value

Depending on the input the function returns a list with five or six components:

cl.prob: an (S,K)-dimensional matrix, where the value at position (s,k) in the matrix corresponds to the probabilistic cluster membership assignment of subject s with respect to cluster k.
cl.centers.D: a list of K wavelet coefficient pyramids, where each pyramid of wavelet coefficients is associated to a cluster center.
cl.centers.M0: a list of K arrays of coarse-scale midpoints at scale j = 0, where each array is associated to a cluster center.
cl.centers.f: only available if return.centers = TRUE, returning a list of K (d,d,n)-dimensional arrays, where each array corresponds to a length n curve of (d,d)-dimensional HPD matrices associated to a cluster center.
cl.jmax: the maximum wavelet scale taken into account in the clustering procedure determined by the input arguments jmax and d.jmax.

References

Bhatia R (2009). Positive Definite Matrices. Princeton University Press, New Jersey.

Chau J (2018). Advances in Spectral Analysis for Multivariate, Nonstationary and Replicated Time Series. phdthesis, Universite catholique de Louvain.

Chau J, von Sachs R (2019). “Intrinsic wavelet regression for curves of Hermitian positive definite matrices.” Journal of the American Statistical Association. doi: 10.1080/01621459.2019.1700129.

Pennec X, Fillard P, Ayache N (2006). “A Riemannian framework for tensor computing.” International Journal of Computer Vision, 66(1), 41–66.

Examples

## ARMA(1,1) process: Example 11.4.1 in (Brockwell and Davis, 1991)
Phi1 <- array(c(0.5, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2))
Phi2 <- array(c(0.7, 0, 0, 0.4, rep(0, 4)), dim = c(2, 2, 2))
Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2))
Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2)

## Generate periodogram data for 10 subjects in 2 groups
pgram <- function(Phi) pdPgram(rARMA(2^9, 2, Phi, Theta, Sigma)$X)$P
P <- array(c(replicate(5, pgram(Phi1)), replicate(5, pgram(Phi2))), dim=c(2,2,2^8,10))

cl <- pdSpecClust1D(P, K = 2, metric = "logEuclidean")

[Package pdSpecEst version 1.2.4 Index]