pdSpecClust1D {pdSpecEst} | R Documentation |
Intrinsic wavelet HPD spectral matrix clustering
Description
pdSpecClust1D
performs clustering of HPD spectral matrices corrupted by noise (e.g. HPD periodograms)
by combining wavelet thresholding and fuzzy clustering in the intrinsic wavelet coefficient domain according to
the following steps:
Transform a collection of noisy HPD spectral matrices to the intrinsic wavelet domain and denoise the HPD matrix curves by (tree-structured) thresholding of wavelet coefficients with
pdSpecEst1D
.Apply an intrinsic fuzzy c-means algorithm to the coarsest midpoints at scale
j = 0
across subjects.Taking into account the fuzzy cluster assignments in the previous step, apply a weighted fuzzy c-means algorithm to the nonzero thresholded wavelet coefficients across subjects from scale
j = 1
up toj = jmax
.
More details can be found in Chapter 3 of (Chau 2018) and the accompanying vignettes.
Usage
pdSpecClust1D(P, K, jmax, metric = "Riemannian", m = 2, d.jmax = 0.1,
eps = c(1e-04, 1e-04), tau = 0.5, max_iter = 50,
return.centers = FALSE, ...)
Arguments
P |
a ( |
K |
the number of clusters, a positive integer larger than 1. |
jmax |
an upper bound on the maximum wavelet scale to be considered in the clustering procedure. If
|
metric |
the metric that the space of HPD matrices is equipped with. The default choice is |
m |
the fuzziness parameter for both fuzzy c-means algorithms. |
d.jmax |
a proportion that is used to determine the maximum wavelet scale to be considered in the clustering
procedure. A larger value |
eps |
an optional vector with two components determining the stopping criterion. The first step in the cluster procedure
terminates if the (integrated) intrinsic distance between cluster centers is smaller than |
tau |
an optional argument tuning the weight given to the cluster assignments obtained in the first step of
the clustering algorithm. If |
max_iter |
an optional argument tuning the maximum number of iterations in both the first and second step of the
clustering algorithm, defaults to |
return.centers |
should the cluster centers transformed back the space of HPD matrices also be returned?
Defaults to |
... |
additional arguments passed on to |
Details
The input array P
corresponds to a collection of initial noisy HPD spectral estimates of the (d,d)
-dimensional
spectral matrix at n
different frequencies, with n = 2^J
for some J > 0
, for S
different subjects.
These can be e.g., multitaper HPD periodograms given as output by the function pdPgram
.
First, for each subject s = 1,\ldots,S
, thresholded wavelet coefficients in the intrinsic wavelet domain are
calculated by pdSpecEst1D
, see the function documentation for additional details on the wavelet thresholding
procedure.
The maximum wavelet scale taken into account in the clustering procedure is determined by the arguments
jmax
and d.jmax
. The maximum scale is set to the minimum of jmax
and the wavelet
scale j
for which the proportion of nonzero thresholded wavelet coefficients (averaged
across subjects) is smaller than d.jmax
.
The S
subjects are assigned to K
different clusters in a probabilistic fashion according to a
two-step procedure:
In the first step, an intrinsic fuzzy c-means algorithm, with fuzziness parameter
m
is applied to theS
coarsest midpoints at scalej = 0
in the subject-specific midpoint pyramids. Note that the distance function in the intrinsic c-means algorithm relies on the chosen metric on the space of HPD matrices.In the second step, a weighted fuzzy c-means algorithm based on the Euclidean distance function, also with fuzziness parameter
m
, is applied to the nonzero thresholded wavelet coefficients of theS
different subjects. The tuning parametertau
controls the weight given to the cluster assignments obtained in the first step of the clustering algorithm.
The function computes the forward and inverse intrinsic AI wavelet transform in the space of HPD matrices equipped with
one of the following metrics: (i) the affine-invariant Riemannian metric (default) as detailed in e.g., (Bhatia 2009)[Chapter 6]
or (Pennec et al. 2006); (ii) the log-Euclidean metric, the Euclidean inner product between matrix logarithms;
(iii) the Cholesky metric, the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric; or
(v) the root-Euclidean metric. The default choice of metric (affine-invariant Riemannian) satisfies several useful properties
not shared by the other metrics, see (Chau and von
Sachs 2019) or (Chau 2018) for more details. Note that this comes
at the cost of increased computation time in comparison to one of the other metrics.
If return.centers = TRUE
, the function also returns the K
HPD spectral matrix curves corresponding to
the cluster centers based on the given metric by applying the intrinsic inverse AI wavelet transform (
InvWavTransf1D
) to the cluster centers in the wavelet domain.
Value
Depending on the input the function returns a list with five or six components:
- cl.prob
an (
S,K
)-dimensional matrix, where the value at position (s,k
) in the matrix corresponds to the probabilistic cluster membership assignment of subjects
with respect to clusterk
.- cl.centers.D
a list of
K
wavelet coefficient pyramids, where each pyramid of wavelet coefficients is associated to a cluster center.- cl.centers.M0
a list of
K
arrays of coarse-scale midpoints at scalej = 0
, where each array is associated to a cluster center.- cl.centers.f
only available if
return.centers = TRUE
, returning a list ofK
(d,d,n)
-dimensional arrays, where each array corresponds to a lengthn
curve of(d,d)
-dimensional HPD matrices associated to a cluster center.- cl.jmax
the maximum wavelet scale taken into account in the clustering procedure determined by the input arguments
jmax
andd.jmax
.
References
Bhatia R (2009).
Positive Definite Matrices.
Princeton University Press, New Jersey.
Chau J (2018).
Advances in Spectral Analysis for Multivariate, Nonstationary and Replicated Time Series.
phdthesis, Universite catholique de Louvain.
Chau J, von
Sachs R (2019).
“Intrinsic wavelet regression for curves of Hermitian positive definite matrices.”
Journal of the American Statistical Association.
doi: 10.1080/01621459.2019.1700129.
Pennec X, Fillard P, Ayache N (2006).
“A Riemannian framework for tensor computing.”
International Journal of Computer Vision, 66(1), 41–66.
See Also
pdSpecEst1D
, pdSpecClust2D
, pdkMeans
Examples
## ARMA(1,1) process: Example 11.4.1 in (Brockwell and Davis, 1991)
Phi1 <- array(c(0.5, 0, 0, 0.6, rep(0, 4)), dim = c(2, 2, 2))
Phi2 <- array(c(0.7, 0, 0, 0.4, rep(0, 4)), dim = c(2, 2, 2))
Theta <- array(c(0.5, -0.7, 0.6, 0.8, rep(0, 4)), dim = c(2, 2, 2))
Sigma <- matrix(c(1, 0.71, 0.71, 2), nrow = 2)
## Generate periodogram data for 10 subjects in 2 groups
pgram <- function(Phi) pdPgram(rARMA(2^9, 2, Phi, Theta, Sigma)$X)$P
P <- array(c(replicate(5, pgram(Phi1)), replicate(5, pgram(Phi2))), dim=c(2,2,2^8,10))
cl <- pdSpecClust1D(P, K = 2, metric = "logEuclidean")