pdkMeans {pdSpecEst}    R Documentation
K-means clustering for HPD matrices
Description
pdkMeans performs (fuzzy) k-means clustering for collections of HPD matrices, such as covariance or spectral density matrices, based on a number of different metrics in the space of HPD matrices.
Usage
pdkMeans(X, K, metric = "Riemannian", m = 1, eps = 1e-05,
max_iter = 100, centroids)
Arguments
X
a (d,d,S)- or (d,d,n,S)-dimensional array of HPD matrices, corresponding respectively to a collection of S individual (d,d)-dimensional HPD matrices, or S sequences of n (d,d)-dimensional HPD matrices.

K
the number of clusters, a positive integer larger than 1.

metric
the metric that the space of HPD matrices is equipped with. The default choice is "Riemannian", but this can also be one of: "logEuclidean", "Cholesky", "rootEuclidean" or "Euclidean". See the Details section below.

m
a fuzziness parameter larger or equal to 1. If m = 1, the subjects are assigned to the clusters in a non-probabilistic (hard) fashion; if m > 1, the cluster assignments are fuzzy (probabilistic).

eps
an optional tolerance parameter determining the stopping criterion. The k-means algorithm terminates if the intrinsic distance between cluster centers is smaller than eps, defaults to eps = 1e-05.

max_iter
an optional parameter tuning the maximum number of iterations in the k-means algorithm, defaults to max_iter = 100.

centroids
an optional (d,d,K)- or (d,d,n,K)-dimensional array, depending on the format of the input array X, specifying the initial cluster centroids. If not specified, the initial centroids are obtained by random sampling without replacement from X.
Details
The input array X corresponds to a collection of (d,d)-dimensional HPD matrices for S different subjects. If the fuzziness parameter satisfies m > 1, the S subjects are assigned to the K different clusters in a probabilistic fashion according to a fuzzy k-means algorithm as detailed in classical texts, such as (Bezdek 1981). If m = 1, the S subjects are assigned to the K clusters in a non-probabilistic fashion according to a standard (hard) k-means algorithm. If not specified by the user, the K cluster centers are initialized by random sampling without replacement from the input array of HPD matrices X.
The distance measure in the (fuzzy) k-means algorithm is induced by the metric on the space of HPD matrices specified by the user. By default, the space of HPD matrices is equipped with (i) the affine-invariant Riemannian metric (metric = 'Riemannian') as detailed in, e.g., (Bhatia 2009)[Chapter 6] or (Pennec et al. 2006). Instead, this can also be one of: (ii) the log-Euclidean metric (metric = 'logEuclidean'), the Euclidean inner product between matrix logarithms; (iii) the Cholesky metric (metric = 'Cholesky'), the Euclidean inner product between Cholesky decompositions; (iv) the Euclidean metric (metric = 'Euclidean'); or (v) the root-Euclidean metric (metric = 'rootEuclidean'). The default choice of metric (affine-invariant Riemannian) satisfies several useful properties not shared by the other metrics, see, e.g., pdSpecEst for more details. Note that this comes at the cost of increased computation time in comparison to one of the other metrics.
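As a minimal sketch (using simulated input data, not part of the package's own examples below), hard clustering under the log-Euclidean metric can be obtained by setting m = 1 and metric = "logEuclidean":

## Simulate a (d,d,S)-dimensional array of random HPD matrices (d = 2, S = 10)
d <- 2; S <- 10
X0 <- replicate(S, {
  z <- matrix(complex(real = rnorm(d^2), imaginary = rnorm(d^2)), nrow = d)
  t(Conj(z)) %*% z  ## t(Conj(z)) %*% z is Hermitian positive definite
})
## Hard k-means clustering (m = 1) under the log-Euclidean metric
fit <- pdkMeans(X0, K = 2, metric = "logEuclidean", m = 1)
fit$cl.assignments  ## binary (S,K) cluster membership matrix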
Value
Returns a list with two components:
- cl.assignments: an (S,K)-dimensional matrix, where the value at position (s,k) in the matrix corresponds to the (probabilistic or binary) cluster membership assignment of subject s with respect to cluster k.
- cl.centroids: either a (d,d,K)- or (d,d,n,K)-dimensional array, depending on the input array X, corresponding respectively to the K (d,d)- or (d,d,n)-dimensional final cluster centroids.
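Continuing the sketch in the Details section, the returned components can be inspected as follows (assuming the fitted object fit from that sketch):

dim(fit$cl.assignments)  ## (S,K) membership matrix; each row is one-hot for m = 1
dim(fit$cl.centroids)    ## (d,d,K) array of final cluster centroids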
References
Bezdek J (1981).
Pattern Recognition with Fuzzy Objective Function Algorithms.
Plenum Press, New York.
Bhatia R (2009).
Positive Definite Matrices.
Princeton University Press, New Jersey.
Pennec X, Fillard P, Ayache N (2006).
“A Riemannian framework for tensor computing.”
International Journal of Computer Vision, 66(1), 41–66.
See Also
pdDist, pdSpecClust1D, pdSpecClust2D
Examples
## Generate 20 random HPD matrices in 2 groups
m <- function(rescale){
  ## random complex (3,3) matrix; t(Conj(x)) %*% x is Hermitian positive definite
  x <- matrix(complex(real = rescale * rnorm(9), imaginary = rescale * rnorm(9)), nrow = 3)
  t(Conj(x)) %*% x
}
X <- array(c(replicate(10, m(0.25)), replicate(10, m(1))), dim = c(3, 3, 20))
## Compute fuzzy k-means cluster assignments
cl <- pdkMeans(X, K = 2, m = 2)$cl.assignments
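## A possible follow-up (not part of the original example): convert the fuzzy
## memberships to hard cluster labels per subject
cl.labels <- apply(cl, 1, which.max)
table(cl.labels)  ## cluster sizes across the 20 generated matrices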