R: Hierarchical Variable Clustering Using Singular Vectors...

hcsvd {bdsvd}

R Documentation

Hierarchical Variable Clustering Using Singular Vectors (HC-SVD).

Description

Performs HC-SVD to reveal the hierarchical variable structure as described in Bauer (202Xb). In this divisive approach, each cluster is iteratively split into two clusters. Potential splits are identified by the first sparse loadings, i.e., sparse approximations of the first right singular vectors that contain many zero values and mirror the masked shape of the correlation matrix. The procedure continues until each variable lies in its own cluster.

Usage

hcsvd(X, k = "all", linkage = "average", reliability, R, max.iter, trace = TRUE)

Arguments

`X`	Data matrix of dimension `n x p`. The data matrix is standardized during the analysis by `hcsvd`.
`k`	Number of sparse loadings to be used. This should be `"all"` for all sparse loadings, or `"Kaiser"` for as many sparse loadings as there are eigenvalues greater than or equal to one (see Bauer (202Xb) for details). Selecting `"Kaiser"` reduces computation time.
`linkage`	The linkage function to be used. This should be one of `"average"`, `"single"`, or `"RV"` (for RV-coefficient).
`reliability`	By default, the value of each cluster equals the distance calculated by the chosen linkage function. If preferred, the value of each cluster can instead be assigned by its reliability. When `reliability = "spectral"`, the reliability is calculated by the averaged spectral norm.
`R`	Sample correlation matrix of `X`. By default, `R <- cor(X)`.
`max.iter`	How many iterations should be performed for computing the sparse loadings. Default is `200`.
`trace`	Print out progress as `p-1` iterations for divisive hierarchical clustering are performed. Default is `TRUE`.
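
The `"Kaiser"` option for `k` refers to counting eigenvalues of a correlation matrix that are at least one. The base R sketch below only illustrates that counting rule on toy data; the data, the variable names, and the choice to apply the rule to the full correlation matrix are assumptions made for illustration, and how `hcsvd` applies the rule within each cluster is described in Bauer (202Xb), not here.

```r
# Illustrative sketch only (toy data, not taken from the package):
# count the eigenvalues of a sample correlation matrix that are >= 1.
set.seed(1)
n <- 200
z1 <- rnorm(n)  # latent factor driving the first three variables
z2 <- rnorm(n)  # latent factor driving the last three variables
X <- cbind(z1 + 0.5 * rnorm(n), z1 + 0.5 * rnorm(n), z1 + 0.5 * rnorm(n),
           z2 + 0.5 * rnorm(n), z2 + 0.5 * rnorm(n), z2 + 0.5 * rnorm(n))

R <- cor(X)  # correlation matrix, so the eigenvalues sum to ncol(X)
ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values

# Number of eigenvalues that satisfy the Kaiser rule
k_kaiser <- sum(ev >= 1)
k_kaiser
```

Because the rule typically keeps far fewer than `p` eigenvalues, it is plausible that fewer candidate loadings need to be searched, which is consistent with the reduced computation time noted for `k = "Kaiser"`.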

Details

The sparse loadings are computed using the method by Shen & Huang (2008), implemented in the irlba package.
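
To give a feel for what a sparse loading is, the following base R sketch alternates a soft-thresholding step with a normalization step to obtain a rank-one sparse approximation, in the spirit of Shen & Huang (2008). It is a simplified illustration under stated assumptions: the helper names `soft_threshold` and `sparse_rank_one`, the fixed penalty `lambda`, and the toy data are hypothetical, and this is not the code path that irlba or `hcsvd` actually uses.

```r
# Hypothetical helper: soft-thresholding operator
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

# Hypothetical helper: rank-one sparse loading via alternating updates
sparse_rank_one <- function(X, lambda, max.iter = 200, tol = 1e-6) {
  s <- svd(X, nu = 1, nv = 1)
  u <- s$u[, 1]               # unit-length left vector
  v <- s$d[1] * s$v[, 1]      # unnormalized right vector (loading)
  for (i in seq_len(max.iter)) {
    # Shrink small entries of X'u to exactly zero
    v_new <- soft_threshold(drop(crossprod(X, u)), lambda)
    if (all(v_new == 0)) { v <- v_new; break }
    # Update and renormalize the left vector
    u_new <- drop(X %*% v_new)
    u_new <- u_new / sqrt(sum(u_new^2))
    converged <- sqrt(sum((v_new - v)^2)) < tol * sqrt(sum(v^2))
    u <- u_new
    v <- v_new
    if (converged) break
  }
  nrm <- sqrt(sum(v^2))
  if (nrm > 0) v / nrm else v  # unit-length sparse loading (or all zeros)
}

# Toy data: three variables share a common factor, three are pure noise
set.seed(1)
n <- 100
z <- rnorm(n)
X <- scale(cbind(z + 0.4 * rnorm(n), z + 0.4 * rnorm(n), z + 0.4 * rnorm(n),
                 rnorm(n), rnorm(n), rnorm(n)))

v_dense  <- sparse_rank_one(X, lambda = 0)  # no penalty: first right singular vector
v_sparse <- sparse_rank_one(X, lambda = 3)  # penalty shrinks small entries to zero
round(v_sparse, 3)
```

With `lambda = 0` the sketch returns the ordinary first right singular vector up to sign. For a large enough penalty one would expect the entries of the noise variables to be shrunk to exactly zero, which is the kind of zero pattern that can suggest a split.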

Value

A list with three components:

`dist.matrix`	The ultrametric distance matrix (cophenetic matrix) of the HC-SVD structure as an object of class `dist`.
`u.cor`	The ultrametric correlation matrix of `X` obtained by HC-SVD as an object of class `matrix`.
`k.p`	A vector of length `p-1` containing the ratio `k_i/p_i` of the `k_i` sparse loadings used relative to all sparse loadings `p_i` for the split of each cluster. The ratio is set to `NA` if the cluster contains only two variables as the search for sparse loadings that reflect the split is not required in this case.
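
Since `dist.matrix` is an ordinary `dist` object, it can be passed to base R tools such as `hclust` and `cutree`. The sketch below uses a stand-in ultrametric distance built with `cophenetic` so that it runs without the package; the stand-in data and the hypothetical helper `is_ultrametric` are assumptions for illustration. With a fitted object, `hcsvd.obj$dist.matrix` would take the place of `D`.

```r
# Stand-in for hcsvd.obj$dist.matrix: a cophenetic (ultrametric) distance
set.seed(1)
X <- matrix(rnorm(50 * 5), 50, 5)
colnames(X) <- paste0("V", 1:5)
d0 <- as.dist(1 - abs(cor(X)))
D <- cophenetic(hclust(d0, method = "average"))  # object of class "dist"

# Hypothetical helper: check d(i, j) <= max(d(i, k), d(k, j)) for all i, j, k
is_ultrametric <- function(d, tol = 1e-10) {
  M <- as.matrix(d)
  p <- nrow(M)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (any(M[i, j] > pmax(M[i, ], M[, j]) + tol)) return(FALSE)
    }
  }
  TRUE
}

is_ultrametric(D)

# Rebuild the tree from the distances and cut it into two variable clusters
tree <- hclust(D, method = "average")
groups <- cutree(tree, k = 2)
split(names(groups), groups)
```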

References

Bauer, J.O. (202Xb). Hierarchical variable clustering using singular vectors.

Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal. 99, 1015–1034.

Examples

#We replicate the simulation study in Bauer (202Xb)

## Not run: 
p <- 100
n <- 300
b <- 5
design <- "a"

Rho <- hcsvd.cor.sim(p = p, b = b, design = design)
X <- scale(mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = Rho, checkSymmetry = FALSE))
colnames(X) <- 1:ncol(X)
hcsvd.obj <- hcsvd(X, k = "Kaiser")

#The dendrogram can be obtained from the ultrametric distance matrix:
plot(hclust(hcsvd.obj$dist.matrix))

## End(Not run)

[Package bdsvd version 0.2.0 Index]