hcsvd {bdsvd}    R Documentation

Hierarchical Variable Clustering Using Singular Vectors (HC-SVD).

Description

Performs HC-SVD to reveal the hierarchical variable structure as described in Bauer (202Xb). In this divisive approach, each cluster is iteratively split into two clusters. Potential splits are identified by the first sparse loadings (sparse approximations of the first right singular vectors, i.e., vectors with many zero entries) that mirror the masked shape of the correlation matrix. The procedure continues until each variable forms its own cluster.
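To make the splitting idea concrete, the following R sketch performs one split on a toy correlation matrix. It is a simplification, not the HC-SVD procedure of Bauer (202Xb): the sparse loading is approximated by soft-thresholding the leading eigenvector of the cluster's correlation matrix instead of using the Shen & Huang (2008) method, and the helper split_by_support() and its threshold lambda are hypothetical.

```r
# Toy correlation matrix: variables 1-3 form one block (r = 0.8),
# variables 4-5 another (r = 0.5), with weak cross-block correlation 0.1.
R <- matrix(0.1, 5, 5)
R[1:3, 1:3] <- 0.8
R[4:5, 4:5] <- 0.5
diag(R) <- 1

# Hypothetical helper: one divisive split of the variables in `vars`.
# Assumption: the sparse loading is approximated by soft-thresholding the
# leading eigenvector of the cluster's correlation matrix.
split_by_support <- function(R, vars, lambda = 0.2) {
  v <- eigen(R[vars, vars, drop = FALSE], symmetric = TRUE)$vectors[, 1]
  v_sparse <- sign(v) * pmax(abs(v) - lambda, 0)  # soft thresholding
  in_first <- v_sparse != 0                       # non-zero loadings
  list(first = vars[in_first], second = vars[!in_first])
}

split_by_support(R, vars = 1:5)
```

With lambda = 0.2 the non-zero loadings pick out variables 1 to 3, so the split separates {1, 2, 3} from {4, 5}. Applying the same step recursively to each new cluster continues until every variable forms its own cluster.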

Usage

hcsvd(X, k = "all", linkage = "single", reliability, R, max.iter, trace = TRUE)

Arguments

X

Data matrix of dimension n x p. The data matrix is standardized during the analysis by hcsvd.

k

Number of sparse loadings to be used. This should be "all" to use all sparse loadings, or "Kaiser" to use as many sparse loadings as there are eigenvalues greater than or equal to one (see Bauer (202Xb) for details). Selecting "Kaiser" reduces computation time.

linkage

The linkage function to be used. This should be one of "average", "single", or "RV" (for the RV coefficient).

reliability

By default, the value of each cluster equals the distance calculated by the chosen linkage function. Alternatively, the value of each cluster can be set to its reliability: with reliability = "spectral", the reliability is calculated as the averaged spectral norm.

R

Sample correlation matrix of X. By default, R <- cor(X).

max.iter

Maximum number of iterations used to compute the sparse loadings. Default is 200.

trace

If TRUE, progress is printed as the p-1 iterations of the divisive hierarchical clustering are performed. Default is TRUE.
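The quantities behind the arguments k and linkage can be sketched in base R on a toy correlation matrix. kaiser_k() counts the eigenvalues greater than or equal to one, as described for k = "Kaiser"; whether hcsvd() applies this count to the full matrix or to the cluster currently being split is an assumption left open here. cluster_similarity() is a hypothetical helper: it assumes that "single" and "average" act on absolute correlations between two clusters and uses the standard RV coefficient for "RV"; how hcsvd() turns these into distances is not shown.

```r
# Toy correlation matrix: blocks {1, 2, 3} (r = 0.8) and {4, 5} (r = 0.5),
# cross-block correlation 0.1.
R <- matrix(0.1, 5, 5)
R[1:3, 1:3] <- 0.8
R[4:5, 4:5] <- 0.5
diag(R) <- 1

# k = "Kaiser": number of eigenvalues >= 1.
kaiser_k <- function(R) {
  sum(eigen(R, symmetric = TRUE, only.values = TRUE)$values >= 1)
}

# Hypothetical similarity between variable clusters A and B.
# Assumption: "single" = largest absolute cross-correlation,
# "average" = mean absolute cross-correlation,
# "RV" = tr(R_AB R_BA) / sqrt(tr(R_AA^2) tr(R_BB^2)).
cluster_similarity <- function(R, A, B, linkage = c("single", "average", "RV")) {
  linkage <- match.arg(linkage)
  R_AB <- R[A, B, drop = FALSE]
  switch(linkage,
    single  = max(abs(R_AB)),
    average = mean(abs(R_AB)),
    # For symmetric blocks, tr(M %*% t(M)) equals sum(M^2).
    RV      = sum(R_AB^2) / sqrt(sum(R[A, A]^2) * sum(R[B, B]^2))
  )
}

kaiser_k(R)
cluster_similarity(R, A = 1:3, B = 4:5, linkage = "RV")
```

For this toy matrix two eigenvalues exceed one, and every linkage rates the clusters {1, 2, 3} and {4, 5} as only weakly related (single and average both give 0.1).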

Details

The sparse loadings are computed using the method of Shen & Huang (2008), as implemented in the irlba package.
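For a single sparse loading, the Shen & Huang (2008) method alternates a soft-thresholded update of the right vector, v <- h_lambda(X'u), with a normalized update of the left vector, u <- Xv / ||Xv||. The base-R sketch below illustrates this rank-one iteration. It is not the code used by hcsvd() (the irlba package offers ssvd() for sparse SVD), and the name sparse_loading() as well as the handling of lambda, max.iter and tol are assumptions chosen for illustration.

```r
# Hypothetical rank-one sparse loading via regularized SVD with a
# soft-threshold penalty, in the spirit of Shen & Huang (2008).
sparse_loading <- function(X, lambda, max.iter = 200, tol = 1e-8) {
  soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)
  s <- svd(X, nu = 1, nv = 1)
  u <- s$u[, 1]
  v <- s$d[1] * s$v[, 1]                          # initial right vector
  for (iter in seq_len(max.iter)) {
    v_new <- soft(drop(crossprod(X, u)), lambda)  # v <- h_lambda(X'u)
    if (all(v_new == 0)) return(v_new)            # lambda too large
    Xv <- drop(X %*% v_new)
    u_new <- Xv / sqrt(sum(Xv^2))                 # u <- Xv / ||Xv||
    converged <- max(abs(v_new - v)) < tol
    u <- u_new
    v <- v_new
    if (converged) break
  }
  v / sqrt(sum(v^2))                              # unit-length sparse loading
}

# Example: X'X equals a two-block correlation matrix.
R <- matrix(0.1, 5, 5)
R[1:3, 1:3] <- 0.8
R[4:5, 4:5] <- 0.5
diag(R) <- 1
X <- chol(R)
v <- sparse_loading(X, lambda = 0.5)
round(v, 3)
```

With lambda = 0.5 the loading is exactly zero for variables 4 and 5 and equal in magnitude for variables 1 to 3; the overall sign is arbitrary, as for any singular vector.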

Value

A list with three components:

dist.matrix

The ultrametric distance matrix (cophenetic matrix) of the HC-SVD structure as an object of class dist.

u.cor

The ultrametric correlation matrix of X obtained by HC-SVD as an object of class matrix.

k.p

A vector of length p-1 containing, for the split of each cluster i, the ratio k_i/p_i of the number k_i of sparse loadings used to the total number p_i of sparse loadings. The ratio is set to NA if the cluster contains only two variables, since no search for sparse loadings that reflect the split is required in this case.
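Because dist.matrix is an ordinary dist object, base R tools for hierarchical clustering apply to it directly. The sketch below uses a small hand-made ultrametric distance matrix as a stand-in for hcsvd.obj$dist.matrix (the values are hypothetical) and extracts cluster memberships with cutree().

```r
# Hypothetical ultrametric distances for five variables: {1, 2, 3} merge at
# height 0.2, {4, 5} at 0.5, and the two groups at 0.9.
D <- matrix(0.9, 5, 5, dimnames = list(1:5, 1:5))
D[1:3, 1:3] <- 0.2
D[4:5, 4:5] <- 0.5
diag(D) <- 0
d <- as.dist(D)                  # same class as hcsvd.obj$dist.matrix

hc <- hclust(d, method = "single")
groups <- cutree(hc, k = 2)      # memberships after the top-level split
groups
cophenetic(hc)                   # heights at which variables are joined
```

Since the distances are ultrametric, single linkage reproduces them exactly, so cophenetic(hc) returns the original distances, and cutree(hc, k = 2) recovers the groups {1, 2, 3} and {4, 5}.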

References

Bauer, J.O. (202Xb). Hierarchical variable clustering using singular vectors.

Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034.

Examples

# Replicate the simulation study in Bauer (202Xb)

## Not run: 
p <- 100       # number of variables
n <- 300       # number of observations
b <- 5
design <- "a"

# Simulate standardized data with correlation matrix Rho
Rho <- hcsvd.cor.sim(p = p, b = b, design = design)
X <- scale(mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = Rho, checkSymmetry = FALSE))
colnames(X) <- 1:ncol(X)
hcsvd.obj <- hcsvd(X, k = "Kaiser")

# The dendrogram can be obtained from the ultrametric distance matrix:
plot(hclust(hcsvd.obj$dist.matrix))

## End(Not run)



[Package bdsvd version 0.2.0 Index]