sca {epca}    R Documentation

Sparse Component Analysis

Description

sca performs sparse principal components analysis on the given numeric data matrix. Choices of rotation techniques and shrinkage operators are available.

Usage

sca(
  x,
  k = min(5, dim(x)),
  gamma = NULL,
  is.cov = FALSE,
  rotate = c("varimax", "absmin"),
  shrink = c("soft", "hard"),
  center = TRUE,
  scale = FALSE,
  normalize = FALSE,
  order = TRUE,
  flip = TRUE,
  max.iter = 1000,
  epsilon = 1e-05,
  quiet = TRUE
)

Arguments

x

matrix or Matrix to be analyzed.

k

integer, rank of approximation.

gamma

numeric(1), sparsity parameter; defaults to sqrt(p * k), where n x p is the dimension of x.

is.cov

logical, defaults to FALSE; whether x is a covariance matrix (or Gram matrix, i.e., crossprod() of some design matrix). If TRUE, both center and scale are ignored.

rotate

character(1), rotation method. Two options are currently available: "varimax" (default) or "absmin" (see details).

shrink

character(1), shrinkage method, either "soft"- (default) or "hard"-thresholding (see details).

center

logical, whether to center columns of x (see scale()).

scale

logical, whether to scale columns of x (see scale()).

normalize

logical, whether row normalization should be done before the rotation and undone afterward (see details).

order

logical, whether to re-order the columns of the estimates (see Details below).

flip

logical, whether to flip the signs of the columns of estimates such that all columns are positive-skewed (see details).

max.iter

integer, maximum number of iterations (defaults to 1,000).

epsilon

numeric, convergence tolerance (defaults to 0.00001).

quiet

logical, whether to mute the process report (defaults to TRUE).

Details

rotate: The rotate option specifies the rotation technique to use. Currently, there are two built-in options: “varimax” and “absmin”. The “varimax” rotation maximizes the element-wise L4 norm of the rotated matrix. It is faster and computationally more stable. The “absmin” rotation minimizes the sum of absolute values of the rotated matrix. It is sharper (as it directly minimizes the L1 norm) but slower and computationally less stable.
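The two criteria can be compared on a toy matrix. The sketch below is illustrative only: l4_criterion and l1_criterion are hypothetical helper names, not functions exported by epca, and it evaluates the objectives rather than performing either rotation.

```r
# Hypothetical helpers (not part of epca) for the two criteria named above.
# sum(m^4) is the fourth power of the element-wise L4 norm, so it has the same maximizer.
l4_criterion <- function(m) sum(m^4)     # "varimax" looks for a rotation that makes this large
l1_criterion <- function(m) sum(abs(m))  # "absmin" looks for a rotation that makes this small

sparse <- diag(2)                        # a perfectly sparse orthonormal basis
theta <- pi / 4
rot <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), 2, 2)
dense <- sparse %*% rot                  # the same subspace, rotated by 45 degrees

# The sparse basis scores higher on the L4 criterion and lower on the L1 criterion.
c(l4 = l4_criterion(sparse), l1 = l1_criterion(sparse))
c(l4 = l4_criterion(dense),  l1 = l1_criterion(dense))
```

Both criteria favor the sparse basis here, which is why either one can serve as the rotation objective.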

shrink: The shrink option specifies the shrinkage operator to use. Currently, there are two built-in options: “soft”- and “hard”-thresholding. “Soft”-thresholding uniformly reduces the magnitude of all elements and sets the small elements to zero. “Hard”-thresholding only sets the small elements to zero.
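A minimal sketch of the two operators with a fixed threshold follows. The function names soft_threshold and hard_threshold and the threshold argument t are assumptions made for illustration; how sca derives its threshold from gamma is not shown here.

```r
# Hypothetical helpers (not part of epca), using a fixed threshold t.
soft_threshold <- function(x, t) sign(x) * pmax(abs(x) - t, 0)  # shrink everything toward 0 by t
hard_threshold <- function(x, t) x * (abs(x) > t)               # zero out small entries only

v <- c(-2, -0.3, 0.1, 0.8, 1.5)
soft_threshold(v, 0.5)  # large entries move toward zero by 0.5; small ones become 0
hard_threshold(v, 0.5)  # large entries are unchanged; small ones become 0
```

Both operators zero out the same entries; they differ only in whether the surviving entries are also shrunk.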

normalize: The normalize argument indicates whether and how normalization is done before rotation and then undone after rotation. If normalize is FALSE (the default), no normalization is done. If normalize is TRUE, Kaiser normalization is done, so that the squared entries in each row of the normalized x sum to 1.0 (this is sometimes called Horst normalization). For rotate = "absmin", if normalize is a vector of length equal to the number of indicators (i.e., the number of rows of x), then the columns are divided by normalize before rotation and multiplied by normalize after rotation. If normalize is a function, it should take x as an argument and return a vector that is used like the vector above.
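The normalize-rotate-undo pattern for the Kaiser case can be sketched in base R as follows. This is an illustration under assumptions, not the code path sca uses: it substitutes stats::varimax (with its own normalization switched off) for the rotation step, and the variable names are made up for the example.

```r
set.seed(1)
x <- matrix(rnorm(30), nrow = 10, ncol = 3)   # toy matrix to be rotated

w <- sqrt(rowSums(x^2))                       # row norms (the Kaiser weights)
xn <- x / w                                   # row i is divided by w[i]; each row now has unit norm

# Rotate the normalized matrix; normalize = FALSE because normalization is done by hand here.
rot <- stats::varimax(xn, normalize = FALSE)$rotmat
rotated <- (xn %*% rot) * w                   # undo the normalization after rotation
```

Because rot is orthogonal, the rows of rotated have the same norms as the rows of x; the normalization only changes how much each row influences the choice of rotation.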

order: In PCA (and SVD), the principal components (and the singular vectors) are ordered. Analogously, the sparse components (i.e., the columns of z or y) are ordered by their explained variance in the data, defined as sum((x %*% y)^2), where y is a column of the sparse components. Note: this is not to be confused with the cumulative proportion of variance explained by y (and z), particularly because the columns of y (and z) may not be strictly orthogonal.
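The ordering rule can be illustrated on a toy example. The matrices below are made up for the illustration, and the variable names are not taken from the package.

```r
x <- diag(c(1, 3, 2))   # toy data matrix
y <- diag(3)            # toy loadings, one component per column

# Explained variance of each component: sum((x %*% y_j)^2) for column y_j.
explained <- colSums((x %*% y)^2)

# Reorder the columns so that explained variance is decreasing.
ord <- order(explained, decreasing = TRUE)
y_ordered <- y[, ord, drop = FALSE]
```

Here explained is c(1, 9, 4), so the second column of y moves to the front, followed by the third and then the first.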

flip: The flip argument indicates whether the signs of the columns of the estimated sparse components should be flipped. Note that the estimated (sparse) loadings, i.e., the weights on the original variables, are column-wise invariant to sign flipping, because flipping a principal direction does not change the amount of variance explained by the component. If flip = TRUE, the columns of the loadings are flipped such that each column is positive-skewed, meaning that for each column the sum of cubed elements (i.e., sum(x^3)) is non-negative.
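A minimal sketch of this sign convention is given below, using made-up loadings. The names skew_sign and y_flipped are illustrative, and the code is an assumption about how the rule could be applied rather than the package's internal routine.

```r
y <- cbind(c(-2, 1, 0),    # sum of cubes is -7, so this column gets flipped
           c(1, 0.5, 0))   # sum of cubes is 1.125, so this column is kept

skew_sign <- sign(colSums(y^3))
skew_sign[skew_sign == 0] <- 1              # leave columns with zero skew untouched
y_flipped <- sweep(y, 2, skew_sign, `*`)    # multiply each column by its sign
```

Since (x %*% (-y))^2 equals (x %*% y)^2, the flip leaves the explained variance of every component unchanged.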

Value

an sca object that contains:

loadings

matrix, sparse loadings of PCs.

scores

an n x k matrix, the component scores, calculated using centered (and/or scaled) x. This will only be available when is.cov = FALSE.

cpve

a numeric vector of length k, cumulative proportion of variance in x explained by the top PCs (after center and/or scale).

center

logical, this records the center parameter.

scale

logical, this records the scale parameter.

n.iter

integer, number of iterations taken.

n.obs

integer, sample size, that is, nrow(x).

References

Chen, F. and Rohe, K. (2020) "A New Basis for Sparse Principal Component Analysis."

See Also

sma, prs

Examples

## ------ example 1 ------
## simulate a low-rank data matrix with some additive Gaussian noise
n <- 300
p <- 50
k <- 5 ## rank
z <- shrinkage(polar(matrix(runif(n * k), n, k)), sqrt(n))
b <- diag(k) * 3
y <- shrinkage(polar(matrix(runif(p * k), p, k)), sqrt(p))
e <- matrix(rnorm(n * p, sd = .01), n, p)
x <- scale(z %*% b %*% t(y) + e)

## perform sparse PCA
s.sca <- sca(x, k)
s.sca

## ------ example 2 ------
## use the `pitprops` data from the `elasticnet` package
data(pitprops)

## find 6 sparse PCs
s.sca <- sca(pitprops, 6, gamma = 6, is.cov = TRUE)
print(s.sca, verbose = TRUE)


[Package epca version 1.1.0 Index]