R: Sparse PCA

admm.spca {ADMM}

R Documentation

Sparse PCA

Description

Sparse Principal Component Analysis aims at finding a sparse vector by solving

\textrm{max}_x~x^T\Sigma x \quad \textrm{s.t.} \quad \|x\|_2\le 1,~\|x\|_0\le K

where \|x\|_0 is the number of non-zero elements in a vector x. A convex relaxation of this problem was proposed to solve the following problem,

\textrm{max}_X~<\Sigma,X> ~\textrm{s.t.} \quad Tr(X)=1,~\|X\|_0 \le K^2, ~X\ge 0,~\textrm{rank}(X)=1

where X=xx^T is a (p\times p) matrix that is outer product of a vector x by itself, and X\ge 0 means the matrix X is positive semidefinite. With the rank condition dropped, it can be restated as

\textrm{max}_X~ <\Sigma,X>-\rho\|X\|_1 \quad \textrm{s.t.}\quad Tr(X)=1,X\ge 0.

After acquiring each principal component vector, an iterative step based on Schur complement deflation method is applied to regress out the impact of previously-computed projection vectors. It should be noted that those sparse basis may not be orthonormal.

Usage

admm.spca(
  Sigma,
  numpc,
  mu = 1,
  rho = 1,
  abstol = 1e-04,
  reltol = 0.01,
  maxiter = 1000
)

Arguments

`Sigma`	a `(p\times p)` (sample) covariance matrix.
`numpc`	number of principal components to be extracted.
`mu`	an augmented Lagrangian parameter.
`rho`	a regularization parameter for sparsity.
`abstol`	absolute tolerance stopping criterion.
`reltol`	relative tolerance stopping criterion.
`maxiter`	maximum number of iterations.

Value

a named list containing

basis: a (p\times numpc) matrix whose columns are sparse principal components.
history: a length-numpc list of dataframes recording iteration numerics. See the section for more details.

Iteration History

For SPCA implementation, main computation is sequentially performed for each projection vector. The history field is a list of length numpc, where each element is a data frame containing iteration history recording following fields over iterates,

r_norm: norm of primal residual
s_norm: norm of dual residual
eps_pri: feasibility tolerance for primal feasibility condition
eps_dual: feasibility tolerance for dual feasibility condition

In accordance with the paper, iteration stops when both r_norm and s_norm values become smaller than eps_pri and eps_dual, respectively.

References

Ma S (2013). “Alternating Direction Method of Multipliers for Sparse Principal Component Analysis.” Journal of the Operations Research Society of China, 1(2), 253–274. ISSN 2194-668X, 2194-6698, doi: 10.1007/s40305-013-0016-9.

Examples

## generate a random matrix and compute its sample covariance
X    = matrix(rnorm(1000*5),nrow=1000)
covX = stats::cov(X)

## compute 3 sparse basis
output = admm.spca(covX, 3)

[Package ADMM version 0.3.3 Index]