SPCAvRP_deflation {SPCAvRP} | R Documentation
Computes multiple principal components using our modified deflation scheme
Description
Computes m leading eigenvectors of the sample covariance matrix which are sparse and orthogonal, using the modified deflation scheme in conjunction with the SPCAvRP algorithm.
Usage
SPCAvRP_deflation(data, cov = FALSE, m, l, d = 20,
                  A = 600, B = 200, center_data = TRUE)
Arguments
data: Either the data matrix (n observations in rows and p variables in columns) or the p x p sample covariance matrix.
cov: TRUE if data is given as the sample covariance matrix; FALSE (the default) if it is given as the data matrix.
m: The number of principal components to estimate.
l: The array of length m whose entries are the desired sparsity levels of the m principal components (see Details).
d: The dimension of the random projections (see Details).
A: Number of projections over which to aggregate (see Details).
B: Number of projections in a group from which to select (see Details).
center_data: TRUE if the data matrix should be centered (see Details).
Details
This function implements the modified deflation scheme in conjunction with SPCAvRP (Algorithm 2 in the reference given below).
If the true sparsity level is known and is equal to k for each component, use d = k and l = rep(k, m); sparsity levels of different components may also take different values. If k is unknown, an appropriate k can be chosen from an array of candidate values by inspecting the explained variance one component at a time, using SPCAvRP in combination with the deflation scheme implemented in SPCAvRP_deflation, as sketched below.
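For example, candidate values of k could be screened with a short loop like the following (a minimal sketch, not part of the package; it assumes a data matrix X with observations in rows, and that SPCAvRP accepts the analogous arguments data, cov, l, d, A, B and returns an element vector):

S <- cov(X)                          # sample covariance matrix
candidate_k <- c(5, 10, 15, 20)      # candidate sparsity levels
expl_var <- sapply(candidate_k, function(k) {
  v <- SPCAvRP(data = X, cov = FALSE, l = k, d = k, A = 300, B = 100)$vector
  as.numeric(crossprod(v, S %*% v))  # variance explained by the first component
})
# pick the smallest k beyond which expl_var stops increasing appreciably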
It is desirable to choose A (and B = ceiling(A/3)) as large as possible, subject to the computational budget. In general, we suggest using A = 300 and B = 100 when the dimension of the data is a few hundred, and A = 600 and B = 200 when the dimension is of the order of 1000.
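For instance, for data of dimension of the order of 1000, this rule of thumb gives:

A <- 600
B <- ceiling(A/3)   # = 200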
If center_data == TRUE and data is given as a data matrix, the first step is to center it by executing scale(data, center_data, FALSE), which subtracts the column means of data from the corresponding columns.
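This centering step is equivalent to sweeping out the column means, e.g. (a small sketch for an n x p data matrix X):

Xc <- scale(X, center = TRUE, scale = FALSE)                       # as done internally
all.equal(Xc, sweep(X, 2, colMeans(X)), check.attributes = FALSE)  # TRUE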
Value
Returns a list of two elements:
vector: A matrix of dimension p x m whose m columns are the m estimated principal components.
value: An array with the m estimated eigenvalues.
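For instance, if out denotes the returned list (the name out is illustrative), its elements can be inspected via:

crossprod(out$vector)      # off-diagonal entries near zero: the columns are orthogonal
colSums(out$vector != 0)   # number of nonzero loadings in each estimated component
out$value                  # the m estimated eigenvalues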
Author(s)
Milana Gataric, Tengyao Wang and Richard J. Samworth
References
Milana Gataric, Tengyao Wang and Richard J. Samworth (2018) Sparse principal component analysis via random projections. https://arxiv.org/abs/1712.05630
See Also
SPCAvRP
Examples
p <- 50 # data dimension
k <- 8 # true sparsity of each component
v1 <- 1/sqrt(k)*c(rep(1, k), rep(0, p-k)) # first principal component (PC)
v2 <- 1/sqrt(k)*c(rep(0,4), 1, -1, 1, -1, rep(1,4), rep(0,p-12)) # 2nd PC
v3 <- 1/sqrt(k)*c(rep(0,6), 1, -rep(1,4), rep(1,3), rep(0,p-14)) # 3rd PC
Sigma <- diag(p) + 40*tcrossprod(v1) + 20*tcrossprod(v2) + 5*tcrossprod(v3) # population covariance
mu <- rep(0, p) # population mean
n <- 2000 # number of observations
loss <- function(u, v){
  # distance between unit vectors u and v: the sine of the angle between them
  sqrt(abs(1 - sum(v*u)^2))
}
loss_sub <- function(U, V){
  # distance between the column spans of U and V: spectral norm of the
  # difference of the corresponding orthogonal projection matrices
  U <- qr.Q(qr(U)); V <- qr.Q(qr(V))
  norm(tcrossprod(U) - tcrossprod(V), "2")
}
library(MASS)              # for mvrnorm
set.seed(1)
X <- mvrnorm(n, mu, Sigma) # n x p data matrix
spcavrp.def <- SPCAvRP_deflation(data = X, cov = FALSE, m = 2, l = rep(k, 2),
                                 d = k, A = 200, B = 70, center_data = FALSE)
subspace_estimation<-data.frame(
loss_sub(matrix(c(v1,v2),ncol=2),spcavrp.def$vector),
loss(spcavrp.def$vector[,1],v1),
loss(spcavrp.def$vector[,2],v2),
crossprod(spcavrp.def$vector[,1],spcavrp.def$vector[,2]))
colnames(subspace_estimation)<-c("loss_sub","loss_v1","loss_v2","inner_prod")
rownames(subspace_estimation)<-c("")
print(subspace_estimation)