| VBsparsePCA {VBsparsePCA} | R Documentation | 
The main function for the variational Bayesian method for sparse PCA
Description
This function employs the PX-CAVI algorithm proposed in Ning (2021). The method uses the sparse spiked-covariance model and the spike and slab prior (see below). Two different slab densities can be used: independent Laplace densities and a multivariate normal density. In Ning (2021), it recommends choosing the multivariate normal distribution. The algorithm allows the user to decide whether she/he wants to center and scale their data. The user is also allowed to change the default values of the parameters of each prior.
Usage
VBsparsePCA(
  dat,
  r,
  lambda = 1,
  slab.prior = "MVN",
  max.iter = 100,
  eps = 0.001,
  jointly.row.sparse = TRUE,
  center.scale = FALSE,
  sig2.true = NA,
  threshold = 0.5,
  theta.int = NA,
  theta.var.int = NA,
  kappa.para1 = NA,
  kappa.para2 = NA,
  sigma.a = NA,
  sigma.b = NA
)
Arguments
| dat | Data an  | 
| r | Rank. | 
| lambda | Tuning parameter for the density  | 
| slab.prior | The density  | 
| max.iter | The maximum number of iterations for running the algorithm. | 
| eps | The convergence threshold; the default is  | 
| jointly.row.sparse | The default is true, which means that the jointly row sparsity assumption is used; one could not use this assumptio by changing it to false. | 
| center.scale | The default if false. If true, then the input date will be centered and scaled. | 
| sig2.true | The default is false,  | 
| threshold | The threshold to determine whether  | 
| theta.int | The initial value of theta mean; if not provided, the algorithm will estimate it using PCA. | 
| theta.var.int | The initial value of theta.var; if not provided, the algorithm will set it to be 1e-3*diag(r). | 
| kappa.para1 | The value of  | 
| kappa.para2 | The value of  | 
| sigma.a | The value of  | 
| sigma.b | The value of  | 
Details
The model is
X_i = \theta w_i + \sigma \epsilon_i
where w_i \sim N(0, I_r), \epsilon \sim N(0,I_p).
The spike and slab prior is given by
\pi(\theta, \boldsymbol \gamma|\lambda_1, r) \propto \prod_{j=1}^p \left(\gamma_j \int_{A \in V_{r,r}} g(\theta_j|\lambda_1, A, r) \pi(A) d A+ (1-\gamma_j) \delta_0(\theta_j)\right)
g(\theta_j|\lambda_1, A, r) = C(\lambda_1)^r \exp(-\lambda_1 \|\beta_j\|_q^m)
\gamma_j| \kappa \sim Bernoulli(\kappa)
\kappa \sim Beta(\alpha_1, \alpha_2)
\sigma^2 \sim InvGamma(\sigma_a, \sigma_b)
where V_{r,r} = \{A \in R^{r \times r}: A'A = I_r\} and \delta_0 is the Dirac measure at zero.
The density g can be chosen to be the product of independent Laplace distribution (i.e., q = 1, m =1) or the multivariate normal distribution (i.e., q = 2, m = 2).
Value
| iter | The number of iterations to reach convergence. | 
| selection | A vector (if  | 
| loadings | The loadings matrix. | 
| uncertainty | The covariance of each non-zero rows in the loadings matrix. | 
| scores | Score functions for the  | 
| sig2 | Variance of the noise. | 
| obj.fn | A vector contains the value of the objective function of each iteration. It can be used to check whether the algorithm converges | 
References
Ning, B. (2021). Spike and slab Bayesian sparse principal component analysis. arXiv:2102.00305.
Examples
#In this example, the first 20 rows in the loadings matrix are nonzero, the rank is 2
set.seed(2021)
library(MASS)
library(pracma)
n <- 200
p <- 1000
s <- 20
r <- 2
sig2 <- 0.1
# generate eigenvectors
U.s <- randortho(s, type = c("orthonormal"))
if (r == 1) {
  U <- rep(0, p)
  U[1:s] <- as.vector(U.s[, 1:r])
} else {
  U <- matrix(0, p, r)
  U[1:s, ] <- U.s[, 1:r]
}
s.star <- rep(0, p)
s.star[1:s] <- 1
eigenvalue <- seq(20, 10, length.out = r)
# generate Sigma
if (r == 1) {
  theta.true <- U * sqrt(eigenvalue)
  Sigma <- tcrossprod(theta.true) + sig2*diag(p)
} else {
  theta.true <- U %*% sqrt(diag(eigenvalue))
  Sigma <- tcrossprod(theta.true) + sig2 * diag(p)
}
# generate n*p dataset
X <- t(mvrnorm(n, mu = rep(0, p), Sigma = Sigma))
result <- VBsparsePCA(dat = t(X), r = 2, jointly.row.sparse = TRUE, center.scale = FALSE)
loadings <- result$loadings
scores <- result$scores