abesspca {abess}        R Documentation

Adaptive best subset selection for principal component analysis

Description

Adaptive best subset selection for principal component analysis

Usage

abesspca(
  x,
  type = c("predictor", "gram"),
  sparse.type = c("fpc", "kpc"),
  cor = FALSE,
  support.size = NULL,
  c.max = NULL,
  lambda = 0,
  always.include = NULL,
  group.index = NULL,
  splicing.type = 1,
  max.splicing.iter = 20,
  warm.start = TRUE,
  ...
)

Arguments

x
A matrix object. It can be either a predictor matrix, where each row is an observation and each column is a predictor, or a sample covariance/correlation matrix. If x is a predictor matrix, it can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).

type
If type = "predictor", x is treated as a predictor matrix. If type = "gram", x is treated as a sample covariance or correlation matrix.

sparse.type
If sparse.type = "fpc", best subset selection is performed on the first principal component; if sparse.type = "kpc", it is performed on the first K principal components.

cor
A logical value. If cor = TRUE, PCA is performed on the correlation matrix; otherwise, on the covariance matrix. This option is available only if type = "predictor". Default: cor = FALSE.

support.size
An integer vector. When sparse.type = "fpc", it gives the candidate support sizes; when sparse.type = "kpc", each entry controls the sparsity of one principal component. When sparse.type = "fpc" and support.size is not supplied, it is set to support.size = 1:min(ncol(x), 100) if group.index = NULL; otherwise, support.size = 1:min(length(unique(group.index)), 100). When sparse.type = "kpc" and support.size is not supplied, the support size of each of the first 20% of principal components is set to min(ncol(x), 100) if group.index = NULL; otherwise, min(length(unique(group.index)), 100).

c.max
An integer splicing size. Default: the maximum of 2 and max(support.size) / 2.

lambda
A single lambda value for regularized best subset selection. Default: lambda = 0.

always.include
An integer vector containing the indices of variables that should always be included in the model.

group.index
A vector of integers indicating which group each variable belongs to. Variables in the same group should be located in adjacent columns of x and share the same value in group.index. Denote the first group as 1, the second as 2, and so on. If you do not fit a model with a group structure, set group.index = NULL (the default).

splicing.type
Optional type of splicing. If splicing.type = 1, the numbers of variables to be spliced are c.max, ..., 1; if splicing.type = 2, they are c.max, c.max/2, ..., 1. Default: splicing.type = 1.

max.splicing.iter
The maximum number of splicing iterations. In most cases, only a few splicing iterations are needed to reach convergence. Default: max.splicing.iter = 20.

warm.start
Whether to use the last solution as a warm start. Default: warm.start = TRUE.

...
Further arguments to be passed to or from methods.
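The defaults described above for support.size (with sparse.type = "fpc"), c.max and splicing.type can be sketched in a few lines of base R. The helpers default_support_size_fpc(), default_c_max() and splicing_sizes() are hypothetical illustrations, not functions exported by abess; rounding c.max down to an integer and using integer halving for splicing.type = 2 are assumptions of this sketch.

```r
## Hypothetical helpers (not part of abess) that mimic the documented defaults.

## Default candidate support sizes when sparse.type = "fpc":
## 1:min(p, 100), or 1:min(number of groups, 100) when group.index is given.
default_support_size_fpc <- function(p, group.index = NULL) {
  n_units <- if (is.null(group.index)) p else length(unique(group.index))
  seq_len(min(n_units, 100))
}

## Default c.max: the maximum of 2 and max(support.size) / 2.
## Rounding down to an integer is an assumption made for this sketch.
default_c_max <- function(support.size) {
  max(2, floor(max(support.size) / 2))
}

## Numbers of variables exchanged in successive splicing steps.
## splicing.type = 1: c.max, c.max - 1, ..., 1
## splicing.type = 2: c.max, c.max / 2, ..., 1 (integer halving assumed)
splicing_sizes <- function(c.max, splicing.type = 1) {
  if (splicing.type == 1) {
    return(seq.int(c.max, 1))
  }
  sizes <- c.max
  while (sizes[length(sizes)] > 1) {
    sizes <- c(sizes, sizes[length(sizes)] %/% 2)
  }
  sizes
}

## Example with the 4-variable USArrests data
s <- default_support_size_fpc(ncol(USArrests))  # 1 2 3 4
cm <- default_c_max(s)                          # 2
splicing_sizes(cm, splicing.type = 1)           # 2 1
splicing_sizes(10, splicing.type = 2)           # 10 5 2 1
```

With splicing.type = 2 fewer splicing sizes are tried per iteration, which is cheaper when c.max is large.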

Details

Adaptive best subset selection for principal component analysis aims to solve the non-convex optimization problem:

\arg\max_{v} v^\top \Sigma v, \quad \text{s.t.} \quad v^\top v = 1, \quad \|v\|_0 \leq s,

where Σ is the sample covariance (or correlation) matrix and s is the support size. A generic splicing technique is implemented to solve this problem. By exploiting warm-start initialization, the problems at different support sizes (specified by support.size) can be solved efficiently.
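To make the objective concrete for the first principal component, here is a brute-force sketch. For a fixed support S, the best unit-norm v is the leading eigenvector of Σ restricted to S, so enumerating every support of size s solves the problem exactly; since enlarging S never lowers that leading eigenvalue (eigenvalue interlacing), supports of size exactly s cover the constraint ‖v‖_0 ≤ s. This exhaustive search is swapped in for illustration only: it is not the splicing algorithm used by abesspca and is feasible only for small p. best_subset_pc1() is a hypothetical name.

```r
## Hypothetical exhaustive solver (not part of abess) for
##   argmax_v v' Sigma v  subject to  v'v = 1 and ||v||_0 <= s.
## For a fixed support S the optimum is the leading eigenvector of Sigma[S, S];
## enlarging S never lowers that eigenvalue, so supports of size exactly s suffice.
best_subset_pc1 <- function(Sigma, s) {
  p <- ncol(Sigma)
  best <- list(value = -Inf, loading = NULL, support = NULL)
  for (S in combn(p, s, simplify = FALSE)) {
    e <- eigen(Sigma[S, S, drop = FALSE], symmetric = TRUE)
    if (e$values[1] > best$value) {
      v <- numeric(p)
      v[S] <- e$vectors[, 1]  # embed the leading sub-eigenvector into R^p
      best <- list(value = e$values[1], loading = v, support = S)
    }
  }
  best
}

Sigma <- stats::cov(USArrests)
fit2 <- best_subset_pc1(Sigma, s = 2)
fit2$support  # indices of the two selected variables
fit2$value    # explained variance v' Sigma v of the sparse loading
```

The cost grows like choose(p, s), which is why a splicing-based search is needed for realistic p.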

Value

An S3 object of class abesspca, which is a list with the following components:


A p-by-length(support.size) loading matrix of sparse principal components (PCs), where each row is a variable and each column is a support size.


The number of variables.


The same as input.


The actual support.size values used. Note that they are not necessarily the same as the input if the latter contains non-integer or duplicated values.


A vector of length length(support.size) recording the explained variance at each support size.


A vector with the same length as ev. It records the percentage of explained variance at each support size.


If sparse.type = "fpc", it is the total variance explained by the first principal component; otherwise, it is the total standard deviation of all principal components.


The original call to abesspca.
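The explained variance recorded for a loading vector can be related to the objective in the Details section: for a unit-norm loading v, it is v^T Σ v. The sketch below computes this for every column of a loading matrix. explained_variance() is a hypothetical helper, not abess internals; it assumes each column has unit norm, and it does not attempt to reproduce how abess normalizes the percentage of explained variance.

```r
## Hypothetical helper (not part of abess): explained variance v' Sigma v
## for every column v of a p-by-k loading matrix V (columns assumed unit-norm).
explained_variance <- function(V, Sigma) {
  V <- as.matrix(V)
  # colSums(V * (Sigma %*% V)) equals diag(t(V) %*% Sigma %*% V)
  unname(colSums(V * (Sigma %*% V)))
}

Sigma <- stats::cov(USArrests)
## Two illustrative sparse loadings: Assault alone, and an equal mix of Murder and Rape
V <- cbind(c(0, 1, 0, 0), c(1, 0, 0, 1) / sqrt(2))
explained_variance(V, Sigma)
```

Using colSums on an elementwise product avoids forming the full k-by-k matrix t(V) %*% Sigma %*% V when only its diagonal is needed.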

Author(s)

Jin Zhu, Junxian Zhu, Ruihuang Liu, Junhao Huang, Xueqin Wang

References

A polynomial algorithm for best-subset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. Proceedings of the National Academy of Sciences 117.52 (2020): 33117-33123. DOI: 10.1073/pnas.2014241117

Sparse principal component analysis. Hui Zou, Trevor Hastie, and Robert Tibshirani. Journal of Computational and Graphical Statistics 15.2 (2006): 265-286.

See Also

print.abesspca, coef.abesspca,


Examples

## predictor matrix input:
pca_fit <- abesspca(USArrests)

## covariance matrix input:
pca_fit <- abesspca(stats::cov(USArrests), type = "gram")

## robust covariance matrix input:
rob_cov <- MASS::cov.rob(USArrests)[["cov"]]
rob_cov <- (rob_cov + t(rob_cov)) / 2
pca_fit <- abesspca(rob_cov, type = "gram")

## K-component principal component analysis
pca_fit <- abesspca(USArrests,
  sparse.type = "kpc",
  support.size = c(1, 2)
)

[Package abess version 0.3.0 Index]