abesspca {abess}  R Documentation 
Adaptive best subset selection for principal component analysis
abesspca(
x,
type = c("predictor", "gram"),
sparse.type = c("fpc", "kpc"),
cor = FALSE,
kpc.num = NULL,
support.size = NULL,
gs.range = NULL,
tune.path = c("sequence", "gsection"),
tune.type = c("gic", "aic", "bic", "ebic", "cv"),
nfolds = 5,
foldid = NULL,
ic.scale = 1,
c.max = NULL,
always.include = NULL,
group.index = NULL,
screening.num = NULL,
splicing.type = 1,
max.splicing.iter = 20,
warm.start = TRUE,
num.threads = 0,
...
)
x 
A matrix object. It can be either a predictor matrix
where each row is an observation and each column is a predictor or
a sample covariance/correlation matrix.
If x is a predictor matrix, it can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).
type 
If type = "predictor", x is treated as the predictor matrix. If type = "gram", x is treated as a sample covariance or correlation matrix. Default: type = "predictor".
sparse.type 
If sparse.type = "fpc", best subset selection is performed on the first principal component; if sparse.type = "kpc", best subset selection is performed on the first kpc.num principal components. Default: sparse.type = "fpc".
cor 
A logical value. If cor = TRUE, perform PCA on the correlation matrix; otherwise, on the covariance matrix. This option is available only if type = "predictor". Default: cor = FALSE.
kpc.num 
An integer deciding the number of principal components to be sequentially considered. 
support.size 
It is a flexible input. If it is an integer vector,
it represents the support sizes to be considered for each principal component.
If it is a list object containing kpc.num integer vectors, the i-th principal component considers the support sizes in the i-th element of the list.
gs.range 
An integer vector with two elements.
The first element is the minimum model size considered by the golden-section search,
the latter is the maximum one. Default is gs.range = NULL.
tune.path 
The method to be used to select the optimal support size. For
tune.path = "sequence", we solve the best subset selection problem for each size in support.size. For tune.path = "gsection", we solve the best subset selection problem with support size ranged in gs.range, where the specific support sizes to be considered are determined by golden-section search.
tune.type 
The type of criterion for choosing the support size.
Available options are "gic", "aic", "bic", "ebic" and "cv". Default is "gic".
nfolds 
The number of folds in cross-validation. Default is nfolds = 5.
foldid 
An optional integer vector of values between 1, ..., nfolds identifying what fold each observation is in.
The default foldid = NULL would generate a random foldid.
ic.scale 
A non-negative value used for multiplying the penalty term
in the information criterion. Default: ic.scale = 1.
c.max 
An integer splicing size. The default of c.max is the maximum of 2 and max(support.size) / 2.
always.include 
An integer vector containing the indexes of variables that should always be included in the model. 
group.index 
A vector of integers indicating which group each variable is in.
Variables in the same group should be located in adjacent columns of x, and their corresponding entries in group.index should be equal. Denote the first group as 1, the second as 2, etc. If you do not fit a model with a group structure, please set group.index = NULL (the default).
screening.num 
An integer number. Preserve screening.num number of variables with the largest marginal maximum likelihood estimators before running the algorithm.
splicing.type 
Optional type for splicing.
If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1; if splicing.type = 2, it is c.max, c.max/2, ..., 1. Default: splicing.type = 1.
max.splicing.iter 
The maximum number of splicing iterations.
In most cases, a few splicing iterations suffice to guarantee convergence.
Default is max.splicing.iter = 20.
warm.start 
Whether to use the last solution as a warm start. Default is warm.start = TRUE.
num.threads 
An integer deciding the number of threads to be
concurrently used for cross-validation (i.e., tune.type = "cv"). If num.threads = 0, then all available cores will be used. Default: num.threads = 0.
... 
further arguments to be passed to or from methods. 
Adaptive best subset selection for principal component analysis (abessPCA) aims to solve the non-convex optimization problem:
\arg\max_{v} v^\top \Sigma v, \quad \textrm{s.t.} \quad v^\top v = 1, \ \|v\|_0 \leq s,
where s is the support size.
Here, \Sigma is the covariance matrix, i.e.,
\Sigma = \frac{1}{n} X^{\top} X.
A generic splicing technique is implemented to
solve this problem.
By exploiting the warm-start initialization, the non-convex optimization
problem at different support sizes (specified by support.size)
can be efficiently solved.
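To make the objective concrete, the pure-Python sketch below (an illustration only, not the package's splicing algorithm) solves the same problem by brute force on a tiny covariance matrix: it enumerates every size-s support and scores it by the leading eigenvalue of the corresponding submatrix of \Sigma, which equals the maximum of v^\top \Sigma v over unit vectors supported on that subset.

```python
from itertools import combinations

def leading_eigenvalue(mat, iters=300):
    # Power iteration on a small symmetric PSD matrix (list of lists).
    n = len(mat)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T Sigma v for the (unit-norm) iterate.
    return sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))

def best_subset_fpc(sigma, s):
    # Exhaustively find the size-s support maximizing the explained
    # variance of the first sparse principal component.
    p = len(sigma)
    best_support, best_ev = None, float("-inf")
    for subset in combinations(range(p), s):
        sub = [[sigma[i][j] for j in subset] for i in subset]
        ev = leading_eigenvalue(sub)
        if ev > best_ev:
            best_support, best_ev = subset, ev
    return best_support, best_ev

# Toy covariance matrix: variables 0 and 1 are strongly correlated,
# so the best size-2 support is {0, 1}.
sigma = [
    [2.0, 1.8, 0.1],
    [1.8, 2.0, 0.1],
    [0.1, 0.1, 1.0],
]
support, ev = best_subset_fpc(sigma, 2)
```

abess avoids this exponential enumeration: the splicing technique swaps variables in and out of the active set until no exchange improves the objective.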
The abessPCA can be conducted sequentially for each component.
Please see the multiple principal components section on the website
for more details about this function.
For the abesspca function, the argument kpc.num controls the number of components to be considered.
When sparse.type = "fpc" but support.size is not supplied,
it is set as support.size = 1:min(ncol(x), 100) if group.index = NULL;
otherwise, support.size = 1:min(length(unique(group.index)), 100).
When sparse.type = "kpc" but support.size is not supplied,
then, for 20% of principal components,
it is set as min(ncol(x), 100) if group.index = NULL;
otherwise, min(length(unique(group.index)), 100).
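The sequential, component-by-component scheme can be sketched with standard eigen-deflation (again a plain-Python illustration, not the routine abess actually uses): after a component with variance lambda and loading v is extracted, subtract lambda * v v^T from Sigma and extract the next component from the deflated matrix.

```python
def power_leading(mat, iters=500):
    # Leading eigenpair of a small symmetric PSD matrix via power iteration.
    n = len(mat)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # matrix is (numerically) zero
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def sequential_components(sigma, k):
    # Extract k components one at a time, deflating Sigma after each.
    mat = [row[:] for row in sigma]
    variances = []
    for _ in range(k):
        lam, v = power_leading(mat)
        variances.append(lam)
        n = len(mat)
        # Deflate: Sigma <- Sigma - lambda * v v^T
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(n)]
               for i in range(n)]
    return variances

# Toy covariance with eigenvalues 3, 1, 1: the first component lies
# along (1, 1, 0) / sqrt(2) and explains variance 3.
sigma = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
vals = sequential_components(sigma, 2)
```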
An S3 abesspca class object, which is a list
with the following components:
coef 
A p-by-length(support.size) loading matrix of sparse principal components, where each row is a variable and each column corresponds to a support size. 
nvars 
The number of variables. 
sparse.type 
The same as input. 
support.size 
The actual support.size values used. Note that it is not necessarily the same as the input if the latter has non-integer or duplicated values. 
ev 
A vector with size length(support.size). It records the explained variance at each support size. 
tune.value 
A value of the tuning criterion of length length(support.size). 
kpc.num 
The number of principal components being considered. 
var.pc 
The variance of principal components obtained by performing standard PCA. 
cum.var.pc 
Cumulative sums of var.pc. 
var.all 
If sparse.type = "fpc", it is the total standard deviations of all principal components. 
pev 
A vector with the same length as ev. It records the percent of explained variance (compared to var.all) at each support size. 
pev.pc 
It records the percent of explained variance (compared to var.pc) at each support size. 
tune.type 
The criterion type for tuning parameters. 
tune.path 
The strategy for tuning parameters. 
call 
The original call to abesspca. 
It is worth noting that, if sparse.type == "kpc",
then coef, support.size, ev, tune.value, pev and pev.pc
in the list are list
objects.
Some parameters not described in the Details section are explained in the documentation for abess,
because the meanings of these parameters are very similar.
Jin Zhu, Junxian Zhu, Ruihuang Liu, Junhao Huang, Xueqin Wang
A polynomial algorithm for best-subset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. Proceedings of the National Academy of Sciences Dec 2020, 117 (52) 33117-33123; doi: 10.1073/pnas.2014241117
Sparse principal component analysis. Hui Zou, Trevor Hastie, and Robert Tibshirani. Journal of Computational and Graphical Statistics 15.2 (2006): 265-286. doi: 10.1198/106186006X113430
print.abesspca, coef.abesspca, plot.abesspca.
library(abess)
## predictor matrix input:
head(USArrests)
pca_fit <- abesspca(USArrests)
pca_fit
plot(pca_fit)
## covariance matrix input:
cov_mat <- stats::cov(USArrests) * (nrow(USArrests) - 1) / nrow(USArrests)
pca_fit <- abesspca(cov_mat, type = "gram")
pca_fit
## robust covariance matrix input:
rob_cov <- MASS::cov.rob(USArrests)[["cov"]]
rob_cov <- (rob_cov + t(rob_cov)) / 2
pca_fit <- abesspca(rob_cov, type = "gram")
pca_fit
## K-component principal component analysis
pca_fit <- abesspca(USArrests,
sparse.type = "kpc",
support.size = 1:4
)
coef(pca_fit)
plot(pca_fit)
plot(pca_fit, "coef")
## select support size via cross-validation ##
n <- 500
p <- 50
support_size <- 3
dataset <- generate.spc.matrix(n, p, support_size, snr = 20)
spca_fit <- abesspca(dataset[["x"]], tune.type = "cv", nfolds = 5)
plot(spca_fit, type = "tune")