abesspca {abess}    R Documentation

Adaptive best subset selection for principal component analysis

Description

Adaptive best subset selection for principal component analysis

Usage

abesspca(
  x,
  type = c("predictor", "gram"),
  sparse.type = c("fpc", "kpc"),
  cor = FALSE,
  support.size = NULL,
  c.max = NULL,
  lambda = 0,
  always.include = NULL,
  group.index = NULL,
  splicing.type = 1,
  max.splicing.iter = 20,
  warm.start = TRUE,
  ...
)

Arguments

x

A matrix object. It can be either a predictor matrix, where each row is an observation and each column is a predictor, or a sample covariance/correlation matrix. If x is a predictor matrix, it can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).

type

If type = "predictor", x is treated as the predictor matrix. If type = "gram", x is treated as a sample covariance or correlation matrix.

sparse.type

If sparse.type = "fpc", best subset selection is performed on the first principal component; if sparse.type = "kpc", best subset selection is performed on the first K principal components.

cor

A logical value. If cor = TRUE, perform PCA on the correlation matrix; otherwise, the covariance matrix. This option is available only if type = "predictor". Default: cor = FALSE.
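As a rough illustration of what the choice between covariance and correlation means, the base R sketch below (independent of abess) checks that the correlation matrix equals the covariance matrix of the standardized columns. The USArrests data set is used only as an example.

```r
x <- as.matrix(USArrests)
cov_mat <- stats::cov(x)
cor_mat <- stats::cor(x)

# Column variances differ widely across the four variables
diag(cov_mat)

# The correlation matrix is the covariance of columns scaled to unit variance
all.equal(stats::cov(scale(x)), cor_mat)
```

When column variances differ widely, a covariance-based first component tends to be dominated by the high-variance columns, which is one reason to consider cor = TRUE.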

support.size

An integer vector. When sparse.type = "fpc", it gives the candidate support sizes; when sparse.type = "kpc", each entry controls the sparsity of one principal component. When sparse.type = "fpc" and support.size is not supplied, it is set to 1:min(ncol(x), 100) if group.index = NULL; otherwise, to 1:min(length(unique(group.index)), 100). When sparse.type = "kpc" and support.size is not supplied, then for 20% principal components, it is set to min(ncol(x), 100) if group.index = NULL; otherwise, to min(length(unique(group.index)), 100).
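The default for sparse.type = "fpc" can be reproduced in base R. The helper below, default_support_size, is a hypothetical name introduced for illustration and is not an abess function; it only mirrors the rule stated above.

```r
# Hypothetical helper mirroring the documented default for sparse.type = "fpc";
# it is not part of abess.
default_support_size <- function(x, group.index = NULL) {
  # Count variables when there is no grouping, otherwise count groups
  n_units <- if (is.null(group.index)) ncol(x) else length(unique(group.index))
  1:min(n_units, 100)
}

x <- as.matrix(USArrests)
default_support_size(x)                               # 1 2 3 4
default_support_size(x, group.index = c(1, 1, 2, 2))  # 1 2
```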

c.max

An integer splicing size. The default of c.max is the maximum of 2 and max(support.size) / 2.

lambda

A single lambda value for regularized best subset selection. Default is 0.

always.include

An integer vector containing the indexes of variables that should always be included in the model.

group.index

A vector of integers indicating which group each variable belongs to. Variables in the same group should be located in adjacent columns of x, and their corresponding entries in group.index should be the same. Denote the first group as 1, the second as 2, etc. If you do not fit a model with a group structure, please set group.index = NULL (the default).
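A minimal sketch of the layout described above, in base R. The function is_valid_group_index is a hypothetical check written for this illustration; abess may validate its input differently.

```r
# Hypothetical check, not part of abess: groups are labelled 1, 2, ... and
# variables of the same group occupy adjacent columns.
is_valid_group_index <- function(group.index) {
  steps <- diff(group.index)
  # Labels start at 1 and either stay the same or move to the next group
  group.index[1] == 1 && all(steps %in% c(0, 1))
}

is_valid_group_index(c(1, 1, 2, 2, 3))  # TRUE
is_valid_group_index(c(1, 2, 1, 2))     # FALSE: group 1 is not contiguous
```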

splicing.type

Optional type for splicing. If splicing.type = 1, the number of variables to be spliced is c.max, ..., 1; if splicing.type = 2, the number of variables to be spliced is c.max, c.max/2, ..., 1. Default: splicing.type = 1.
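The two schedules can be sketched in base R as follows. The helper splice_sizes is hypothetical, and the use of integer halving for splicing.type = 2 is an assumption; the rounding used inside abess is not stated here.

```r
# Hypothetical helper illustrating the two schedules; not part of abess.
splice_sizes <- function(c.max, splicing.type = 1) {
  if (splicing.type == 1) {
    return(seq(c.max, 1))  # c.max, c.max - 1, ..., 1
  }
  sizes <- c.max
  while (tail(sizes, 1) > 1) {
    sizes <- c(sizes, tail(sizes, 1) %/% 2)  # assumed: integer halving
  }
  sizes
}

splice_sizes(8, splicing.type = 1)  # 8 7 6 5 4 3 2 1
splice_sizes(8, splicing.type = 2)  # 8 4 2 1
```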

max.splicing.iter

The maximum number of splicing iterations. In most cases, a few splicing iterations are enough to reach convergence. Default is max.splicing.iter = 20.

warm.start

Whether to use the last solution as a warm start. Default is warm.start = TRUE.

...

further arguments to be passed to or from methods.

Details

Adaptive best subset selection for principal component analysis aims to solve the non-convex optimization problem:

\arg\max_{v} v^\top \Sigma v, \quad \text{s.t.} \quad v^\top v = 1, \ \|v\|_0 \leq s,

where s is the support size and Σ is the sample covariance (or correlation) matrix. A generic splicing technique is implemented to solve this problem. By exploiting warm-start initialization, the non-convex optimization problems at different support sizes (specified by support.size) can be solved efficiently.
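To make the optimization problem concrete, here is a brute-force reference in base R that enumerates every subset of size s and keeps the one whose submatrix has the largest leading eigenvalue. This is not the splicing algorithm used by abesspca; best_subset_pc is a hypothetical helper, and enumeration is feasible only for a small number of variables.

```r
# Brute-force reference for the problem above; NOT the splicing algorithm.
# `best_subset_pc` is a hypothetical helper, not part of abess.
best_subset_pc <- function(Sigma, s) {
  p <- ncol(Sigma)
  best <- list(value = -Inf, v = NULL)
  for (idx in utils::combn(p, s, simplify = FALSE)) {
    # For a fixed support, the optimum is the leading eigenvector of the
    # corresponding principal submatrix
    eig <- eigen(Sigma[idx, idx, drop = FALSE], symmetric = TRUE)
    if (eig$values[1] > best$value) {
      v <- numeric(p)
      v[idx] <- eig$vectors[, 1]
      best <- list(value = eig$values[1], v = v)
    }
  }
  best
}

Sigma <- stats::cov(USArrests)
fit <- best_subset_pc(Sigma, s = 2)
sum(fit$v != 0)                     # number of nonzero loadings, at most s
drop(crossprod(fit$v))              # squared norm of v
drop(t(fit$v) %*% Sigma %*% fit$v)  # objective value, matches fit$value
```

The number of subsets grows combinatorially with the number of variables, which is why a dedicated algorithm is needed beyond toy examples.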

Value

An S3 object of class abesspca, which is a list with the following components:

coef

A p-by-length(support.size) loading matrix of sparse principal components (PC), where each row is a variable and each column is a support size.

nvars

The number of variables.

sparse.type

The same as input.

support.size

The actual support.size values used. Note that they are not necessarily the same as the input if the latter has non-integer or duplicated values.

ev

A vector with size length(support.size). It records the explained variance at each support size.

pev

A vector with the same length as ev. It records the percentage of explained variance at each support size.

var.all

If sparse.type = "fpc", it is the total variance explained by the first principal component; otherwise, the total standard deviations of all principal components.

call

The original call to abesspca.

Author(s)

Jin Zhu, Junxian Zhu, Ruihuang Liu, Junhao Huang, Xueqin Wang

References

A polynomial algorithm for best-subset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. Proceedings of the National Academy of Sciences Dec 2020, 117 (52) 33117-33123; DOI: 10.1073/pnas.2014241117

Sparse principal component analysis. Hui Zou, Trevor Hastie, and Robert Tibshirani. Journal of Computational and Graphical Statistics 15.2 (2006): 265-286.

See Also

print.abesspca, coef.abesspca

Examples


library(abess)

## predictor matrix input:
head(USArrests)
pca_fit <- abesspca(USArrests)
pca_fit

## covariance matrix input:
pca_fit <- abesspca(stats::cov(USArrests), type = "gram")
pca_fit

## robust covariance matrix input:
rob_cov <- MASS::cov.rob(USArrests)[["cov"]]
rob_cov <- (rob_cov + t(rob_cov)) / 2
pca_fit <- abesspca(rob_cov, type = "gram")
pca_fit

## K-component principal component analysis
pca_fit <- abesspca(USArrests,
  sparse.type = "kpc",
  support.size = c(1, 2)
)
coef(pca_fit)


[Package abess version 0.3.0 Index]