abessrpca {abess} | R Documentation |
Adaptive best subset selection for robust principal component analysis
Description
Decompose a matrix into the summation of low-rank matrix and sparse matrix via the best subset selection approach
Usage
abessrpca(
x,
rank,
support.size = NULL,
tune.path = c("sequence", "gsection"),
gs.range = NULL,
tune.type = c("gic", "aic", "bic", "ebic"),
ic.scale = 1,
lambda = 0,
always.include = NULL,
group.index = NULL,
c.max = NULL,
splicing.type = 2,
max.splicing.iter = 1,
warm.start = TRUE,
important.search = NULL,
max.newton.iter = 1,
newton.thresh = 0.001,
num.threads = 0,
seed = 1,
...
)
Arguments
x |
A matrix object. |
rank |
A positive integer value specify the rank of the low-rank matrix. |
support.size |
An integer vector representing the alternative support sizes.
Only used for |
tune.path |
The method to be used to select the optimal support size. For
|
gs.range |
A integer vector with two elements.
The first element is the minimum model size considered by golden-section,
the later one is the maximum one. Default is |
tune.type |
The type of criterion for choosing the support size. Available options are "gic", "ebic", "bic" and "aic". Default is "gic". |
ic.scale |
A non-negative value used for multiplying the penalty term
in information criterion. Default: |
lambda |
A single lambda value for regularized best subset selection. Default is 0. |
always.include |
An integer vector containing the indexes of variables that should always be included in the model. |
group.index |
A vector of integers indicating the which group each variable is in.
For variables in the same group, they should be located in adjacent columns of |
c.max |
an integer splicing size. Default is: |
splicing.type |
Optional type for splicing.
If |
max.splicing.iter |
The maximum number of performing splicing algorithm.
In most of the case, only a few times of splicing iteration can guarantee the convergence.
Default is |
warm.start |
Whether to use the last solution as a warm start. Default is |
important.search |
An integer number indicating the number of
important variables to be splicing.
When |
max.newton.iter |
a integer giving the maximal number of Newton's iteration iterations.
Default is |
newton.thresh |
a numeric value for controlling positive convergence tolerance.
The Newton's iterations converge when |
num.threads |
An integer decide the number of threads to be
concurrently used for cross-validation (i.e., |
seed |
Seed to be used to divide the sample into cross-validation folds.
Default is |
... |
further arguments to be passed to or from methods. |
Details
Adaptive best subset selection for robust principal component analysis aim to find two latent matrices L
and S
such that the original matrix X
can be appropriately approximated:
x = L + S + N,
where L
is a low-rank matrix, S
is a sparse matrix, N
is a dense noise matrix.
Generic splicing technique can be employed to solve this problem by iteratively improve the quality of the estimation of S
.
For a given support set \Omega
, the optimization problem:
\min_S \| x - L - S\|_F^2 \;\;{\rm s.t.}\;\; S_{ij} = 0 {\rm for } (i, j) \in \Omega^c,
still a non-convex optimization problem. We use the hard-impute algorithm proposed in one of the reference to solve this problem.
The hard-impute algorithm is an iterative algorithm, people can set max.newton.iter
and newton.thresh
to
control the solution precision of the optimization problem.
(Here, the name of the two parameters are somehow abused to make the parameters cross functions have an unified name.)
According to our experiments,
we assign properly parameters to the two parameter as the default such that the precision and runtime are well balanced,
we suggest users keep the default values unchanged.
Value
A S3 abessrpca
class object, which is a list
with the following components:
S |
A list with |
L |
The low rank matrix estimation. |
nobs |
The number of sample used for training. |
nvars |
The number of variables used for training. |
rank |
The rank of matrix |
loss |
The loss of objective function. |
tune.value |
A value of tuning criterion of length |
support.size |
The actual support.size values used. Note that it is not necessary the same as the input if the later have non-integer values or duplicated values. |
tune.type |
The criterion type for tuning parameters. |
call |
The original call to |
Note
Some parameters not described in the Details Section is explained in the document for abess
because the meaning of these parameters are very similar.
At present, l_2
regularization and group selection are not support,
and thus, set lambda
and group.index
have no influence on the output.
This feature will coming soon.
References
A polynomial algorithm for best-subset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. Proceedings of the National Academy of Sciences Dec 2020, 117 (52) 33117-33123; doi:10.1073/pnas.2014241117
Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal component analysis? Journal of the ACM. 58, 3, Article 11 (May 2011), 37 pages. doi:10.1145/1970392.1970395
Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research 11 (2010): 2287-2322.
Examples
library(abess)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
n <- 30
p <- 30
true_S_size <- 60
true_L_rank <- 2
dataset <- generate.matrix(n, p, support.size = true_S_size, rank = true_L_rank)
res <- abessrpca(dataset[["x"]], rank = true_L_rank, support.size = 50:70)
print(res)
coef(res)
plot(res, type = "tune")
plot(res, type = "loss")
plot(res, type = "S")