abessrpca {abess}  R Documentation 
Decompose a matrix into the summation of lowrank matrix and sparse matrix via the best subset selection approach
abessrpca(
x,
rank,
support.size = NULL,
tune.path = c("sequence", "gsection"),
gs.range = NULL,
tune.type = c("gic", "aic", "bic", "ebic"),
ic.scale = 1,
lambda = 0,
always.include = NULL,
group.index = NULL,
c.max = NULL,
splicing.type = 2,
max.splicing.iter = 1,
warm.start = TRUE,
important.search = NULL,
max.newton.iter = 1,
newton.thresh = 0.001,
num.threads = 0,
seed = 1,
...
)
x 
A matrix object. 
rank 
A positive integer value specify the rank of the lowrank matrix. 
support.size 
An integer vector representing the alternative support sizes.
Only used for 
tune.path 
The method to be used to select the optimal support size. For

gs.range 
A integer vector with two elements.
The first element is the minimum model size considered by goldensection,
the later one is the maximum one. Default is 
tune.type 
The type of criterion for choosing the support size. Available options are "gic", "ebic", "bic" and "aic". Default is "gic". 
ic.scale 
A nonnegative value used for multiplying the penalty term
in information criterion. Default: 
lambda 
A single lambda value for regularized best subset selection. Default is 0. 
always.include 
An integer vector containing the indexes of variables that should always be included in the model. 
group.index 
A vector of integers indicating the which group each variable is in.
For variables in the same group, they should be located in adjacent columns of 
c.max 
an integer splicing size. Default is: 
splicing.type 
Optional type for splicing.
If 
max.splicing.iter 
The maximum number of performing splicing algorithm.
In most of the case, only a few times of splicing iteration can guarantee the convergence.
Default is 
warm.start 
Whether to use the last solution as a warm start. Default is 
important.search 
An integer number indicating the number of
important variables to be splicing.
When 
max.newton.iter 
a integer giving the maximal number of Newton's iteration iterations.
Default is 
newton.thresh 
a numeric value for controlling positive convergence tolerance.
The Newton's iterations converge when 
num.threads 
An integer decide the number of threads to be
concurrently used for crossvalidation (i.e., 
seed 
Seed to be used to divide the sample into crossvalidation folds.
Default is 
... 
further arguments to be passed to or from methods. 
Adaptive best subset selection for robust principal component analysis aim to find two latent matrices L
and S
such that the original matrix X
can be appropriately approximated:
x = L + S + N,
where L
is a lowrank matrix, S
is a sparse matrix, N
is a dense noise matrix.
Generic splicing technique can be employed to solve this problem by iteratively improve the quality of the estimation of S
.
For a given support set \Omega
, the optimization problem:
\min_S \ x  L  S\_F^2 \;\;{\rm s.t.}\;\; S_{ij} = 0 {\rm for } (i, j) \in \Omega^c,
still a nonconvex optimization problem. We use the hardimpute algorithm proposed in one of the reference to solve this problem.
The hardimpute algorithm is an iterative algorithm, people can set max.newton.iter
and newton.thresh
to
control the solution precision of the optimization problem.
(Here, the name of the two parameters are somehow abused to make the parameters cross functions have an unified name.)
According to our experiments,
we assign properly parameters to the two parameter as the default such that the precision and runtime are well balanced,
we suggest users keep the default values unchanged.
A S3 abessrpca
class object, which is a list
with the following components:
S 
A list with 
L 
The low rank matrix estimation. 
nobs 
The number of sample used for training. 
nvars 
The number of variables used for training. 
rank 
The rank of matrix 
loss 
The loss of objective function. 
tune.value 
A value of tuning criterion of length 
support.size 
The actual support.size values used. Note that it is not necessary the same as the input if the later have noninteger values or duplicated values. 
tune.type 
The criterion type for tuning parameters. 
call 
The original call to 
Some parameters not described in the Details Section is explained in the document for abess
because the meaning of these parameters are very similar.
At present, l_2
regularization and group selection are not support,
and thus, set lambda
and group.index
have no influence on the output.
This feature will coming soon.
A polynomial algorithm for bestsubset selection problem. Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang. Proceedings of the National Academy of Sciences Dec 2020, 117 (52) 3311733123; doi: 10.1073/pnas.2014241117
Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. 2011. Robust principal component analysis? Journal of the ACM. 58, 3, Article 11 (May 2011), 37 pages. doi: 10.1145/1970392.1970395
Mazumder, Rahul, Trevor Hastie, and Robert Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research 11 (2010): 22872322.
library(abess)
n < 30
p < 30
true_S_size < 60
true_L_rank < 2
dataset < generate.matrix(n, p, support.size = true_S_size, rank = true_L_rank)
res < abessrpca(dataset[["x"]], rank = true_L_rank, support.size = 50:70)
print(res)
coef(res)
plot(res, type = "tune")
plot(res, type = "loss")
plot(res, type = "S")