abess.default {abess}  R Documentation 
Description:

Adaptive best-subset selection for regression, (multi-class) classification, counting-response, censored-response, and multi-response modeling in polynomial time.
Usage:

## Default S3 method:
abess(
  x,
  y,
  family = c("gaussian", "binomial", "poisson", "cox", "mgaussian", "multinomial"),
  tune.path = c("sequence", "gsection"),
  tune.type = c("gic", "ebic", "bic", "aic", "cv"),
  weight = NULL,
  normalize = NULL,
  c.max = 2,
  support.size = NULL,
  gs.range = NULL,
  lambda = 0,
  always.include = NULL,
  group.index = NULL,
  splicing.type = 2,
  max.splicing.iter = 20,
  screening.num = NULL,
  important.search = NULL,
  warm.start = TRUE,
  nfolds = 5,
  cov.update = FALSE,
  newton = c("exact", "approx"),
  newton.thresh = 1e-06,
  max.newton.iter = NULL,
  early.stop = FALSE,
  num.threads = 0,
  seed = 1,
  ...
)

## S3 method for class 'formula'
abess(formula, data, subset, na.action, ...)

Arguments:
x 
Input matrix, of dimension n x p; each row is an observation
vector and each column is a predictor/feature/variable.
Can be in sparse matrix format (inheriting from class "dgCMatrix" in package Matrix).
y 
The response variable, of n observations. It should be a numeric vector for family = "gaussian";
a factor with two levels (or a 0/1 vector) for family = "binomial";
a vector of non-negative integer counts for family = "poisson";
a two-column matrix with columns named "time" and "status" for family = "cox";
a numeric matrix for family = "mgaussian";
and a factor with at least three levels for family = "multinomial".
family 
One of the following models: "gaussian" (continuous response), "binomial" (binary response),
"poisson" (count response), "cox" (censored survival response),
"mgaussian" (multivariate continuous response), or "multinomial" (multi-class response).

tune.path 
The method used to select the optimal support size.
For tune.path = "sequence", the best-subset selection problem is solved for each size in support.size;
for tune.path = "gsection", it is solved at support sizes chosen by a golden-section search over gs.range.
tune.type 
The type of criterion for choosing the support size.
Available options are "gic", "ebic", "bic", "aic", and "cv". Default is "gic".
weight 
Observation weights. When weight = NULL (the default), each observation receives weight 1.
normalize 
Options for normalization. If normalize = NULL (the default), a normalization scheme appropriate to the model family is chosen automatically.
c.max 
An integer specifying the maximum splicing size. Default is c.max = 2.
support.size 
An integer vector representing the alternative support sizes.
Only used for tune.path = "sequence". The default is a data-dependent sequence from 0 up to a maximum determined by n and p.
gs.range 
An integer vector with two elements.
The first element is the minimum model size considered by the golden-section search,
the second is the maximum one. By default it ranges from 1 to a data-dependent maximum.
lambda 
A single lambda value for regularized best-subset selection. Default is lambda = 0.
always.include 
An integer vector containing the indices of variables that should always be included in the model.
group.index 
A vector of integers indicating which group each variable belongs to.
Variables in the same group must occupy adjacent columns of x and share the same index value.
If there is no group structure, set group.index = NULL (the default).
splicing.type 
Optional type for splicing.
If splicing.type = 1, the candidate splicing sizes are c.max, c.max - 1, ..., 1;
if splicing.type = 2 (the default), they are c.max, c.max/2, ..., 1.
max.splicing.iter 
The maximum number of splicing iterations.
In most cases, a few iterations are enough to guarantee convergence.
Default is max.splicing.iter = 20.
screening.num 
An integer number. Preserve screening.num variables via a marginal (sure independence) screening step before running the splicing algorithm, discarding the rest. Default is NULL (no screening).
important.search 
An integer number indicating the number of
important variables on which splicing is performed.
When important.search is much smaller than p, computation is accelerated because splicing is restricted to this subset. If important.search = NULL (the default), a data-dependent value is used.
warm.start 
Whether to use the last solution as a warm start. Default is warm.start = TRUE.
nfolds 
The number of folds in cross-validation. Default is nfolds = 5.
cov.update 
A logical value only used for family = "gaussian". If cov.update = TRUE, a covariance-based update is used to speed up the algorithm at the cost of extra memory; otherwise a naive update is used. Default is FALSE.
newton 
A character string specifying the Newton method for fitting generalized linear models;
it should be either newton = "exact" or newton = "approx". Default is "exact".
newton.thresh 
A numeric value controlling the positive convergence tolerance.
The Newton iterations converge when |dev - dev_old| / (|dev| + 0.1) < newton.thresh. Default is newton.thresh = 1e-06.
max.newton.iter 
An integer giving the maximal number of Newton iterations.
If max.newton.iter = NULL (the default), the value is chosen depending on newton.
early.stop 
A boolean value deciding whether to stop the search over support sizes early.
If early.stop = TRUE, the algorithm terminates once the tuning value stops improving as the support size grows. Default is FALSE.
num.threads 
An integer deciding the number of threads to be
used concurrently for cross-validation (i.e., tune.type = "cv"). If num.threads = 0 (the default), all available cores are used.
seed 
Seed used to divide the sample into cross-validation folds.
Default is seed = 1.
... 
further arguments to be passed to or from methods. 
formula 
an object of class "formula": a symbolic description of the model to be fitted.
data 
a data frame containing the variables in the formula.
subset 
an optional vector specifying a subset of observations to be used. 
na.action 
a function which indicates
what should happen when the data contain NAs. Defaults to getOption("na.action").
Details:

Best-subset selection aims to find a small subset of predictors such that the resulting model is expected to have the most desirable prediction accuracy. The best-subset selection problem under support size s is
\min_β -2 \log L(β) \;\;{\rm s.t.}\;\; \|β\|_0 ≤ s,
where L(β) is an arbitrary convex function. In the GLM case, \log L(β) is the log-likelihood function; in the Cox model, \log L(β) is the log partial-likelihood function.
The best-subset selection problem is solved by the "abess" algorithm in this package; see Zhu et al. (2020) for details. Under mild conditions, the algorithm exactly solves this problem in polynomial time. It exploits the ideas of sequencing and splicing to reach a stable solution in finitely many steps when s is fixed. To find the optimal support size s, various criteria are provided, such as GIC, AIC, BIC, and cross-validation error.
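A minimal sketch of this tuning step (assuming the abess package is installed; `generate.data()` is the simulator bundled with the package, and the component names match the Value section of this page):

```r
library(abess)

## simulate a linear model with 3 truly relevant predictors
dat <- generate.data(n = 100, p = 20, support.size = 3, seed = 1)

## fit the whole support-size path, tuned by the default GIC criterion
fit <- abess(dat[["x"]], dat[["y"]], tune.type = "gic")

## the support size minimizing the criterion, and the coefficients there
fit[["best.size"]]
coef(fit, support.size = fit[["best.size"]])
```

Swapping `tune.type = "cv"` re-tunes the same path by cross-validation error instead of GIC.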
Value:

An S3 object of class "abess", which is a list
with the following components:
beta 
A p-by-length(support.size) matrix of coefficients (in sparse matrix format), with each column corresponding to one support size.
intercept 
An intercept vector of length length(support.size).
dev 
The deviance, a vector of length length(support.size).
tune.value 
A vector of tuning-criterion values, of length length(support.size).
nobs 
The number of samples used for training.
nvars 
The number of variables used for training. 
family 
Type of the model. 
tune.path 
The path type for tuning parameters. 
support.size 
The actual support.size values used.
edf 
The effective degrees of freedom.
It is the same as support.size when lambda = 0.
best.size 
The best support size selected by the tuning value. 
tune.type 
The criterion type for tuning parameters. 
screening.vars 
A character vector specifying the features
selected by feature screening.
It is an empty character vector if screening.num = NULL.
call 
The original call to abess.
Author(s):

Jin Zhu, Junxian Zhu, Canhong Wen, Heping Zhang, Xueqin Wang
References:

Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, Xueqin Wang (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52), 33117-33123. DOI: 10.1073/pnas.2014241117

Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70: 849-911. https://doi.org/10.1111/j.1467-9868.2008.00674.x

Qiang Sun and Heping Zhang (2020). Targeted Inference Involving High-Dimensional Data Using Nuisance Penalized Regression. Journal of the American Statistical Association. DOI: 10.1080/01621459.2020.1737079

Yanhang Zhang, Junxian Zhu, Jin Zhu, Xueqin Wang (2021). Certifiably Polynomial Algorithm for Best Group Subset Selection. arXiv preprint arXiv:2104.12576.
See Also:

print.abess, predict.abess, coef.abess, extract.abess, plot.abess, deviance.abess.
Examples:

library(abess)
n <- 100
p <- 20
support.size <- 3

################ linear model ################
dataset <- generate.data(n, p, support.size)
abess_fit <- abess(dataset[["x"]], dataset[["y"]])
## helpful generic functions:
print(abess_fit)
coef(abess_fit, support.size = 3)
predict(abess_fit,
  newx = dataset[["x"]][1:10, ],
  support.size = c(3, 4)
)
str(extract(abess_fit, 3))
deviance(abess_fit)
plot(abess_fit)

################ logistic model ################
dataset <- generate.data(n, p, support.size, family = "binomial")
## use cross-validation for tuning:
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  family = "binomial", tune.type = "cv"
)
abess_fit

################ poisson model ################
dataset <- generate.data(n, p, support.size, family = "poisson")
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  family = "poisson", tune.type = "cv"
)
abess_fit

################ Cox model ################
dataset <- generate.data(n, p, support.size, family = "cox")
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  family = "cox", tune.type = "cv"
)

################ Multivariate gaussian model ################
dataset <- generate.data(n, p, support.size, family = "mgaussian")
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  family = "mgaussian", tune.type = "cv"
)
plot(abess_fit, type = "l2norm")

################ Multinomial model (multi-classification) ################
dataset <- generate.data(n, p, support.size, family = "multinomial")
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  family = "multinomial", tune.type = "cv"
)
predict(abess_fit,
  newx = dataset[["x"]][1:10, ],
  support.size = c(3, 4), type = "response"
)

########## Best group subset selection #############
dataset <- generate.data(n, p, support.size)
group_index <- rep(1:10, each = 2)
abess_fit <- abess(dataset[["x"]], dataset[["y"]], group.index = group_index)
str(extract(abess_fit))

################ Golden-section searching ################
dataset <- generate.data(n, p, support.size)
abess_fit <- abess(dataset[["x"]], dataset[["y"]], tune.path = "gsection")
abess_fit

################ Feature screening ################
p <- 1000
dataset <- generate.data(n, p, support.size)
abess_fit <- abess(dataset[["x"]], dataset[["y"]],
  screening.num = 100
)
str(extract(abess_fit))

################ Sparse predictor ################
require(Matrix)
p <- 1000
dataset <- generate.data(n, p, support.size)
dataset[["x"]][abs(dataset[["x"]]) < 1] <- 0
dataset[["x"]] <- Matrix(dataset[["x"]])
abess_fit <- abess(dataset[["x"]], dataset[["y"]])
str(extract(abess_fit))

################ Formula interface ################
data("trim32")
abess_fit <- abess(y ~ ., data = trim32)
abess_fit