bess.one {BeSS} | R Documentation |
Best subset selection with a specified model size
Description
Best subset selection with a specified model size for generalized linear models and Cox's proportional hazard model.
Usage
bess.one(x, y, family = c("gaussian", "binomial", "cox"),
s = 1,
max.steps = 15,
glm.max = 1e6,
cox.max = 20,
factor = NULL,
weights = rep(1,nrow(x)),
normalize = TRUE)
Arguments
x |
Input matrix,of dimension n x p; each row is an observation vector. |
y |
Response variable, of length n. For family = " |
s |
Size of the selected model.It controls number of nonzero coefiicients to be allowed in the model. |
family |
One of the ditribution function for GLM or Cox models. Either " |
max.steps |
The maximum number of iterations in the primal dual active set algorithm. In most cases, only a few steps can gurantee the convergence. Default is 15. |
glm.max |
The maximum number of iterations for solving the maximum likelihood problem on the active set. It occurs at each step in the primal dual active set algorithm. Only used in the logistic regression for family = " |
cox.max |
The maximum number of iterations for solving the maximum partial likelihood problem on the active set. It occurs at each step in the primal dual active set algorithm. Only used in Cox model for family = " |
weights |
Observation weights. Default is 1 for each observation |
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
normalize |
Whether to normalize |
Details
Given a model size s
, we consider the following best subset selection problem:
\min_\beta -2 logL(\beta) ;{ s.t.} \|\beta\|_0 = s.
In the GLM case, logL(\beta)
is the log-likelihood function; In the Cox model, logL(\beta)
is the log parital likelihood function.
The best subset selection problem is solved by the primal dual active set algorithm, see Wen et al. (2017) for details. This algorithm utilizes an active set updating strategy via primal and dual vairables and fits the sub-model by exploiting the fact that their support set are non-overlap and complementary.
Value
A list with class attribute 'bess.one
' and named components:
type |
Types of the model: " |
beta |
The best fitting coefficients with the smallest loss function given the model size |
lambda |
The estimated lambda value in the Lagrangian form of the best subset selection problem with model size |
bestmodel |
The best fitted model, the class of which is "lm", "glm" or "coxph" |
deviance |
The value of |
nulldeviance |
The value of |
factor |
Which variable to be factored. Should be NULL or a numeric vector. |
Author(s)
Canhong Wen, Aijun Zhang, Shijie Quan, and Xueqin Wang.
References
Wen, C., Zhang, A., Quan, S. and Wang, X. (2020). BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models, Journal of Statistical Software, Vol. 94(4). doi:10.18637/jss.v094.i04.
See Also
bess
, plot.bess
,
predict.bess
.
Examples
#--------------linear model--------------#
# Generate simulated data
n <- 500
p <- 20
K <-10
sigma <- 1
rho <- 0.2
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)
# Best subset selection
fit1 <- bess.one(data$x, data$y, s = 10, family = "gaussian", normalize = TRUE)
#coef(fit1,sparse=TRUE)
bestmodel <- fit1$bestmodel
#summary(bestmodel)
## Not run:
#--------------logistic model--------------#
# Generate simulated data
data <- gen.data(n, p, family = "binomial", K, rho, sigma)
# Best subset selection
fit2 <- bess.one(data$x, data$y, family = "binomial", s = 10, normalize = TRUE)
bestmodel <- fit2$bestmodel
#summary(bestmodel)
#--------------cox model--------------#
# Generate simulated data
data <- gen.data(n, p, K, rho, sigma, c=10, family="cox", scal=10)
# Best subset selection
fit3 <- bess.one(data$x, data$y, s = 10, family = "cox", normalize = TRUE)
bestmodel <- fit3$bestmodel
#summary(bestmodel)
#----------------------High dimensional linear models--------------------#
p <- 1000
data <- gen.data(n, p, family = "gaussian", K, rho, sigma)
# Best subset selection
fit <- bess.one(data$x, data$y, s=10, family = "gaussian", normalize = TRUE)
#---------------prostate---------------#
data("prostate")
x = prostate[,-9]
y = prostate[,9]
fit.ungroup = bess.one(x, y, s=5)
fit.group = bess.one(x, y, s=5, factor = c("gleason"))
#---------------SAheart---------------#
data(SAheart)
y = SAheart[,5]
x = SAheart[,-5]
x$ldl[x$ldl<5] = 1
x$ldl[x$ldl>=5&x$ldl<10] = 2
x$ldl[x$ldl>=10] = 3
fit.ungroup = bess.one(x, y, s=5, family = "binomial")
fit.group = bess.one(x, y, s=5, factor = c("ldl"), family = "binomial")
## End(Not run)