bsrr {bestridge} R Documentation

## Best subset ridge regression

### Description

Best subset ridge regression for generalized linear model and Cox's proportional model.

### Usage

bsrr(
x,
y,
family = c("gaussian", "binomial", "poisson", "cox"),
method = c("pgsection", "sequential", "psequential"),
tune = c("gic", "ebic", "bic", "aic", "cv"),
s.list,
lambda.list = 0,
s.min,
s.max,
lambda.min = 0.001,
lambda.max = 100,
nlambda = 100,
always.include = NULL,
screening.num = NULL,
normalize = NULL,
weight = NULL,
max.iter = 20,
warm.start = TRUE,
nfolds = 5,
group.index = NULL,
seed = NULL
)


### Arguments

 x Input matrix, of dimension n \times p; each row is an observation vector and each column is a predictor/feature/variable. y The response variable, of n observations. For family = "binomial" should be a factor with two levels. For family="poisson", y should be a vector with positive integer. For family = "cox", y should be a two-column matrix with columns named time and status. family One of the following models: "gaussian", "binomial", "poisson", or "cox". Depending on the response. Any unambiguous substring can be given. method The method to be used to select the optimal model size and L_2 shrinkage. For method = "sequential", we solve the best subset ridge regression problem for each s in 1,2,...,s.max and λ in lambda.list. For method = "pgsection" and "psequential", the Powell method is used to solve the best subset ridge regression problem. Any unambiguous substring can be given. tune The criterion for choosing the model size and L_2 shrinkage parameters. Available options are "gic", "ebic", "bic", "aic" and "cv". Default is "gic". "cv" is recommanded for BSRR. s.list An increasing list of sequential values representing the model sizes. Only used for method = "sequential". Default is 1:min(p, round(n/log(n))). lambda.list A lambda sequence for "bsrr". Only used for method = "sequential". Default is exp(seq(log(100), log(0.01), length.out = 100)). s.min The minimum value of model sizes. Only used for "psequential" and "pgsection". Default is 1. s.max The maximum value of model sizes. Only used for "psequential" and "pgsection". Default is min(p, round(n/log(n))). lambda.min The minimum value of lambda. Only used for method = "psequential" and "pgsection". Default is 0.001. lambda.max The maximum value of lambda. Only used for method = "psequential" and "pgsection". Default is 100. nlambda The number of λs for the Powell path with sequential line search method. Only valid for method = "psequential". always.include An integer vector containing the indexes of variables that should always be included in the model. screening.num Users can pre-exclude some irrelevant variables according to maximum marginal likelihood estimators before fitting a model by passing an integer to screening.num and the sure independence screening will choose a set of variables of this size. Then the active set updates are restricted on this subset. normalize Options for normalization. normalize = 0 for no normalization. Setting normalize = 1 will only subtract the mean of columns of x. normalize = 2 for scaling the columns of x to have √ n norm. normalize = 3 for subtracting the means of the columns of x and y, and also normalizing the columns of x to have √ n norm. If normalize = NULL, by default, normalize will be set 1 for "gaussian", 2 for "binomial" and "poisson", 3 for "cox". weight Observation weights. Default is 1 for each observation. max.iter The maximum number of iterations in the bsrr function. In most of the case, only a few steps can guarantee the convergence. Default is 20. warm.start Whether to use the last solution as a warm start. Default is TRUE. nfolds The number of folds in cross-validation. Default is 5. group.index A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of x and their corresponding index in group.index should be the same. Denote the first group as 1, the second 2, etc. If you do not fit a model with a group structure, please set group.index = NULL. Default is NULL. seed Seed to be used to divide the sample into K cross-validation folds. Default is NULL.

### Details

The best ridge regression problem with model size s and the shrinkage parameter λ is

\min_β -2 \log L(β) + λ\Vertβ\Vert_2^2 \;\;{\rm s.t.}\;\; \|β\|_0 ≤q s.

In the GLM case, \log L(β) is the log likelihood function; In the Cox model, \log L(β) is the log partial likelihood function.

The best subset selection problem is a special case of the best ridge regression problem with the shrinkage λ=0.

For each candidate model size and λ, the best subset ridge regression problems are solved by the L_2 penalized primal-dual active set (PDAS) algorithm, see Wen et al (2020) for details. This algorithm utilizes an active set updating strategy via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlap and complementary. For the case of method = "sequential" if warm.start = "TRUE", we run the PDAS algorithm for a list of sequential model sizes and use the estimate from the last iteration as a warm start. For the case of  method = "psequential" and method = "pgsection", the Powell method using a sequential line search method or a golden section search technique is used for parameters determination.

### Value

A list with class attribute 'bsrr' and named components:

 beta The best fitting coefficients. coef0 The best fitting intercept. loss The training loss of the best fitting model. ic The information criterion of the best fitting model when model selection is based on a certain information criterion. cvm The mean cross-validated error for the best fitting model when model selection is based on the cross-validation. lambda The lambda chosen for the best fitting model beta.all For bsrr objects obtained by gsection, pgsection and psequential, beta.all is a matrix with each column be the coefficients of the model in each iterative step in the tuning path. For bsrr objects obtained by sequential method, A list of the best fitting coefficients of size s=0,1,...,p and λ in lambda.list with the smallest loss function. For "bsrr" objects of "bsrr" type, the fitting coefficients of the i^{th} λ and the j^{th} s are at the i^{th} list component's j^{th} column. coef0.all For bsrr objects obtained from gsection, pgsection and psequential, coef0.all contains the intercept for the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, coef0.all contains the best fitting intercepts of size s=0,1,…,p and λ in lambda.list with the smallest loss function. loss.all For bsrr objects obtained from gsection, pgsection and psequential, loss.all contains the training loss of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a list of the training loss of the best fitting intercepts of model size s=0,1,…,p and λ in lambda.list. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} λ and the j^{th} s is at the i^{th} list component's j^{th} entry. ic.all For bsrr objects obtained from gsection, pgsection and psequential, ic.all contains the values of the chosen information criterion of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a matrix of the values of the chosen information criterion of model size s=0,1,…,p and λ in lambda.list with the smallest loss function. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} λ and the j^{th} s is at the i^{th} row j^{th} column. Only available when model selection is based on a certain information criterion. cvm.all For bsrr objects obtained from gsection, pgsection and psequential, cvm.all contains the mean cross-validation error of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a matrix of the mean cross-validation error of model size s=0,1,…,p and λ in lambda.list with the smallest loss function. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} λ and the j^{th} s is at the i^{th} row j^{th} column. Only available when model selection is based on the cross-validation. lambda.all The lambda chosen for each step in pgsection and psequential. family Type of the model. s.list The input s.list. nsample The sample size. type Either "bss" or "bsrr". method Method used for tuning parameters selection. ic.type The criterion of model selection.

### Author(s)

Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.

### See Also

plot.bsrr, summary.bsrr, coef.bsrr, predict.bsrr.

### Examples


#-------------------linear model----------------------#
# Generate simulated data
n <- 200
p <- 20
k <- 5
rho <- 0.4
seed <- 10
Tbeta <- rep(0, p)
Tbeta[1:k*floor(p/k):floor(p/k)] <- rep(1, k)
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
x <- Data$x[1:140, ] y <- Data$y[1:140]
x_new <- Data$x[141:200, ] y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
lm.bsrr <- bsrr(x, y, method = "pgsection")
coef(lm.bsrr)
print(lm.bsrr)
summary(lm.bsrr)
pred.bsrr <- predict(lm.bsrr, newx = x_new)

# generate plots
plot(lm.bsrr)
#-------------------logistic model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "binomial", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ] y <- Data$y[1:140]
x_new <- Data$x[141:200, ] y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
logi.bsrr <- bsrr(x, y, family = "binomial", lambda.list = lambda.list)
coef(logi.bsrr)
print(logi.bsrr)
summary(logi.bsrr)
pred.bsrr <- predict(logi.bsrr, newx = x_new)

# generate plots
plot(logi.bsrr)
#-------------------poisson model----------------------#
Data <- gen.data(n, p, k, rho=0.3, family = "poisson", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ] y <- Data$y[1:140]
x_new <- Data$x[141:200, ] y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
poi.bsrr <- bsrr(x, y, family = "poisson", lambda.list = lambda.list)
coef(poi.bsrr)
print(poi.bsrr)
summary(poi.bsrr)
pred.bsrr <- predict(poi.bsrr, newx = x_new)

# generate plots
plot(poi.bsrr)
#-------------------coxph model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "cox", scal = 10, beta = Tbeta)

x <- Data$x[1:140, ] y <- Data$y[1:140, ]
x_new <- Data$x[141:200, ] y_new <- Data$y[141:200, ]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
cox.bsrr <- bsrr(x, y, family = "cox", lambda.list = lambda.list)
coef(cox.bsrr)
print(cox.bsrr)
summary(cox.bsrr)
pred.bsrr <- predict(cox.bsrr, newx = x_new)

# generate plots
plot(cox.bsrr)

#----------------------High dimensional linear models--------------------#
## Not run:
data <- gen.data(n, p = 1000, k, family = "gaussian", seed = seed)

# Best subset selection with SIS screening
lm.high <- bsrr(data$x, data$y, screening.num = 100)

## End(Not run)

#-------------------group selection----------------------#
beta <- rep(c(rep(1,2),rep(0,3)), 4)
Data <- gen.data(200, 20, 5, rho=0.4, beta = beta, seed =10)
x <- Data$x y <- Data$y

group.index <- c(rep(1, 2), rep(2, 3), rep(3, 2), rep(4, 3),
rep(5, 2), rep(6, 3), rep(7, 2), rep(8, 3))
lm.groupbsrr <- bsrr(x, y, s.min = 1, s.max = 8, group.index = group.index)
coef(lm.groupbsrr)
print(lm.groupbsrr)
summary(lm.groupbsrr)
pred.groupl0l2 <- predict(lm.groupbsrr, newx = x_new)
#-------------------include specified variables----------------------#
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
lm.bsrr <- bsrr(Data$x, Data$y, always.include = 2)



[Package bestridge version 1.0.5 Index]