R: Best subset ridge regression

bsrr {bestridge}

R Documentation

Best subset ridge regression

Description

Best subset ridge regression for generalized linear model and Cox's proportional model.

Usage

bsrr(
  x,
  y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  method = c("pgsection", "sequential", "psequential"),
  tune = c("gic", "ebic", "bic", "aic", "cv"),
  s.list,
  lambda.list = 0,
  s.min,
  s.max,
  lambda.min = 0.001,
  lambda.max = 100,
  nlambda = 100,
  always.include = NULL,
  screening.num = NULL,
  normalize = NULL,
  weight = NULL,
  max.iter = 20,
  warm.start = TRUE,
  nfolds = 5,
  group.index = NULL,
  seed = NULL
)

Arguments

`x`	Input matrix, of dimension `n \times p`; each row is an observation vector and each column is a predictor/feature/variable.
`y`	The response variable, of `n` observations. For `family = "binomial"` should be a factor with two levels. For `family="poisson"`, `y` should be a vector with positive integer. For `family = "cox"`, `y` should be a two-column matrix with columns named `time` and `status`.
`family`	One of the following models: `"gaussian"`, `"binomial"`, `"poisson"`, or `"cox"`. Depending on the response. Any unambiguous substring can be given.
`method`	The method to be used to select the optimal model size and `L_2` shrinkage. For `method = "sequential"`, we solve the best subset ridge regression problem for each `s` in `1,2,...,s.max` and `\lambda` in `lambda.list`. For `method = "pgsection"` and `"psequential"`, the Powell method is used to solve the best subset ridge regression problem. Any unambiguous substring can be given.
`tune`	The criterion for choosing the model size and `L_2` shrinkage parameters. Available options are `"gic"`, `"ebic"`, `"bic"`, `"aic"` and `"cv"`. Default is `"gic"`. `"cv"` is recommanded for BSRR.
`s.list`	An increasing list of sequential values representing the model sizes. Only used for `method = "sequential"`. Default is `1:min(p, round(n/log(n)))`.
`lambda.list`	A lambda sequence for `"bsrr"`. Only used for `method = "sequential"`. Default is `exp(seq(log(100), log(0.01), length.out = 100))`.
`s.min`	The minimum value of model sizes. Only used for `"psequential"` and `"pgsection"`. Default is 1.
`s.max`	The maximum value of model sizes. Only used for `"psequential"` and `"pgsection"`. Default is `min(p, round(n/log(n)))`.
`lambda.min`	The minimum value of lambda. Only used for `method = "psequential"` and `"pgsection"`. Default is `0.001`.
`lambda.max`	The maximum value of lambda. Only used for `method = "psequential"` and `"pgsection"`. Default is `100`.
`nlambda`	The number of `\lambda`s for the Powell path with sequential line search method. Only valid for `method = "psequential"`.
`always.include`	An integer vector containing the indexes of variables that should always be included in the model.
`screening.num`	Users can pre-exclude some irrelevant variables according to maximum marginal likelihood estimators before fitting a model by passing an integer to `screening.num` and the sure independence screening will choose a set of variables of this size. Then the active set updates are restricted on this subset.
`normalize`	Options for normalization. `normalize = 0` for no normalization. Setting `normalize = 1` will only subtract the mean of columns of `x`. `normalize = 2` for scaling the columns of `x` to have `\sqrt n` norm. `normalize = 3` for subtracting the means of the columns of `x` and `y`, and also normalizing the columns of `x` to have `\sqrt n` norm. If `normalize = NULL`, by default, `normalize` will be set `1` for `"gaussian"`, `2` for `"binomial"` and `"poisson"`, `3` for `"cox"`.
`weight`	Observation weights. Default is `1` for each observation.
`max.iter`	The maximum number of iterations in the `bsrr` function. In most of the case, only a few steps can guarantee the convergence. Default is `20`.
`warm.start`	Whether to use the last solution as a warm start. Default is `TRUE`.
`nfolds`	The number of folds in cross-validation. Default is `5`.
`group.index`	A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of `x` and their corresponding index in `group.index` should be the same. Denote the first group as `1`, the second `2`, etc. If you do not fit a model with a group structure, please set `group.index = NULL`. Default is `NULL`.
`seed`	Seed to be used to divide the sample into K cross-validation folds. Default is `NULL`.

Details

The best ridge regression problem with model size s and the shrinkage parameter \lambda is

\min_\beta -2 \log L(\beta) + \lambda\Vert\beta\Vert_2^2 \;\;{\rm s.t.}\;\; \|\beta\|_0 \leq s.

In the GLM case, \log L(\beta) is the log likelihood function; In the Cox model, \log L(\beta) is the log partial likelihood function.

The best subset selection problem is a special case of the best ridge regression problem with the shrinkage \lambda=0.

For each candidate model size and \lambda, the best subset ridge regression problems are solved by the L_2 penalized primal-dual active set (PDAS) algorithm, see Wen et al (2020) for details. This algorithm utilizes an active set updating strategy via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlap and complementary. For the case of method = "sequential" if warm.start = "TRUE", we run the PDAS algorithm for a list of sequential model sizes and use the estimate from the last iteration as a warm start. For the case of method = "psequential" and method = "pgsection", the Powell method using a sequential line search method or a golden section search technique is used for parameters determination.

Value

A list with class attribute 'bsrr' and named components:

`beta`	The best fitting coefficients.
`coef0`	The best fitting intercept.
`loss`	The training loss of the best fitting model.
`ic`	The information criterion of the best fitting model when model selection is based on a certain information criterion.
`cvm`	The mean cross-validated error for the best fitting model when model selection is based on the cross-validation.
`lambda`	The lambda chosen for the best fitting model
`beta.all`	For `bsrr` objects obtained by `gsection`, `pgsection` and `psequential`, `beta.all` is a matrix with each column be the coefficients of the model in each iterative step in the tuning path. For `bsrr` objects obtained by `sequential` method, A list of the best fitting coefficients of size `s=0,1,...,p` and `\lambda` in `lambda.list` with the smallest loss function. For `"bsrr"` objects of `"bsrr"` type, the fitting coefficients of the `i^{th} \lambda` and the `j^{th}` `s` are at the `i^{th}` list component's `j^{th}` column.
`coef0.all`	For `bsrr` objects obtained from `gsection`, `pgsection` and `psequential`, `coef0.all` contains the intercept for the model in each iterative step in the tuning path. For `bsrr` objects obtained from `sequential` path, `coef0.all` contains the best fitting intercepts of size `s=0,1,\dots,p` and `\lambda` in `lambda.list` with the smallest loss function.
`loss.all`	For `bsrr` objects obtained from `gsection`, `pgsection` and `psequential`, `loss.all` contains the training loss of the model in each iterative step in the tuning path. For `bsrr` objects obtained from `sequential` path, this is a list of the training loss of the best fitting intercepts of model size `s=0,1,\dots,p` and `\lambda` in `lambda.list`. For `"bsrr"` object obtained by `"bsrr"`, the training loss of the `i^{th} \lambda` and the `j^{th}` `s` is at the `i^{th}` list component's `j^{th}` entry.
`ic.all`	For `bsrr` objects obtained from `gsection`, `pgsection` and `psequential`, `ic.all` contains the values of the chosen information criterion of the model in each iterative step in the tuning path. For `bsrr` objects obtained from `sequential` path, this is a matrix of the values of the chosen information criterion of model size `s=0,1,\dots,p` and `\lambda` in `lambda.list` with the smallest loss function. For `"bsrr"` object obtained by `"bsrr"`, the training loss of the `i^{th} \lambda` and the `j^{th}` `s` is at the `i^{th}` row `j^{th}` column. Only available when model selection is based on a certain information criterion.
`cvm.all`	For `bsrr` objects obtained from `gsection`, `pgsection` and `psequential`, `cvm.all` contains the mean cross-validation error of the model in each iterative step in the tuning path. For `bsrr` objects obtained from `sequential` path, this is a matrix of the mean cross-validation error of model size `s=0,1,\dots,p` and `\lambda` in `lambda.list` with the smallest loss function. For `"bsrr"` object obtained by `"bsrr"`, the training loss of the `i^{th} \lambda` and the `j^{th}` `s` is at the `i^{th}` row `j^{th}` column. Only available when model selection is based on the cross-validation.
`lambda.all`	The lambda chosen for each step in `pgsection` and `psequential`.
`family`	Type of the model.
`s.list`	The input `s.list`.
`nsample`	The sample size.
`type`	Either `"bss"` or `"bsrr"`.
`method`	Method used for tuning parameters selection.
`ic.type`	The criterion of model selection.

Author(s)

Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.

Examples


#-------------------linear model----------------------#
# Generate simulated data
n <- 200
p <- 20
k <- 5
rho <- 0.4
seed <- 10
Tbeta <- rep(0, p)
Tbeta[1:k*floor(p/k):floor(p/k)] <- rep(1, k)
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
lm.bsrr <- bsrr(x, y, method = "pgsection")
coef(lm.bsrr)
print(lm.bsrr)
summary(lm.bsrr)
pred.bsrr <- predict(lm.bsrr, newx = x_new)

# generate plots
plot(lm.bsrr)
#-------------------logistic model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "binomial", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
logi.bsrr <- bsrr(x, y, family = "binomial", lambda.list = lambda.list)
coef(logi.bsrr)
print(logi.bsrr)
summary(logi.bsrr)
pred.bsrr <- predict(logi.bsrr, newx = x_new)

# generate plots
plot(logi.bsrr)
#-------------------poisson model----------------------#
Data <- gen.data(n, p, k, rho=0.3, family = "poisson", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
poi.bsrr <- bsrr(x, y, family = "poisson", lambda.list = lambda.list)
coef(poi.bsrr)
print(poi.bsrr)
summary(poi.bsrr)
pred.bsrr <- predict(poi.bsrr, newx = x_new)

# generate plots
plot(poi.bsrr)
#-------------------coxph model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "cox", scal = 10, beta = Tbeta)

x <- Data$x[1:140, ]
y <- Data$y[1:140, ]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200, ]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
cox.bsrr <- bsrr(x, y, family = "cox", lambda.list = lambda.list)
coef(cox.bsrr)
print(cox.bsrr)
summary(cox.bsrr)
pred.bsrr <- predict(cox.bsrr, newx = x_new)

# generate plots
plot(cox.bsrr)

#----------------------High dimensional linear models--------------------#
## Not run: 
data <- gen.data(n, p = 1000, k, family = "gaussian", seed = seed)

# Best subset selection with SIS screening
lm.high <- bsrr(data$x, data$y, screening.num = 100)

## End(Not run)

#-------------------group selection----------------------#
beta <- rep(c(rep(1,2),rep(0,3)), 4)
Data <- gen.data(200, 20, 5, rho=0.4, beta = beta, seed =10)
x <- Data$x
y <- Data$y

group.index <- c(rep(1, 2), rep(2, 3), rep(3, 2), rep(4, 3),
                rep(5, 2), rep(6, 3), rep(7, 2), rep(8, 3))
lm.groupbsrr <- bsrr(x, y, s.min = 1, s.max = 8, group.index = group.index)
coef(lm.groupbsrr)
print(lm.groupbsrr)
summary(lm.groupbsrr)
pred.groupl0l2 <- predict(lm.groupbsrr, newx = x_new)
#-------------------include specified variables----------------------#
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
lm.bsrr <- bsrr(Data$x, Data$y, always.include = 2)

[Package bestridge version 1.0.7 Index]