bsrr {bestridge}R Documentation

Best subset ridge regression

Description

Best subset ridge regression for generalized linear model and Cox's proportional model.

Usage

bsrr(
  x,
  y,
  family = c("gaussian", "binomial", "poisson", "cox"),
  method = c("pgsection", "sequential", "psequential"),
  tune = c("gic", "ebic", "bic", "aic", "cv"),
  s.list,
  lambda.list = 0,
  s.min,
  s.max,
  lambda.min = 0.001,
  lambda.max = 100,
  nlambda = 100,
  always.include = NULL,
  screening.num = NULL,
  normalize = NULL,
  weight = NULL,
  max.iter = 20,
  warm.start = TRUE,
  nfolds = 5,
  group.index = NULL,
  seed = NULL
)

Arguments

x

Input matrix, of dimension n \times p; each row is an observation vector and each column is a predictor/feature/variable.

y

The response variable, of n observations. For family = "binomial" should be a factor with two levels. For family="poisson", y should be a vector with positive integer. For family = "cox", y should be a two-column matrix with columns named time and status.

family

One of the following models: "gaussian", "binomial", "poisson", or "cox". Depending on the response. Any unambiguous substring can be given.

method

The method to be used to select the optimal model size and L_2 shrinkage. For method = "sequential", we solve the best subset ridge regression problem for each s in 1,2,...,s.max and \lambda in lambda.list. For method = "pgsection" and "psequential", the Powell method is used to solve the best subset ridge regression problem. Any unambiguous substring can be given.

tune

The criterion for choosing the model size and L_2 shrinkage parameters. Available options are "gic", "ebic", "bic", "aic" and "cv". Default is "gic". "cv" is recommanded for BSRR.

s.list

An increasing list of sequential values representing the model sizes. Only used for method = "sequential". Default is 1:min(p, round(n/log(n))).

lambda.list

A lambda sequence for "bsrr". Only used for method = "sequential". Default is exp(seq(log(100), log(0.01), length.out = 100)).

s.min

The minimum value of model sizes. Only used for "psequential" and "pgsection". Default is 1.

s.max

The maximum value of model sizes. Only used for "psequential" and "pgsection". Default is min(p, round(n/log(n))).

lambda.min

The minimum value of lambda. Only used for method = "psequential" and "pgsection". Default is 0.001.

lambda.max

The maximum value of lambda. Only used for method = "psequential" and "pgsection". Default is 100.

nlambda

The number of \lambdas for the Powell path with sequential line search method. Only valid for method = "psequential".

always.include

An integer vector containing the indexes of variables that should always be included in the model.

screening.num

Users can pre-exclude some irrelevant variables according to maximum marginal likelihood estimators before fitting a model by passing an integer to screening.num and the sure independence screening will choose a set of variables of this size. Then the active set updates are restricted on this subset.

normalize

Options for normalization. normalize = 0 for no normalization. Setting normalize = 1 will only subtract the mean of columns of x. normalize = 2 for scaling the columns of x to have \sqrt n norm. normalize = 3 for subtracting the means of the columns of x and y, and also normalizing the columns of x to have \sqrt n norm. If normalize = NULL, by default, normalize will be set 1 for "gaussian", 2 for "binomial" and "poisson", 3 for "cox".

weight

Observation weights. Default is 1 for each observation.

max.iter

The maximum number of iterations in the bsrr function. In most of the case, only a few steps can guarantee the convergence. Default is 20.

warm.start

Whether to use the last solution as a warm start. Default is TRUE.

nfolds

The number of folds in cross-validation. Default is 5.

group.index

A vector of integers indicating the which group each variable is in. For variables in the same group, they should be located in adjacent columns of x and their corresponding index in group.index should be the same. Denote the first group as 1, the second 2, etc. If you do not fit a model with a group structure, please set group.index = NULL. Default is NULL.

seed

Seed to be used to divide the sample into K cross-validation folds. Default is NULL.

Details

The best ridge regression problem with model size s and the shrinkage parameter \lambda is

\min_\beta -2 \log L(\beta) + \lambda\Vert\beta\Vert_2^2 \;\;{\rm s.t.}\;\; \|\beta\|_0 \leq s.

In the GLM case, \log L(\beta) is the log likelihood function; In the Cox model, \log L(\beta) is the log partial likelihood function.

The best subset selection problem is a special case of the best ridge regression problem with the shrinkage \lambda=0.

For each candidate model size and \lambda, the best subset ridge regression problems are solved by the L_2 penalized primal-dual active set (PDAS) algorithm, see Wen et al (2020) for details. This algorithm utilizes an active set updating strategy via primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlap and complementary. For the case of method = "sequential" if warm.start = "TRUE", we run the PDAS algorithm for a list of sequential model sizes and use the estimate from the last iteration as a warm start. For the case of method = "psequential" and method = "pgsection", the Powell method using a sequential line search method or a golden section search technique is used for parameters determination.

Value

A list with class attribute 'bsrr' and named components:

beta

The best fitting coefficients.

coef0

The best fitting intercept.

loss

The training loss of the best fitting model.

ic

The information criterion of the best fitting model when model selection is based on a certain information criterion.

cvm

The mean cross-validated error for the best fitting model when model selection is based on the cross-validation.

lambda

The lambda chosen for the best fitting model

beta.all

For bsrr objects obtained by gsection, pgsection and psequential, beta.all is a matrix with each column be the coefficients of the model in each iterative step in the tuning path. For bsrr objects obtained by sequential method, A list of the best fitting coefficients of size s=0,1,...,p and \lambda in lambda.list with the smallest loss function. For "bsrr" objects of "bsrr" type, the fitting coefficients of the i^{th} \lambda and the j^{th} s are at the i^{th} list component's j^{th} column.

coef0.all

For bsrr objects obtained from gsection, pgsection and psequential, coef0.all contains the intercept for the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, coef0.all contains the best fitting intercepts of size s=0,1,\dots,p and \lambda in lambda.list with the smallest loss function.

loss.all

For bsrr objects obtained from gsection, pgsection and psequential, loss.all contains the training loss of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a list of the training loss of the best fitting intercepts of model size s=0,1,\dots,p and \lambda in lambda.list. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} \lambda and the j^{th} s is at the i^{th} list component's j^{th} entry.

ic.all

For bsrr objects obtained from gsection, pgsection and psequential, ic.all contains the values of the chosen information criterion of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a matrix of the values of the chosen information criterion of model size s=0,1,\dots,p and \lambda in lambda.list with the smallest loss function. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} \lambda and the j^{th} s is at the i^{th} row j^{th} column. Only available when model selection is based on a certain information criterion.

cvm.all

For bsrr objects obtained from gsection, pgsection and psequential, cvm.all contains the mean cross-validation error of the model in each iterative step in the tuning path. For bsrr objects obtained from sequential path, this is a matrix of the mean cross-validation error of model size s=0,1,\dots,p and \lambda in lambda.list with the smallest loss function. For "bsrr" object obtained by "bsrr", the training loss of the i^{th} \lambda and the j^{th} s is at the i^{th} row j^{th} column. Only available when model selection is based on the cross-validation.

lambda.all

The lambda chosen for each step in pgsection and psequential.

family

Type of the model.

s.list

The input s.list.

nsample

The sample size.

type

Either "bss" or "bsrr".

method

Method used for tuning parameters selection.

ic.type

The criterion of model selection.

Author(s)

Liyuan Hu, Kangkang Jiang, Yanhang Zhang, Jin Zhu, Canhong Wen and Xueqin Wang.

See Also

plot.bsrr, summary.bsrr, coef.bsrr, predict.bsrr.

Examples


#-------------------linear model----------------------#
# Generate simulated data
n <- 200
p <- 20
k <- 5
rho <- 0.4
seed <- 10
Tbeta <- rep(0, p)
Tbeta[1:k*floor(p/k):floor(p/k)] <- rep(1, k)
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
lm.bsrr <- bsrr(x, y, method = "pgsection")
coef(lm.bsrr)
print(lm.bsrr)
summary(lm.bsrr)
pred.bsrr <- predict(lm.bsrr, newx = x_new)

# generate plots
plot(lm.bsrr)
#-------------------logistic model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "binomial", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
logi.bsrr <- bsrr(x, y, family = "binomial", lambda.list = lambda.list)
coef(logi.bsrr)
print(logi.bsrr)
summary(logi.bsrr)
pred.bsrr <- predict(logi.bsrr, newx = x_new)

# generate plots
plot(logi.bsrr)
#-------------------poisson model----------------------#
Data <- gen.data(n, p, k, rho=0.3, family = "poisson", beta = Tbeta, seed = seed)

x <- Data$x[1:140, ]
y <- Data$y[1:140]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
poi.bsrr <- bsrr(x, y, family = "poisson", lambda.list = lambda.list)
coef(poi.bsrr)
print(poi.bsrr)
summary(poi.bsrr)
pred.bsrr <- predict(poi.bsrr, newx = x_new)

# generate plots
plot(poi.bsrr)
#-------------------coxph model----------------------#
#Generate simulated data
Data <- gen.data(n, p, k, rho, family = "cox", scal = 10, beta = Tbeta)

x <- Data$x[1:140, ]
y <- Data$y[1:140, ]
x_new <- Data$x[141:200, ]
y_new <- Data$y[141:200, ]
lambda.list <- exp(seq(log(5), log(0.1), length.out = 10))
cox.bsrr <- bsrr(x, y, family = "cox", lambda.list = lambda.list)
coef(cox.bsrr)
print(cox.bsrr)
summary(cox.bsrr)
pred.bsrr <- predict(cox.bsrr, newx = x_new)

# generate plots
plot(cox.bsrr)

#----------------------High dimensional linear models--------------------#
## Not run: 
data <- gen.data(n, p = 1000, k, family = "gaussian", seed = seed)

# Best subset selection with SIS screening
lm.high <- bsrr(data$x, data$y, screening.num = 100)

## End(Not run)

#-------------------group selection----------------------#
beta <- rep(c(rep(1,2),rep(0,3)), 4)
Data <- gen.data(200, 20, 5, rho=0.4, beta = beta, seed =10)
x <- Data$x
y <- Data$y

group.index <- c(rep(1, 2), rep(2, 3), rep(3, 2), rep(4, 3),
                rep(5, 2), rep(6, 3), rep(7, 2), rep(8, 3))
lm.groupbsrr <- bsrr(x, y, s.min = 1, s.max = 8, group.index = group.index)
coef(lm.groupbsrr)
print(lm.groupbsrr)
summary(lm.groupbsrr)
pred.groupl0l2 <- predict(lm.groupbsrr, newx = x_new)
#-------------------include specified variables----------------------#
Data <- gen.data(n, p, k, rho, family = "gaussian", beta = Tbeta, seed = seed)
lm.bsrr <- bsrr(Data$x, Data$y, always.include = 2)


[Package bestridge version 1.0.7 Index]