R: Regularized regression incorporating external information

xtune {xtune}

R Documentation

Regularized regression incorporating external information

Description

xtune uses an Empirical Bayes approach to integrate external information into regularized regression models for both continuous and categorical outcomes. It fits models with feature-specific penalty parameters that are informed by the external information.

Usage

xtune(
  X,
  Y,
  Z = NULL,
  U = NULL,
  family = c("linear", "binary", "multiclass"),
  c = 0.5,
  epsilon = 5,
  sigma.square = NULL,
  message = TRUE,
  control = list()
)

Arguments

`X`	Numeric design matrix of explanatory variables (`n` observations in rows, `p` predictors in columns). `xtune` includes an intercept by default.
`Y`	Outcome vector of dimension `n`.
`Z`	Numeric information matrix about the predictors (`p` rows, each corresponding to a predictor in X; `q` columns of external information about the predictors, such as prior biological importance). If Z describes a grouping of predictors, it is best coded as dummy variables (i.e. each column indicates whether a predictor belongs to a specific group).
`U`	Covariates to be adjusted in the model (matrix with `n` observations in rows, `u` predictors in columns). Covariates are non-penalized in the model.
`family`	The family of the model according to different types of outcomes including "linear", "binary", and "multiclass".
`c`	The elastic-net mixing parameter ranging from 0 to 1. When `c` = 1, the model corresponds to Lasso. When `c` is set to 0, it corresponds to Ridge. For values between 0 and 1 (with a default of 0.5), the model corresponds to the elastic net.
`epsilon`	Controls the upper bound of `alpha`. The maximum value that `alpha` can reach equals epsilon times the alpha max calculated by pathwise coordinate descent. A larger value of epsilon indicates a stronger shrinkage effect. Default is 5.
`sigma.square`	A user-supplied estimate of the noise variance. Typically this is left unspecified, in which case the function automatically estimates sigma square using the R package `selectiveInference`.
`message`	Whether to print diagnostic messages during model fitting. Default is TRUE.
`control`	Specifies `xtune` control object. See `xtune.control` for more details.

Details

xtune has two main usages:

The basic usage is to choose the tuning parameter \lambda in elastic net regression using an Empirical Bayes approach, as an alternative to the widely used cross-validation. This is done by calling xtune without specifying the external information matrix Z.
More importantly, if external information Z about the predictors X is provided, xtune allows predictor-specific shrinkage parameters for the regression coefficients in penalized regression models. The idea is that Z may be informative about the effect sizes of the regression coefficients, so Z can be used to guide the penalized regression model.
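
The role of `c` and of predictor-specific shrinkage can be illustrated with a minimal base R sketch. It assumes the glmnet-style elastic-net parameterization, in which each coefficient's penalty is weighted by its own penalty factor; the function name `elastic_net_penalty` and the numbers are hypothetical and are not part of xtune.

```r
# Sketch only: glmnet-style elastic-net penalty with feature-specific weights.
# Assumes penalty = lambda * sum_j w_j * (c * |b_j| + (1 - c) / 2 * b_j^2).
elastic_net_penalty <- function(beta, lambda, c = 0.5,
                                penalty_vector = rep(1, length(beta))) {
  stopifnot(length(beta) == length(penalty_vector), c >= 0, c <= 1)
  lambda * sum(penalty_vector * (c * abs(beta) + (1 - c) / 2 * beta^2))
}

beta <- c(1, -2, 0)  # hypothetical coefficients

# c = 1 reduces to the Lasso penalty, c = 0 to the Ridge penalty
lasso_pen <- elastic_net_penalty(beta, lambda = 1, c = 1)
ridge_pen <- elastic_net_penalty(beta, lambda = 1, c = 0)

# Unequal weights shrink some coefficients more than others
weighted_pen <- elastic_net_penalty(beta, lambda = 1, c = 0.5,
                                    penalty_vector = c(1, 0.5, 2))
```

With equal weights this is the usual single-lambda elastic net; unequal weights are what allow external information to shrink some predictors more than others.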

Please note that the number of rows in Z must match the number of columns in X, since each row of Z describes one predictor in X and each column of Z is one piece of external information about the predictors. See here for more details on how to specify Z.
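
As a minimal sketch of the dimension requirement, the base R code below builds a dummy-coded Z from a grouping of predictors and checks it against X. The grouping, the group labels, and the toy X are hypothetical and serve only as illustration.

```r
# Hypothetical grouping of p = 6 predictors into 3 groups
p <- 6
group <- factor(c("A", "A", "B", "B", "C", "C"))

# One indicator column per group; each row corresponds to one predictor
Z <- model.matrix(~ 0 + group)
colnames(Z) <- levels(group)

# Toy design matrix with n = 10 observations and p predictors
set.seed(1)
X <- matrix(rnorm(10 * p), nrow = 10, ncol = p)

# Rows of Z must line up with columns of X
stopifnot(nrow(Z) == ncol(X))
```

Each predictor belongs to exactly one group here, so every row of Z contains a single 1.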

A majorization-minimization procedure is employed to fit xtune.

Value

An object with S3 class xtune containing:

`beta.est`	The fitted vector of coefficients.
`penalty.vector`	The estimated penalty vector applied to each regression coefficient. Similar to the `penalty.factor` argument in glmnet.
`lambda`	The estimated `\lambda` value. Note that the lambda value is calculated to reflect the fact that penalty factors are internally rescaled to sum to nvars in glmnet. Similar to the `lambda` argument in glmnet.
`alpha.est`	The estimated second-level coefficient for prior covariate Z. The first value is the intercept of the second-level coefficient.
`n_iter`	Number of iterations used until convergence.
`method`	Same as in argument above
`sigma.square`	The sigma square value estimated using `estimateVariance`, if `sigma.square` is left unspecified. When `family` is "binary" or "multiclass", `sigma.square` is NULL.
`family`	Same as in the argument above.
`likelihood.score`	A vector containing the marginal likelihood value of the fitted model at each iteration.
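
The rescaling mentioned for `lambda` can be illustrated with plain arithmetic. The sketch below uses made-up numbers and shows only the rescaling documented for glmnet (penalty factors rescaled to sum to the number of variables); it does not reproduce how xtune computes its estimates internally.

```r
# Hypothetical raw penalty factors and overall lambda
raw_penalty <- c(0.5, 2, 4, 1.5)
lambda_raw <- 0.1
p <- length(raw_penalty)

# Rescale the penalty factors so that they sum to the number of variables
scale_factor <- p / sum(raw_penalty)
penalty_rescaled <- raw_penalty * scale_factor

# Adjust lambda so the effective per-coefficient penalty is unchanged
lambda_rescaled <- lambda_raw / scale_factor

effective_raw <- lambda_raw * raw_penalty
effective_rescaled <- lambda_rescaled * penalty_rescaled
```

The two effective penalty vectors agree, which is why a reported lambda has to be read together with the penalty vector it accompanies.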

Author(s)

Jingxuan He and Chubing Zeng

Examples

## use simulated example data
set.seed(1234567)
data(example)
X <- example$X
Y <- example$Y
Z <- example$Z

## Empirical Bayes tuning to estimate tuning parameter, as an alternative to cross-validation:

fit.eb <- xtune(X = X, Y = Y, family = "linear")
fit.eb$lambda


## Compare with the tuning parameter chosen by cross-validation, using glmnet:

fit.cv <- glmnet::cv.glmnet(x = X, y = Y, alpha = 0.5)
fit.cv$lambda.min


## Feature-specific penalties based on external information Z:

fit.diff <- xtune(X = X, Y = Y, Z = Z, family = "linear")
fit.diff$penalty.vector

[Package xtune version 2.0.0 Index]