sgp {SGPR}

R Documentation

Fit a sparse group regularized regression path

Description

A function that determines the regularization paths for models with sparse group penalties at a grid of values for the regularization parameter lambda.

Usage

sgp(
  X,
  y,
  group = 1:ncol(X),
  penalty = c("sgl", "sgs", "sgm", "sge"),
  alpha = 1/3,
  type = c("linear", "logit"),
  Z = NULL,
  nlambda = 100,
  lambda.min = {
     if (nrow(X) > ncol(X)) 
         1e-04
     else 0.05
 },
  log.lambda = TRUE,
  lambdas,
  prec = 1e-04,
  ada_mult = 2,
  max.iter = 10000,
  standardize = TRUE,
  vargamma = ifelse(pvar == "scad" | penalty == "sgs", 4, 3),
  grgamma = ifelse(pgr == "scad" | penalty == "sgs", 4, 3),
  vartau = 1,
  grtau = 1,
  pvar = c("lasso", "scad", "mcp", "exp"),
  pgr = c("lasso", "scad", "mcp", "exp"),
  group.weight = rep(1, length(unique(group))),
  returnX = FALSE,
  ...
)

Arguments

`X`	The design matrix, without an intercept column, containing the variables subject to selection.
`y`	The response vector.
`group`	A vector indicating the group membership of each variable in X.
`penalty`	A string that specifies the sparse group penalty to be used.
`alpha`	Tuning parameter for the mixture of penalties at group and variable level. A value of 0 results in a selection at group level, a value of 1 results in a selection at variable level and everything in between is bi-level selection.
`type`	A string indicating the type of regression model: "linear" for linear regression or "logit" for binomial (logistic) regression.
`Z`	The design matrix of the variables to be included in the model without penalization.
`nlambda`	An integer that specifies the length of the lambda sequence.
`lambda.min`	A numeric fraction that is multiplied by the maximum lambda to define the end of the lambda sequence.
`log.lambda`	A Boolean value that specifies whether the values of the lambda sequence should be on the log scale.
`lambdas`	A user supplied vector with values for lambda.
`prec`	The convergence threshold for the algorithm.
`ada_mult`	An integer that defines the multiplier for adjusting the convergence threshold.
`max.iter`	An integer that specifies the maximum number of iterations of the algorithm.
`standardize`	A Boolean value that specifies whether the design matrix should be standardized.
`vargamma`	A numeric value that defines gamma for the penalty at the variable level.
`grgamma`	A numeric value that defines gamma for the penalty at the group level.
`vartau`	A numeric value that defines tau for the penalty at the variable level.
`grtau`	A numeric value that defines tau for the penalty at the group level.
`pvar`	A string that specifies the penalty used at the variable level.
`pgr`	A string that specifies the penalty used at the group level.
`group.weight`	A vector specifying weights that are multiplied by the group penalty to account for different group sizes.
`returnX`	A Boolean value that specifies whether the standardized design matrix should be returned.
`...`	Other parameters of underlying basic functions.
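
The arguments nlambda, lambda.min and log.lambda together describe a grid of lambda values. The following is a minimal base R sketch of one way such a grid can be built. The helper make_lambda_grid and its lambda_max input are hypothetical and not part of SGPR, and the exact spacing used inside sgp is assumed rather than confirmed.

```r
# Sketch only: make_lambda_grid() and lambda_max are hypothetical names,
# not functions or arguments of the SGPR package.
make_lambda_grid <- function(lambda_max, lambda.min = 1e-04, nlambda = 100,
                             log.lambda = TRUE) {
  lambda_end <- lambda.min * lambda_max  # smallest lambda on the path
  if (log.lambda) {
    # equally spaced on the log scale, returned on the original scale
    exp(seq(log(lambda_max), log(lambda_end), length.out = nlambda))
  } else {
    # equally spaced on the original scale
    seq(lambda_max, lambda_end, length.out = nlambda)
  }
}

grid_log <- make_lambda_grid(lambda_max = 2, lambda.min = 0.05, nlambda = 5)
grid_lin <- make_lambda_grid(lambda_max = 2, lambda.min = 0.05, nlambda = 5,
                             log.lambda = FALSE)
```

Both grids run from the maximum lambda down to lambda.min times that maximum. With log spacing, more grid points fall near the small lambda values, where the fitted model tends to change most.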

Details

A penalty can be chosen in two ways. The argument penalty selects one of the predefined methods Sparse Group LASSO, Sparse Group SCAD, Sparse Group MCP or Sparse Group EP via the abbreviations "sgl", "sgs", "sgm" and "sge", respectively. Alternatively, penalties can be combined additively with the arguments pvar and pgr, where pvar is the penalty applied at the variable level and pgr is the penalty applied at the group level. The options "lasso", "scad", "mcp" and "exp" stand for Least Absolute Shrinkage and Selection Operator, Smoothly Clipped Absolute Deviation, Minimax Concave Penalty and Exponential Penalty, respectively.
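
To illustrate the additive combination of a variable-level and a group-level penalty, the following base R sketch evaluates the standard textbook forms of the lasso, SCAD and MCP penalties and mixes them with alpha. All function names are hypothetical and not part of SGPR. The alpha weighting (alpha on the variable part, 1 - alpha on the group part) is an assumption inferred from the description of alpha, and the exponential penalty is left out of this sketch.

```r
# Textbook penalty functions; whether SGPR scales them identically is an assumption.
pen_lasso <- function(theta, lambda) lambda * abs(theta)

pen_scad <- function(theta, lambda, gamma = 4) {
  t <- abs(theta)
  ifelse(t <= lambda,
         lambda * t,
         ifelse(t <= gamma * lambda,
                (2 * gamma * lambda * t - t^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

pen_mcp <- function(theta, lambda, gamma = 3) {
  t <- abs(theta)
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Hypothetical helper: alpha-weighted sum of a variable-level penalty on each
# coefficient and a group-level penalty on the Euclidean norm of each group.
sparse_group_penalty <- function(beta, group, lambda, alpha = 1/3,
                                 pvar = pen_lasso, pgr = pen_lasso,
                                 group.weight = rep(1, length(unique(group)))) {
  group_norm <- as.numeric(tapply(beta, group, function(b) sqrt(sum(b^2))))
  var_part <- sum(pvar(beta, lambda))
  gr_part <- sum(group.weight * pgr(group_norm, lambda))
  alpha * var_part + (1 - alpha) * gr_part
}

beta <- c(3, -4, 0, 0)
group <- c(1, 1, 2, 2)
pen_bilevel <- sparse_group_penalty(beta, group, lambda = 1, alpha = 0.5)
pen_mixed <- sparse_group_penalty(beta, group, lambda = 1, alpha = 0.5,
                                  pvar = pen_scad, pgr = pen_mcp)
```

Unlike the lasso, SCAD and MCP flatten out for large coefficients, so strong signals are penalized less heavily than under the lasso.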

Value

A list containing:

beta: A vector with estimated coefficients.
type: A string indicating the type of regression model (linear or binomial).
group: A vector indicating the group membership of the individual variables in X.
lambdas: The sequence of lambda values.
alpha: Tuning parameter for the mixture of penalties at group and variable level.
loss: A vector containing either the residual sum of squares (linear) or the negative log-likelihood (binomial).
prec: The convergence threshold used for each lambda.
n: Number of observations.
penalty: A string indicating the sparse group penalty used.
df: A vector of pseudo degrees of freedom for each lambda.
iter: A vector of the number of iterations for each lambda.
group.weight: A vector of weights multiplied by the group penalty.
y: The response vector.
X: The design matrix without intercept.

References

Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., and Wild, P. S. (2024) Sparse Group Penalties for bi-level variable selection. Biometrical Journal, 66, 2200334. doi:10.1002/bimj.202200334
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013) A Sparse-Group Lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245. doi:10.1080/10618600.2012.681250
Breheny, P., and Huang, J. (2009) Penalized methods for bi-level variable selection. Statistics and Its Interface, 2, 369-380. doi:10.4310/sii.2009.v2.n3.a10

Examples

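# Generate data: 200 variables in 20 groups of 10; only the first 7 columns enter the model
 set.seed(1)                      # make the simulated data reproducible
 n <- 100                         # number of observations
 p <- 200                         # number of variables
 nr <- 10                         # variables per group
 g <- ceiling(1:p / nr)           # group labels 1, ..., 20
 X <- matrix(rnorm(n * p), n, p)
 b <- -3:3                        # true coefficients of the first 7 variables
 y_lin <- drop(X[, seq_along(b)] %*% b) + 5 * rnorm(n)  # continuous response
 y_log <- rbinom(n, 1, plogis(y_lin))                   # binary response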

# Linear regression
 lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgl")
 plot(lin_fit)
 lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgs")
 plot(lin_fit)
 lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgm")
 plot(lin_fit)
 lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sge")
 plot(lin_fit)

# Logistic regression
 log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgl")
 plot(log_fit)
 log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgs")
 plot(log_fit)
 log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgm")
 plot(log_fit)
 log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sge")
 plot(log_fit)
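
The simulated design used in the examples can be inspected with base R alone. The short sketch below repeats only the deterministic part of the setup and shows that all true signal lies in the first group, and that this group itself contains zero coefficients, so the truth is sparse both between and within groups. The names group_sizes, active_vars and active_groups are introduced here for illustration.

```r
# Deterministic part of the example setup (no random draws needed)
p <- 200
nr <- 10
g <- ceiling(1:p / nr)
b <- c(-3:3)

group_sizes <- table(g)                  # 20 groups of 10 variables each
active_vars <- which(b != 0)             # columns with a non-zero true coefficient
active_groups <- unique(g[active_vars])  # groups that contain a true signal
```

Because b includes a zero, six of the ten variables in the first group carry signal, which is the bi-level structure that the sparse group penalties are meant to recover.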

[Package SGPR version 0.1.2 Index]