sgp {SGPR}    R Documentation
Fit a sparse group regularized regression path
Description
A function that determines the regularization paths for models with sparse group penalties at a grid of values for the regularization parameter lambda.
Usage
sgp(
  X,
  y,
  group = 1:ncol(X),
  penalty = c("sgl", "sgs", "sgm", "sge"),
  alpha = 1/3,
  type = c("linear", "logit"),
  Z = NULL,
  nlambda = 100,
  lambda.min = {
    if (nrow(X) > ncol(X))
      1e-04
    else 0.05
  },
  log.lambda = TRUE,
  lambdas,
  prec = 1e-04,
  ada_mult = 2,
  max.iter = 10000,
  standardize = TRUE,
  vargamma = ifelse(pvar == "scad" | penalty == "sgs", 4, 3),
  grgamma = ifelse(pgr == "scad" | penalty == "sgs", 4, 3),
  vartau = 1,
  grtau = 1,
  pvar = c("lasso", "scad", "mcp", "exp"),
  pgr = c("lasso", "scad", "mcp", "exp"),
  group.weight = rep(1, length(unique(group))),
  returnX = FALSE,
  ...
)
Arguments
X
The design matrix, without an intercept column, containing the variables to be selected.
y
The response vector.
group
A vector indicating the group membership of each variable (column) in X.
penalty
A string that specifies the sparse group penalty to be used: "sgl", "sgs", "sgm" or "sge" (see Details).
alpha
Tuning parameter for the mixture of the penalties at the group and variable level. A value of 0 results in selection at the group level, a value of 1 results in selection at the variable level, and values in between result in bi-level selection.
type
A string indicating the type of regression model: "linear" for linear regression or "logit" for logistic (binomial) regression.
Z
The design matrix of the variables to be included in the model without penalization.
nlambda
An integer that specifies the length of the lambda sequence.
lambda.min
A numeric value that is multiplied by the maximum lambda to define the smallest value of the lambda sequence. The default is 1e-04 if nrow(X) > ncol(X) and 0.05 otherwise.
log.lambda
A Boolean value that specifies whether the values of the lambda sequence should be equally spaced on the log scale.
lambdas
A user-supplied vector of values for lambda.
prec
The convergence threshold for the algorithm.
ada_mult
A numeric value that defines the multiplier for adjusting the convergence threshold.
max.iter
The maximum number of iterations for the algorithm.
standardize
A Boolean value that specifies whether the design matrix X should be standardized.
vargamma
A numeric value that defines gamma for the penalty at the variable level. The default is 4 if SCAD is used (pvar = "scad" or penalty = "sgs") and 3 otherwise.
grgamma
A numeric value that defines gamma for the penalty at the group level. The default is 4 if SCAD is used (pgr = "scad" or penalty = "sgs") and 3 otherwise.
vartau
A numeric value that defines tau for the penalty at the variable level.
grtau
A numeric value that defines tau for the penalty at the group level.
pvar
A string that specifies the penalty used at the variable level: "lasso", "scad", "mcp" or "exp" (see Details).
pgr
A string that specifies the penalty used at the group level: "lasso", "scad", "mcp" or "exp" (see Details).
group.weight
A vector of weights, one per group, that are multiplied by the group penalty to account for different group sizes. By default, all weights are 1.
returnX
A Boolean value that specifies whether the standardized design matrix should be returned.
...
Further arguments passed to the underlying functions.
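The interplay of nlambda, lambda.min and log.lambda can be sketched as follows. This is a minimal illustration, assuming that the sequence runs from a given maximum value lambda_max down to lambda.min * lambda_max. The helper make_lambda_grid() is hypothetical and not part of SGPR, and lambda_max is taken as an input here rather than derived from the data.

```r
# Hypothetical helper, not part of SGPR: a decreasing lambda sequence
# from lambda_max down to lambda.min * lambda_max (assumed construction).
make_lambda_grid <- function(lambda_max, nlambda = 100,
                             lambda.min = 1e-04, log.lambda = TRUE) {
  lambda_end <- lambda.min * lambda_max
  if (log.lambda) {
    # Equally spaced on the log scale: constant ratio between neighbours
    exp(seq(log(lambda_max), log(lambda_end), length.out = nlambda))
  } else {
    # Equally spaced on the original scale: constant difference
    seq(lambda_max, lambda_end, length.out = nlambda)
  }
}

grid <- make_lambda_grid(lambda_max = 2, nlambda = 5, lambda.min = 0.05)
grid
```

On the log scale consecutive values differ by a constant factor, which places relatively more grid points near the small end of the path than an equally spaced grid does.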
Details
Two options are available for choosing a penalty. With the argument penalty, the methods Sparse Group LASSO, Sparse Group SCAD, Sparse Group MCP and Sparse Group EP can be selected with the abbreviations "sgl", "sgs", "sgm" and "sge", respectively.
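The four building blocks behind these methods are univariate penalties evaluated at a non-negative argument theta (a coefficient's absolute value or a group's norm). The sketch below uses the textbook parameterizations of the LASSO, SCAD and MCP, and the exponential penalty in the form (lambda^2 / tau) * (1 - exp(-tau * theta / lambda)); these forms, and the assumption that gamma and tau correspond to vargamma/grgamma and vartau/grtau, are not taken from the SGPR source.

```r
# Univariate penalties at theta >= 0 (e.g. |beta_j| or a group norm).
# Textbook parameterizations; assumed, not taken from the SGPR source.
pen_lasso <- function(theta, lambda) lambda * theta

pen_scad <- function(theta, lambda, gamma = 4) {
  ifelse(theta <= lambda,
         lambda * theta,                                    # LASSO part
         ifelse(theta <= gamma * lambda,
                (2 * gamma * lambda * theta - theta^2 - lambda^2) /
                  (2 * (gamma - 1)),                        # quadratic part
                lambda^2 * (gamma + 1) / 2))                # constant part
}

pen_mcp <- function(theta, lambda, gamma = 3) {
  ifelse(theta <= gamma * lambda,
         lambda * theta - theta^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

pen_exp <- function(theta, lambda, tau = 1) {
  # Bounded above by lambda^2 / tau
  (lambda^2 / tau) * (1 - exp(-tau * theta / lambda))
}

theta <- c(0, 0.5, 2, 10)
rbind(lasso = pen_lasso(theta, 1),
      scad  = pen_scad(theta, 1),
      mcp   = pen_mcp(theta, 1),
      exp   = pen_exp(theta, 1))
```

All three non-convex penalties have slope lambda at zero, like the LASSO, but level off for large theta (SCAD and MCP become constant, the exponential penalty approaches lambda^2 / tau), which reduces the shrinkage of large coefficients.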
Alternatively, penalties can be combined additively with the arguments pvar and pgr, where pvar is the penalty applied at the variable level and pgr is the penalty applied at the group level. The options are "lasso", "scad", "mcp" and "exp", for the Least Absolute Shrinkage and Selection Operator, Smoothly Clipped Absolute Deviation, Minimax Concave Penalty and Exponential Penalty, respectively.
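A minimal sketch of such an additive combination, assuming the form alpha * sum_j p_var(|beta_j|) + (1 - alpha) * sum_g w_g * p_gr(||beta_g||_2), where w_g are the entries of group.weight. The function sparse_group_penalty() is hypothetical; the Euclidean group norm and this exact weighting are assumptions consistent with the description of alpha above, not the package's implementation.

```r
# Hypothetical sketch of an additive sparse group penalty (assumed form):
#   alpha * sum_j p_var(|beta_j|) + (1 - alpha) * sum_g w_g * p_gr(||beta_g||_2)
pen_lasso <- function(theta, lambda) lambda * theta
pen_mcp <- function(theta, lambda, gamma = 3) {
  ifelse(theta <= gamma * lambda,
         lambda * theta - theta^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

sparse_group_penalty <- function(beta, group, lambda, alpha = 1/3,
                                 pvar = pen_lasso, pgr = pen_lasso,
                                 group.weight = rep(1, length(unique(group)))) {
  # Variable level: penalty on each |beta_j|
  var_part <- sum(pvar(abs(beta), lambda))
  # Group level: penalty on the Euclidean norm of each group,
  # with group.weight ordered like sort(unique(group))
  norms <- tapply(beta, group, function(b) sqrt(sum(b^2)))
  grp_part <- sum(group.weight * pgr(norms, lambda))
  alpha * var_part + (1 - alpha) * grp_part
}

beta  <- c(3, 4, 0, 0)
group <- c(1, 1, 2, 2)
sparse_group_penalty(beta, group, lambda = 1, alpha = 1/3)                   # 17/3
sparse_group_penalty(beta, group, lambda = 1, alpha = 0.5, pvar = pen_mcp)   # 4
```

Under this form, alpha = 0 leaves only the group term, which can remove whole groups, while alpha = 1 leaves only the variable-level term.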
Value
A list containing:
- beta
A vector with estimated coefficients.
- type
A string indicating the type of regression model ("linear" or "logit").
- group
A vector indicating the group membership of the individual variables in X.
- lambdas
The sequence of lambda values.
- alpha
Tuning parameter for the mixture of penalties at group and variable level.
- loss
A vector containing either the residual sum of squares (linear) or the negative log-likelihood (binomial).
- prec
The convergence threshold used for each lambda.
- n
Number of observations.
- penalty
A string indicating the sparse group penalty used.
- df
A vector of pseudo degrees of freedom for each lambda.
- iter
A vector of the number of iterations for each lambda.
- group.weight
A vector of weights multiplied by the group penalty.
- y
The response vector.
- X
The design matrix without intercept.
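The loss component is described as the residual sum of squares for linear models and the negative log-likelihood for binomial models. A minimal sketch of both quantities for a single coefficient vector, assuming an intercept beta0 and coefficients beta on the columns of X; path_loss() is a hypothetical helper, and SGPR may scale these quantities differently (for example, by 1/n).

```r
# Hypothetical helper, not part of SGPR: loss for one coefficient vector
path_loss <- function(X, y, beta0, beta, type = c("linear", "logit")) {
  type <- match.arg(type)
  eta <- drop(beta0 + X %*% beta)  # linear predictor
  if (type == "linear") {
    sum((y - eta)^2)               # residual sum of squares
  } else {
    # Binomial negative log-likelihood in terms of the linear predictor:
    # -sum(y * log(p) + (1 - y) * log(1 - p)) with p = plogis(eta)
    sum(log1p(exp(eta)) - y * eta)
  }
}

X <- matrix(c(1, 2, 3), ncol = 1)
path_loss(X, c(1, 2, 2), beta0 = 0, beta = 1)                  # 0 + 0 + 1 = 1
path_loss(X, c(0, 1, 1), beta0 = 0, beta = 0, type = "logit")  # 3 * log(2)
```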
References
Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., and Wild, P. S. (2024) Sparse Group Penalties for bi-level variable selection. Biometrical Journal, 66, 2200334. doi:10.1002/bimj.202200334
Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2013) A Sparse-Group Lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245. doi:10.1080/10618600.2012.681250
Breheny, P., and Huang, J. (2009) Penalized methods for bi-level variable selection. Statistics and Its Interface, 2(3), 369-380. doi:10.4310/sii.2009.v2.n3.a10
Examples
library(SGPR)
# Generate data (seed set for reproducibility)
set.seed(1)
n <- 100
p <- 200
nr <- 10
g <- ceiling(1:p / nr)
X <- matrix(rnorm(n * p), n, p)
b <- c(-3:3)
y_lin <- X[, 1:length(b)] %*% b + 5 * rnorm(n)
y_log <- rbinom(n, 1, exp(y_lin) / (1 + exp(y_lin)))
# Linear regression
lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgl")
plot(lin_fit)
lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgs")
plot(lin_fit)
lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sgm")
plot(lin_fit)
lin_fit <- sgp(X, y_lin, g, type = "linear", penalty = "sge")
plot(lin_fit)
# Logistic regression
log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgl")
plot(log_fit)
log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgs")
plot(log_fit)
log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sgm")
plot(log_fit)
log_fit <- sgp(X, y_log, g, type = "logit", penalty = "sge")
plot(log_fit)