grpnet {grpnet} | R Documentation |
Fit a Group Elastic Net Regularized GLM/GAM
Description
Fits generalized linear/additive models with a group elastic net penalty using an adaptively bounded gradient descent (ABGD) algorithm (Helwig, 2024). Predictor groups can be manually input (default S3 method) or inferred from the model (S3 "formula" method). The regularization path is computed at a data-generated (default) or user-provided sequence of lambda values.
Usage
grpnet(x, ...)
## Default S3 method:
grpnet(x,
y,
group,
family = c("gaussian", "binomial", "multinomial", "poisson",
"negative.binomial", "Gamma", "inverse.gaussian"),
weights = NULL,
offset = NULL,
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 0.05, 0.0001),
lambda = NULL,
penalty.factor = NULL,
penalty = c("LASSO", "MCP", "SCAD"),
gamma = 4,
theta = 1,
standardized = !orthogonalized,
orthogonalized = TRUE,
intercept = TRUE,
thresh = 1e-04,
maxit = 1e05,
proglang = c("Fortran", "R"),
...)
## S3 method for class 'formula'
grpnet(formula,
data,
use.rk = TRUE,
family = c("gaussian", "binomial", "multinomial", "poisson",
"negative.binomial", "Gamma", "inverse.gaussian"),
weights = NULL,
offset = NULL,
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 0.05, 0.0001),
lambda = NULL,
penalty.factor = NULL,
penalty = c("LASSO", "MCP", "SCAD"),
gamma = 4,
theta = 1,
standardized = !orthogonalized,
orthogonalized = TRUE,
thresh = 1e-04,
maxit = 1e05,
proglang = c("Fortran", "R"),
...)
Arguments
x |
Model (design) matrix of dimension |
y |
Response vector of length |
group |
Group label vector (factor, character, or integer) of length |
formula |
Model formula: a symbolic description of the model to be fitted. Uses the same syntax as |
data |
Optional data frame containing the variables referenced in |
use.rk |
If |
family |
Character specifying the assumed distribution for the response variable. Partial matching is allowed. Options include |
weights |
Optional vector of length |
offset |
Optional vector of length |
alpha |
Regularization hyperparameter satisfying |
nlambda |
Number of |
lambda.min.ratio |
The proportion |
lambda |
Optional vector of user-supplied regularization parameter values. |
penalty.factor |
Default S3 method: vector of length S3 "formula" method: named list giving the non-negative penalty weight for terms specified in the formula. Incomplete lists are allowed. Any term that is specified in |
penalty |
Character specifying which (group) penalty to use: LASSO , MCP, or SCAD. |
gamma |
Penalty hyperparameter that satisfies |
theta |
Additional ("size") parameter for negative binomial responses, where the variance function is defined as |
standardized |
Logical indicating whether the predictors should be groupwise standardized. If |
orthogonalized |
Logical indicating whether the predictors should be groupwise orthogonalized. If |
intercept |
Logical indicating whether an intercept term should be included in the model. Note that the intercept is always unpenalized. |
thresh |
Convergence threshold (tolerance). The algorithm is determined to have converged once the maximum relative change in the coefficients is below this threshold. See "Convergence" section. |
maxit |
Maximum number of iterations to allow. |
proglang |
Which programming language should be used to implement the ABGD algorithm? Options include |
... |
Additional arguments used by the default or formula method. |
Details
Consider a generalized linear model of the form
g(\mu) = \mathbf{X}^\top \boldsymbol\beta
where \mu = E(Y | \mathbf{X})
is the conditional expectation of the response Y
given the predictor vector \mathbf{X}
, the function g(\cdot)
is a user-specified (invertible) link function, and \boldsymbol\beta
are the unknown regression coefficients. Furthermore, suppose that the predictors are grouped, such as
\mathbf{X}^\top \boldsymbol\beta = \sum_{k=1}^K \mathbf{X}_k^\top \boldsymbol\beta_k
where \mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_K)
is the grouped predictor vector, and \boldsymbol\beta = (\boldsymbol\beta_1, \ldots, \boldsymbol\beta_K)
is the grouped coefficient vector.
Given n
observations, this function finds the \boldsymbol\beta
that minimizes
L(\boldsymbol\beta | \mathbf{D}) + \lambda P_\alpha(\boldsymbol\beta)
where L(\boldsymbol\beta | \mathbf{D})
is the loss function with \mathbf{D} = \{\mathbf{y}, \mathbf{X}\}
denoting the observed data, P_\alpha(\boldsymbol\beta)
is the group elastic net penalty, and \lambda \geq 0
is the regularization parameter.
The loss function has the form
L(\boldsymbol\beta | \mathbf{D}) = \frac{1}{n} \sum_{i=1}^n w_i \ell_i(\boldsymbol\beta | \mathbf{D}_i)
where w_i > 0
are the user-supplied weights
, and \ell_i(\boldsymbol\beta | \mathbf{D}_i)
is the i
-th observation's contribution to the loss function. Note that \ell(\cdot) = -\log(f_Y(\cdot))
denotes the negative log-likelihood function for the given family
.
The group elastic net penalty function has the form
P_\alpha(\boldsymbol\beta) = \alpha P_1(\boldsymbol\beta) + (1 - \alpha) P_2(\boldsymbol\beta)
where \alpha \in [0,1]
is the user-specified alpha
value,
P_1(\boldsymbol\beta) = \sum_{k=1}^K \omega_k \| \boldsymbol\beta_k \|
is the group lasso penalty with \omega_k \geq 0
denoting the k
-th group's penalty.factor
, and
P_2(\boldsymbol\beta) = \frac{1}{2} \sum_{k=1}^K \omega_k \| \boldsymbol\beta_k \|^2
is the group ridge penalty. Note that \| \boldsymbol\beta_k \|^2 = \boldsymbol\beta_k^\top \boldsymbol\beta_k
denotes the squared Euclidean norm. When penalty %in% c("MCP", "SCAD")
, the group L1 penalty P_1(\boldsymbol\beta)
is replaced by the group MCP or group SCAD penalty.
Value
An object of class "grpnet"
with the following elements:
call |
matched call |
a0 |
intercept sequence of length |
beta |
coefficient matrix of dimension |
alpha |
balance between the group L1 (lasso) and group L2 (ridge) penalty |
lambda |
sequence of regularization parameter values |
family |
exponential family defining the loss function |
dev.ratio |
proportion of (null) deviance explained for each |
nulldev |
null deviance for each |
df |
effective degrees of freedom for each |
nzgrp |
number of non-zero groups for each |
nzcoef |
number of non-zero coefficients for each |
xsd |
standard deviation of x for each group |
ylev |
levels of response variable (only for binomial and multinomial families) |
nobs |
number of observations |
group |
group label vector |
ngroups |
number of groups |
npasses |
number of iterations for each |
time |
runtime in seconds to compute regularization path |
offset |
logical indicating if an offset was included |
args |
list of input argument values |
formula |
input formula (possibly after expansion) |
term.labels |
terms that appear in formula (if applicable) |
rk.args |
arguments for rk.model.matrix function (if applicable) |
S3 "formula" method
Important: When using the S3 "formula" method, the S3 "predict" method forms the model matrix for the predictions by applying the model formula to the new data. As a result, to ensure that the corresponding S3 "predict" method works correctly, some formulaic features should be avoided.
Polynomials: When including polynomial terms, the poly
function should be used with option raw = TRUE
. Default use of the poly
function (with raw = FALSE
) will work for fitting the model, but will result in invalid predictions for new data. Polynomials can also be included via the I
function, but this isn't recommended because the polynomials terms wouldn't be grouped together, i.e., the terms x
and I(x^2)
would be treated as two separate groups of size one instead of a single group of size two.
Splines: B-splines (and other spline bases) can be included via the S3 "formula" method. However, to ensure reasonable predictions for new data, it is necessary to specify the knots directly. For example, if x
is a vector with entries between zero and one, the code bs(x, df = 5)
will *not* produce valid predictions for new data, but the code bs(x, knots = c(0.25, 0.5, 0.75), Boundary.knots = c(0, 1))
will work as intended. Instead of attempting to integrate a call to bs()
or rk()
into the model formula, it is recommended that splines be included via the use.rk = TRUE
argument.
Family argument and link functions
Unlike the glm
function, the family
argument of the grpnet
function
* should be a character vector (not a family
object)
* does not allow for specification of a link function
Currently, there is only one available link function for each family
:
* gaussian (identity): \mu = \mathbf{X}^\top \boldsymbol\beta
* binomial (logit): \log(\frac{\pi}{1 - \pi}) = \mathbf{X}^\top \boldsymbol\beta
* multinomial (symmetric): \pi_\ell = \frac{\exp(\mathbf{X}^\top \boldsymbol\beta_\ell)}{\sum_{l = 1}^m \exp(\mathbf{X}^\top \boldsymbol\beta_l)}
* poisson (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* negative.binomial (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* Gamma (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
* inverse.gaussian (log): \log(\mu) = \mathbf{X}^\top \boldsymbol\beta
Binomial and multinomial
For "binomial"
responses, three different possibilities exist for the input response:
1. vector coercible into a factor with two levels
2. matrix with two columns (# successes, # failures)
3. numeric vector with entries between 0 and 1
In this case, the weights
argument should be used specify the total number of trials.
For "multinomial"
responses, two different possibilities exist for the input reponse:
1. vector coercible into a factor with more than two levels
2. matrix of integers (counts) for each category level
Convergence
The algorithm is determined to have converged once
\max_j \frac{| \beta_j - \beta_j^{\mathrm{old}} |}{1 + |\beta_j^{\mathrm{old}}|} < \epsilon
where j \in \{1,\ldots,p\}
and \epsilon
is the thresh
argument.
Note
The syntax of (the default S3 method for) this function closely mimics that of the glmnet
function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
Author(s)
Nathaniel E. Helwig <helwig@umn.edu>
References
Breheny, P., & Huang, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173-187. doi:10.1007/s11222-013-9424-2
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01
Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics. doi:10.1080/10618600.2024.2362232
Yang, Y., & Zou, H. (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing, 25, 1129-1141. doi:10.1007/s11222-014-9498-5
See Also
plot.grpnet
for plotting the regularization path
predict.grpnet
for predicting from grpnet
objects
cv.grpnet
for k-fold cross-validation of lambda
Examples
######***###### family = "gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto)
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# fit model (formula method, response = origin with 2 levels)
mod <- grpnet(origin ~ ., data = auto, family = "binomial")
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "multinomial" ######***######
# load data
data(auto)
# fit model (formula method, response = origin with 3 levels)
mod <- grpnet(origin ~ ., data = auto, family = "multinomial")
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "poisson" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "poisson")
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# fit model (formula method, response = horsepower)
mod <- grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "Gamma" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "Gamma")
# print regularization path info
mod
# plot coefficient paths
plot(mod)
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# fit model (formula method, response = mpg)
mod <- grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# print regularization path info
mod
# plot coefficient paths
plot(mod)