glmsmurf {smurf} | R Documentation |
Fit a Multi-Type Regularized GLM Using the SMuRF Algorithm
Description
SMuRF algorithm to fit a generalized linear model (GLM) with multiple types of predictors via regularized maximum likelihood.
glmsmurf.fit contains the fitting function for a given design matrix.
Usage
glmsmurf(
  formula,
  family,
  data,
  weights,
  start,
  offset,
  lambda,
  lambda1 = 0,
  lambda2 = 0,
  pen.weights,
  adj.matrix,
  standardize = TRUE,
  control = list(),
  x.return = FALSE,
  y.return = TRUE,
  pen.weights.return = FALSE
)
glmsmurf.fit(
  X,
  y,
  weights,
  start,
  offset,
  family,
  pen.cov,
  n.par.cov,
  group.cov,
  refcat.cov,
  lambda,
  lambda1 = 0,
  lambda2 = 0,
  pen.weights,
  adj.matrix,
  standardize = TRUE,
  control = list(),
  formula = NULL,
  data = NULL,
  x.return = FALSE,
  y.return = FALSE,
  pen.weights.return = FALSE
)
Arguments
formula |
A |
family |
A |
data |
A data frame containing the model response and predictors for |
weights |
An optional vector of prior weights to use in the likelihood. It should be a numeric vector of length |
start |
A vector containing the starting values for the coefficients. It should either be a numeric vector
of length |
offset |
A vector containing the offset for the model. It should be a vector of size |
lambda |
Either the penalty parameter, a positive number; or a string describing the method and measure used to select the penalty parameter:
E.g. |
lambda1 |
The penalty parameter for the |
lambda2 |
The penalty parameter for the |
pen.weights |
Either a string describing the method to compute the penalty weights:
or a list with the penalty weight vector per predictor. This list should have length equal to the number of predictors and predictor names as element names. |
adj.matrix |
A named list containing the adjacency matrices (a.k.a. neighbor matrices) for each of the predictors with a Graph-Guided Fused Lasso penalty. The list elements should have the names of the corresponding predictors. If only one predictor has a Graph-Guided Fused Lasso penalty, it is also possible to only give the adjacency matrix itself (not in a list). |
standardize |
Logical indicating if predictors with a Lasso or Group Lasso penalty are standardized, default is |
control |
A list of parameters used in the fitting process. This is passed to |
x.return |
Logical indicating if the used model matrix should be returned in the output object, default is |
y.return |
Logical indicating if the used response vector should be returned in the output object, default is |
pen.weights.return |
Logical indicating if the list of the used penalty weight vector per predictor should be returned in the output object, default is |
X |
Only for |
y |
Only for |
pen.cov |
Only for |
n.par.cov |
Only for |
group.cov |
Only for |
refcat.cov |
Only for |
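For the adj.matrix argument described above, the following is a minimal sketch of how such a named list could be built in base R. The factor 'region', its levels and the neighbor pairs are hypothetical, and the sketch assumes a symmetric 0/1 matrix with a zero diagonal whose rows and columns follow the factor levels; consult the package vignette for the exact requirements.

```r
# Hypothetical factor with four regions; the names are illustrative only.
region <- factor(c("north", "east", "south", "west", "east", "north"))
lev <- levels(region)

# Start from an all-zero square matrix indexed by the factor levels.
adj <- matrix(0, nrow = length(lev), ncol = length(lev),
              dimnames = list(lev, lev))

# Hypothetical neighbor pairs (each row is one undirected edge).
edges <- rbind(c("north", "east"),
               c("east", "south"),
               c("south", "west"),
               c("west", "north"))

# Mark both directions so that the matrix is symmetric.
adj[edges] <- 1
adj[edges[, 2:1, drop = FALSE]] <- 1

# Sanity checks assumed in this sketch: symmetric, binary, zero diagonal.
stopifnot(isSymmetric(adj), all(adj %in% c(0, 1)), all(diag(adj) == 0))

# Named list with the predictor name as element name.
adj.list <- list(region = adj)

# Sketch of the call (not run; response, family and lambda are placeholders):
# fit <- glmsmurf(y ~ p(region, pen = "ggflasso"), family = gaussian(),
#                 data = mydata, adj.matrix = adj.list, lambda = 0.01)
```

Since only one predictor carries a Graph-Guided Fused Lasso penalty here, passing 'adj' directly instead of 'adj.list' should also be possible, as noted in the argument description.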
Details
See the package vignette for more details and a complete description of a use case.
As a user, it is important to take the following into account:
The estimated coefficients are rounded to 7 digits.
The cross-validation folds are not deterministic. The validation sample for selecting lambda out-of-sample is determined at random when no indices are provided in 'validation.index' in the control object argument. In these cases, the selected value of lambda is hence not deterministic. When selecting lambda in-sample, or out-of-sample when indices are provided in 'validation.index' in the control object argument, the selected value of lambda is deterministic.
The glmsmurf function can handle many use cases and is preferred for general use. The glmsmurf.fit function requires a more thorough understanding of the package internals and should hence be used with care!
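The point about out-of-sample selection of lambda can be illustrated with a minimal sketch that fixes the validation sample through 'validation.index'. The sample size, the seed and the 20% hold-out proportion are arbitrary assumptions made for this illustration, and the glmsmurf call is shown only as a comment.

```r
# Hypothetical sample size; in practice this would be nrow(data).
n <- 500

# Fix the seed so that the drawn validation sample can be reproduced.
set.seed(2024)

# Hold out 20% of the observations (an arbitrary choice for this sketch).
val.index <- sort(sample.int(n, size = floor(0.2 * n)))

# Control list carrying the fixed validation indices.
ctrl <- list(validation.index = val.index)

# Sketch of the call (not run; 'formu' and 'mydata' are placeholders):
# fit.oos <- glmsmurf(formula = formu, family = gaussian(), data = mydata,
#                     lambda = "oos.dev", control = ctrl)
```

With the indices stored in 'val.index', repeated fits use the same validation sample, so the selected value of lambda does not change between runs.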
Value
An object of class 'glmsmurf' is returned. See glmsmurf-class for more details about this class and its generic functions.
References
Devriendt, S., Antonio, K., Reynkens, T. and Verbelen, R. (2021). "Sparse Regression with Multi-type Regularized Feature Modeling", Insurance: Mathematics and Economics, 96, 248–261. <doi:10.1016/j.insmatheco.2020.11.010>.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press.
See Also
glmsmurf-class, glmsmurf.control, p, glm
Examples
# Munich rent data from catdata package
data("rent", package = "catdata")
# The considered predictors are the same as in
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro
# Urban district in Munich
rent$area <- as.factor(rent$area)
# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)
# Number of rooms
rent$rooms <- as.factor(rent$rooms)
# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")
# Floor space divided in categories (0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])
# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))
# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))
# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))
# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))
# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))
# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties,
# and the other predictors with Lasso penalties.
formu <- rentm ~ p(area, pen = "gflasso") +
  p(year, pen = "flasso") + p(rooms, pen = "flasso") +
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "lasso") + p(central, pen = "lasso") +
  p(tiles, pen = "lasso") + p(bathextra, pen = "lasso") +
  p(kitchen, pen = "lasso")
# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
# The value for lambda (0.02) was selected beforehand using cross-validation
# (with the deviance as loss measure and the one standard error rule), see example(plot_lambda)
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent,
                       pen.weights = "glm.stand", lambda = 0.02)
####
# S3 methods for glmsmurf objects
# Model summary
summary(munich.fit)
# Get coefficients of estimated model
coef(munich.fit)
# Get coefficients of re-estimated model
coef_reest(munich.fit)
# Plot coefficients of estimated model
plot(munich.fit)
# Plot coefficients of re-estimated model
plot_reest(munich.fit)
# Get deviance of estimated model
deviance(munich.fit)
# Get deviance of re-estimated model
deviance_reest(munich.fit)
# Get fitted values of estimated model
fitted(munich.fit)
# Get fitted values of re-estimated model
fitted_reest(munich.fit)
# Get predicted values of estimated model on scale of linear predictors
predict(munich.fit, type = "link")
# Get predicted values of re-estimated model on scale of linear predictors
predict_reest(munich.fit, type = "link")
# Get deviance residuals of estimated model
residuals(munich.fit, type = "deviance")
# Get deviance residuals of re-estimated model
residuals_reest(munich.fit, type = "deviance")