p {smurf} | R Documentation |
Define Individual Subpenalties for a Multi-Type Regularized GLM
Description
Function used to define regularization terms in a glmsmurf
model formula.
Usage
p(pred1, pred2 = NULL, pen = "lasso", refcat = NULL, group = NULL)
Arguments
pred1 |
Name of the predictor used in the regularization term. |
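pred2 |
Either NULL (default), meaning that only one predictor is used, or the name of the second predictor when pen is "2dflasso". |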
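pen |
Type of penalty for this predictor, one of "none" (no penalty), "lasso" (Lasso), "grouplasso" (Group Lasso), "flasso" (Fused Lasso), "gflasso" (Generalized Fused Lasso), "2dflasso" (2D Fused Lasso) or "ggflasso" (Graph-Guided Fused Lasso).
Default is "lasso". |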
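refcat |
Reference level when pred1 is a factor; see Details for the cases where it is ignored. Default is NULL, in which case the first level is used. |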
group |
Group to which the predictor belongs, only used for a Group Lasso penalty.
Default is NULL. |
Details
Predictors with no penalty, a Lasso penalty or a Group Lasso penalty should be numeric or a factor which can be non-numeric. Predictors with a Fused Lasso, Generalized Fused Lasso, Graph-Guided Fused Lasso or 2D Fused Lasso penalty should be given as a factor which can also be non-numeric. When a predictor is given as a factor, there cannot be any unused levels.
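The requirement on unused levels can be checked before building the formula. The following is a minimal base R sketch; the data frame dat and its columns are hypothetical and only serve as illustration.

```r
# Hypothetical data: after subsetting, level "C" of 'region' no longer occurs
dat <- data.frame(region = factor(c("A", "B", "C", "A", "B")),
                  y = c(1.2, 0.7, 3.1, 0.9, 1.5))
sub <- dat[dat$region != "C", ]

# The level "C" is still registered although no observation uses it
unused <- setdiff(levels(sub$region), unique(as.character(sub$region)))
print(unused)

# droplevels() removes unused levels from every factor in the data frame
sub <- droplevels(sub)
print(levels(sub$region))
```

Calling droplevels() on the data frame right before fitting is one simple way to make sure that no factor carries unused levels.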
For a predictor with a Fused Lasso penalty, the levels should be ordered from smallest to largest.
The first level will be the reference level, but this can be changed using the refcat argument.
When lambda * lambda1 > 0 or lambda * lambda2 > 0 in glmsmurf, no reference level is used for the Fused Lasso, Generalized Fused Lasso and Graph-Guided Fused Lasso penalties, and refcat will hence be ignored.
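As an illustration of the level ordering for a Fused Lasso predictor, here is a small sketch with a hypothetical predictor risk. Passing the reference level to refcat as a quoted string is an assumption; the Examples section passes the level 2000 unquoted.

```r
# Hypothetical ordinal predictor stored as character strings
risk_raw <- c("medium", "low", "high", "low", "medium")

# factor() sorts levels alphabetically by default ("high", "low", "medium"),
# so the intended order from smallest to largest is stated explicitly
risk <- factor(risk_raw, levels = c("low", "medium", "high"))
print(levels(risk))

# A formula is not evaluated when it is created, so this runs without smurf loaded.
# Assumption: the reference level may be given as a quoted string.
formu_risk <- y ~ p(risk, pen = "flasso", refcat = "medium")
print(formu_risk)
```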
If pred2 is different from NULL, pen should be set to "2dflasso", and vice versa.
Note that there cannot be any unused levels in the interaction between pred1 and pred2.
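A cross-tabulation is one way to detect unused levels in the interaction. The sketch below uses hypothetical factors age and power; it is not part of the package.

```r
# Hypothetical factors intended for a 2D Fused Lasso interaction
age   <- factor(c("young", "young", "old", "old", "young"),
                levels = c("young", "old"))
power <- factor(c("low", "high", "low", "low", "high"),
                levels = c("low", "high"))

# Cross-tabulate both factors: a zero cell is a combination of levels
# that does not occur in the data, i.e. an unused interaction level
counts <- table(age, power)
print(counts)

n_empty <- sum(counts == 0)
if (n_empty > 0) {
  message(n_empty, " combination(s) of levels have no observations")
}
```

In this toy data the combination of "old" and "high" never occurs, so these two factors could not be used together in an interaction as they stand.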
When adding an interaction between pred1 and pred2 with a 2D Fused Lasso penalty, the 1D effects should also be present in the model, and the reference categories for the 1D predictors need to be the respective first levels.
The reference level for the 2D predictor then consists of the 2D levels where at least one of the 1D components equals its 1D reference level.
Binned versions of predictors that are included in the model may also be used in the interaction. Their names should be the original predictor name followed by '.binned'.
For example, when the original predictors 'age' and 'power' are included in the model, the interaction of 'age.binned' and 'power.binned' can also be present in the model formula.
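The naming convention for binned factors can be sketched as follows. The data, the break points and the Lasso penalties chosen for the 1D effects are all hypothetical; only the '.binned' suffix and the "2dflasso" penalty come from the text above.

```r
# Hypothetical numeric predictors
dat <- data.frame(age   = c(20, 25, 45, 60, 30, 55),
                  power = c(50, 120, 60, 130, 80, 110))

# Binned factors named as original predictor name + '.binned'
# (break points are arbitrary and chosen for illustration only)
dat$age.binned   <- cut(dat$age,   breaks = c(18, 40, 70),  right = FALSE)
dat$power.binned <- cut(dat$power, breaks = c(0, 100, 200), right = FALSE)

# Every combination of binned levels should occur at least once
print(table(dat$age.binned, dat$power.binned))

# The formula is only constructed here, not fitted, so smurf is not needed.
# The 1D effects 'age' and 'power' stay in the model next to the interaction.
formu_binned <- y ~ p(age, pen = "lasso") + p(power, pen = "lasso") +
  p(age.binned, power.binned, pen = "2dflasso")
print(formu_binned)
```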
An overview of the different penalty types and their usage can be found in the package vignette.
See Also
Examples
# Munich rent data from catdata package
data("rent", package = "catdata")
# The considered predictors are the same as in
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro
# Urban district in Munich
rent$area <- as.factor(rent$area)
# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)
# Number of rooms
rent$rooms <- as.factor(rent$rooms)
# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")
# Floor space divided in categories [0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])
# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))
# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))
# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))
# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))
# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))
# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties
# where the reference category for 'year' is changed to 2000,
# 'warm' and 'central' are in one group for the Group Lasso penalty,
# 'tiles' and 'bathextra' are not regularized and
# 'kitchen' has a Lasso penalty
formu <- rentm ~ p(area, pen = "gflasso") +
  p(year, pen = "flasso", refcat = 2000) + p(rooms, pen = "flasso") +
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "grouplasso", group = 1) + p(central, pen = "grouplasso", group = 1) +
  p(tiles, pen = "none") + bathextra +
  p(kitchen, pen = "lasso")
# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent,
                       pen.weights = "glm.stand", lambda = 0.1)
# Model summary
summary(munich.fit)