R: Define Individual Subpenalties for a Multi-Type Regularized...

p {smurf}

R Documentation

Define Individual Subpenalties for a Multi-Type Regularized GLM

Description

Function used to define regularization terms in a glmsmurf model formula.

Usage

p(pred1, pred2 = NULL, pen = "lasso", refcat = NULL, group = NULL)

Arguments

`pred1`	Name of the predictor used in the regularization term.
`pred2`	Either `NULL` (default) meaning that only one predictor is used in the regularization term, or the name of the second predictor that is used in a 2D Fused Lasso regularization term.
`pen`	Type of penalty for this predictor, one of `"none"` (no penalty), `"lasso"` (Lasso), `"grouplasso"` (Group Lasso), `"flasso"` (Fused Lasso), `"gflasso"` (Generalized Fused Lasso), `"2dflasso"` (2D Fused Lasso), `"ggflasso"` (Graph-Guided Fused Lasso). Default is `"lasso"`.
`refcat`	Reference level when `pred1` is a factor and `pen` is `"none"`, `"flasso"`, `"gflasso"`, or `"ggflasso"`; otherwise `refcat` is ignored. Default is `NULL` which means that the first level of `pred1` is used as the reference level (if `refcat` is not ignored).
`group`	Group to which the predictor belongs, only used for a Group Lasso penalty. Default is `NULL` which means that the predictor does not belong to a group.

Details

Predictors with no penalty, a Lasso penalty or a Group Lasso penalty should be either numeric or a factor; the factor levels may be non-numeric. Predictors with a Fused Lasso, Generalized Fused Lasso, Graph-Guided Fused Lasso or 2D Fused Lasso penalty should be given as a factor, whose levels may also be non-numeric. A predictor given as a factor cannot have any unused levels.
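The requirement on unused levels can be checked, and repaired, in base R before the formula is built. The following is a minimal sketch only: the data frame `dat` and its column `region` are hypothetical and not part of the package.

```r
# Hypothetical data: 'region' declares a level ("D") that never occurs
dat <- data.frame(
  y = c(1.2, 0.7, 1.9, 1.1),
  region = factor(c("A", "B", "A", "C"), levels = c("A", "B", "C", "D"))
)

# Levels that are declared but not observed in the data
unused <- setdiff(levels(dat$region), unique(as.character(dat$region)))
print(unused)

# Drop unused levels from all factor columns of the data frame
dat <- droplevels(dat)
print(levels(dat$region))
```

Calling droplevels() on the whole data frame, rather than on a single column, handles every factor predictor in one step.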

For a predictor with a Fused Lasso penalty, the levels should be ordered from smallest to largest. The first level will be the reference level, but this can be changed using the refcat argument.
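As an illustration of the ordering requirement, the sketch below uses base R only and hypothetical vectors `decade` and `quality`. It shows that numeric values are sorted automatically when converted to a factor, whereas character labels need an explicit level order.

```r
# Hypothetical numeric predictor, e.g. decade of construction
decade <- c(1990, 1950, 2000, 1970, 1950, 1990)

# as.factor() sorts the unique numeric values in increasing order
f_num <- as.factor(decade)
print(levels(f_num))

# For character labels the default order is alphabetical, which is rarely
# the natural order; state the intended order explicitly through 'levels'
quality <- c("good", "fair", "excellent", "fair")
f_chr <- factor(quality, levels = c("fair", "good", "excellent"))
print(levels(f_chr))
```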

When lambda * lambda1 > 0 or lambda * lambda2 > 0 in glmsmurf, no reference level is used for the Fused Lasso, Generalized Fused Lasso and Graph-Guided Fused Lasso penalties, and refcat will hence be ignored.

If pred2 is not NULL, pen should be set to "2dflasso", and vice versa. Note that there cannot be any unused levels in the interaction between pred1 and pred2.
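One way to screen for this before fitting is to cross-tabulate the two factors. The sketch below assumes that an unused level in the interaction corresponds to a combination of levels with zero observations; that reading, and the factors `age` and `power`, are assumptions made for illustration.

```r
# Hypothetical factors for the two predictors
age   <- factor(c("young", "young", "old", "old", "young"))
power <- factor(c("low", "high", "low", "low", "high"))

# Cross-tabulate the two factors; a zero count marks a combination of
# levels that never occurs in the data
counts <- table(age, power)
print(counts)

# TRUE if at least one combination of levels is unobserved
has_empty_cells <- any(counts == 0)
print(has_empty_cells)
```

Here the combination of "old" and "high" is never observed, so this pair of factors would need to be re-binned or merged before being used in a 2D term.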

When adding an interaction between pred1 and pred2 with a 2D Fused Lasso penalty, the 1D effects should also be present in the model, and the reference categories for the 1D predictors need to be the respective first levels. The reference level for the 2D predictor will then be the 2D level where at least one of the 1D components is equal to the corresponding 1D reference level. Binned versions of predictors that are included in the model may also be used in the interaction. They should be named as the original predictor name followed by '.binned'. For example, if the original predictors 'age' and 'power' are included in the model, the interaction of 'age.binned' and 'power.binned' can also be present in the model formula.
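A formula following this pattern might look like the sketch below. The response name `y` is hypothetical, the predictor names are taken from the example in the text, and the second positional argument of p() is pred2 as in the Usage section. No refcat is supplied for the 1D terms, so their reference categories stay at the first levels.

```r
# Hypothetical variable names: response 'y', predictors 'age' and 'power',
# and their binned versions 'age.binned' and 'power.binned'.
# Creating a formula does not evaluate p(), so no package is needed here.
formu2d <- y ~ p(age, pen = "flasso") + p(power, pen = "flasso") +
  p(age.binned, power.binned, pen = "2dflasso")

# Variables referenced by the formula
print(all.vars(formu2d))
```

All five variables would need to be columns of the data frame passed to glmsmurf.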

An overview of the different penalty types and their usage can be found in the package vignette.

Examples

# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in 
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)

# Number of rooms
rent$rooms <- as.factor(rent$rooms)

# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")

# Floor space divided in categories (0, 30), [30, 40), ...,  [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])

# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))

# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))

# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))

# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))

# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))



# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties 
# where the reference category for 'year' is changed to 2000,
# 'warm' and 'central' are in one group for the Group Lasso penalty,
# 'tiles' and 'bathextra' are not regularized and 
# 'kitchen' has a Lasso penalty
formu <- rentm ~ p(area, pen = "gflasso") + 
  p(year, pen = "flasso", refcat = 2000) + p(rooms, pen = "flasso") + 
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "grouplasso", group = 1) + p(central, pen = "grouplasso", group = 1) + 
  p(tiles, pen = "none") + bathextra + 
  p(kitchen, pen = "lasso")


# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent, 
                       pen.weights = "glm.stand", lambda = 0.1)

# Model summary
summary(munich.fit)