p {smurf}    R Documentation

Define Individual Subpenalties for a Multi-Type Regularized GLM

Description

Function used to define regularization terms in a glmsmurf model formula.

Usage

p(pred1, pred2 = NULL, pen = "lasso", refcat = NULL, group = NULL)

Arguments

pred1

Name of the predictor used in the regularization term.

pred2

Either NULL (default) meaning that only one predictor is used in the regularization term, or the name of the second predictor that is used in a 2D Fused Lasso regularization term.

pen

Type of penalty for this predictor, one of

  • "none" (no penalty),

  • "lasso" (Lasso),

  • "grouplasso" (Group Lasso),

  • "flasso" (Fused Lasso),

  • "gflasso" (Generalized Fused Lasso),

  • "2dflasso" (2D Fused Lasso),

  • "ggflasso" (Graph-Guided Fused Lasso).

Default is "lasso".

refcat

Reference level when pred1 is a factor and pen is "none", "flasso", "gflasso", or "ggflasso"; otherwise refcat is ignored. Default is NULL which means that the first level of pred1 is used as the reference level (if refcat is not ignored).

group

Group to which the predictor belongs; only used for a Group Lasso penalty. Default is NULL, which means that the predictor does not belong to a group.

Details

Predictors with no penalty, a Lasso penalty or a Group Lasso penalty should be either numeric or a factor, whose levels may be non-numeric. Predictors with a Fused Lasso, Generalized Fused Lasso, Graph-Guided Fused Lasso or 2D Fused Lasso penalty should be given as a factor, whose levels may also be non-numeric. When a predictor is given as a factor, it cannot have any unused levels.
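
The requirement on unused levels can be checked with base R before fitting. The following is a minimal sketch on hypothetical data (the data frame df and its columns are invented for illustration); it only uses table() and droplevels() from base R and does not call any smurf function.

```r
# Hypothetical data: 'region' declares a level "c" that never occurs
df <- data.frame(
  y      = c(1.2, 0.7, 1.9, 1.1),
  region = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
)

# Levels with a zero count are unused
unused <- names(which(table(df$region) == 0))
print(unused)

# Drop the unused levels before using the factor in a p() term
df$region <- droplevels(df$region)
print(levels(df$region))
```

Whether dropping a level is appropriate depends on the data; the sketch only shows how to detect and remove levels that have no observations.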

For a predictor with a Fused Lasso penalty, the levels should be ordered from smallest to largest. The first level will be the reference level, but this can be changed using the refcat argument.
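
As an illustration of the ordering requirement, the sketch below builds a factor whose levels are given explicitly from smallest to largest. The variable names (income_raw, income, y) and labels are hypothetical. The formula is only constructed, not fitted, so the code runs without loading smurf.

```r
# Hypothetical ordinal predictor; the labels are illustrative only
income_raw <- c("medium", "low", "high", "low", "medium")

# factor() sorts levels alphabetically by default ("high", "low", "medium"),
# which is not the natural order, so the levels are passed explicitly
income <- factor(income_raw, levels = c("low", "medium", "high"))
print(levels(income))

# Formula term with a Fused Lasso penalty; the response 'y' is hypothetical.
# The first level ("low") is the reference level unless refcat is supplied.
formu_income <- y ~ p(income, pen = "flasso")
print(formu_income)
```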

When lambda * lambda1 > 0 or lambda * lambda2 > 0 in glmsmurf, no reference level is used for the Fused Lasso, Generalized Fused Lasso and Graph-Guided Fused Lasso penalties, and refcat will hence be ignored.

If pred2 is not NULL, pen should be set to "2dflasso", and vice versa. Note that there cannot be any unused levels in the interaction between pred1 and pred2, i.e. every combination of their levels must occur in the data.
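
Unused levels in the interaction correspond to empty cells in the cross-tabulation of the two factors. A minimal base R sketch for detecting them, on hypothetical data (df, age and power are invented here), might look as follows.

```r
# Hypothetical data: the combination ("old", "high") never occurs
df <- data.frame(
  age   = factor(c("young", "young", "old", "old")),
  power = factor(c("low", "high", "low", "low"))
)

# Cross-tabulate the two factors; a zero count marks an unused 2D level
cells <- table(df$age, df$power)
print(cells)

# Number of empty cells and their row/column positions in the table
n_empty <- sum(cells == 0)
empty_cells <- which(cells == 0, arr.ind = TRUE)
print(n_empty)
print(empty_cells)
```

The sketch only detects the problem; how to resolve it (for example by merging levels) is a modelling decision that this documentation does not prescribe.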

When adding an interaction between pred1 and pred2 with a 2D Fused Lasso penalty, the 1D effects should also be present in the model, and the reference categories for the 1D predictors need to be their respective first levels. The reference level for the 2D predictor will then be the 2D levels where at least one of the 1D components is equal to its 1D reference level. Binned versions of predictors that are included in the model may also be used in the interaction. Their names should be the original predictor name followed by '.binned'. For example, if the original predictors 'age' and 'power' are included in the model, the interaction of 'age.binned' and 'power.binned' can also be present in the model formula.
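
A formula following this naming convention could be sketched as below. The response y is hypothetical, and the choice of a Fused Lasso penalty for the 1D effects is an assumption made for illustration; the text above only requires that the 1D effects are present with their first levels as reference. The formula is constructed but not fitted, so the code runs without loading smurf.

```r
# Hypothetical response 'y'; 'age' and 'power' follow the naming in the text.
# The "flasso" penalty on the 1D effects is an illustrative assumption.
formu_2d <- y ~ p(age, pen = "flasso") + p(power, pen = "flasso") +
  p(age.binned, power.binned, pen = "2dflasso")

# The formula is only a symbolic object at this point; listing its variables
# shows which columns the data set would need to contain
vars_2d <- all.vars(formu_2d)
print(vars_2d)
```

Keeping the 1D terms without refcat leaves their first levels as the reference levels, in line with the requirement above.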

An overview of the different penalty types and their usage can be found in the package vignette.

See Also

glmsmurf

Examples

# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in 
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)

# Number of rooms
rent$rooms <- as.factor(rent$rooms)

# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")

# Floor space divided in categories [0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])

# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))

# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))

# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))

# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))

# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))



# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties 
# where the reference category for 'year' is changed to 2000,
# 'warm' and 'central' are in one group for the Group Lasso penalty,
# 'tiles' and 'bathextra' are not regularized and 
# 'kitchen' has a Lasso penalty
formu <- rentm ~ p(area, pen = "gflasso") + 
  p(year, pen = "flasso", refcat = 2000) + p(rooms, pen = "flasso") + 
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "grouplasso", group = 1) + p(central, pen = "grouplasso", group = 1) + 
  p(tiles, pen = "none") + bathextra + 
  p(kitchen, pen = "lasso")


# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent, 
                       pen.weights = "glm.stand", lambda = 0.1)

# Model summary
summary(munich.fit) 

[Package smurf version 1.1.5 Index]