Families {gamboostLSS} | R Documentation |
Families for GAMLSS models
Description
The package provides some pre-defined GAMLSS families, e.g.
NBionomialLSS
. Objects of the class families
provide a
convenient way to specify GAMLSS distributions to be fitted by one of
the boosting algorithms implemented in this package. By using the
function Families
, a new object of the class families
can be generated.
Usage
############################################################
# Families for continuous response
# Gaussian distribution
GaussianLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
# Student's t-distribution
StudentTLSS(mu = NULL, sigma = NULL, df = NULL,
stabilization = c("none", "MAD", "L2"))
############################################################
# Families for continuous non-negative response
# Gamma distribution
GammaLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
############################################################
# Families for fractions and bounded continuous response
# Beta distribution
BetaLSS(mu = NULL, phi = NULL,
stabilization = c("none", "MAD", "L2"))
############################################################
# Families for count data
# Negative binomial distribution
NBinomialLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
# Zero-inflated Poisson distribution
ZIPoLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
# Zero-inflated negative binomial distribution
ZINBLSS(mu = NULL, sigma = NULL, nu = NULL,
stabilization = c("none", "MAD", "L2"))
############################################################
# Families for survival models (accelerated failure time
# models) for data with right censoring
# Log-normal distribution
LogNormalLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
# Log-logistic distribution
LogLogLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
# Weibull distribution
WeibullLSS(mu = NULL, sigma = NULL,
stabilization = c("none", "MAD", "L2"))
############################################################
# Constructor function for new GAMLSS distributions
Families(..., qfun = NULL, name = NULL)
Arguments
... |
sub-families to be passed to constructor. |
qfun |
quantile function. This function can for example be used
to compute (marginal) prediction intervals. See
|
name |
name of the families. |
mu |
offset value for mu. |
sigma |
offset value for sigma. |
phi |
offset value for phi. |
df |
offset value for df. |
nu |
offset value for nu. |
stabilization |
governs if the negative gradient should be standardized in each boosting step. It can be either "none", "MAD" or "L2". See also Details below. |
Details
The arguments of the families are the offsets for each distribution
parameter. Offsets can be either scalar, a vector with length equal to
the number of observations or NULL
(default). In the latter
case, a scalar offset for this component is computed by minimizing the
risk function w.r.t. the corresponding distribution parameter (keeping
the other parameters fixed).
Note that gamboostLSS
is not restricted to three components but
can handle an arbitrary number of components (which, of course,
depends on the GAMLSS distribution). However, it is important that the
names (for the offsets, in the sub-families etc.) are chosen
consistently.
The ZIPoLSS
families can be used to fit zero-inflated Poisson
models. Here, mu
and sigma
refer to the location
parameter of the Poisson component (with log link) and the mean of the
zero-generating process (with logit link), respectively.
Similarly, ZINBLSS
can be used to fit zero-inflated negative
binomial models. Here, mu
and sigma
refer to the
location and scale parameters (with log link) of the negative binomial
component of the model. The zero-generating process (with logit link)
is represented by nu
.
The Families
function can be used to implements a new GAMLSS
distribution which can be used for fitting by mboostLSS
.
Thereby, the function builds a list of sub-families, one for each
distribution parameter. The sub-families themselves are objects of the
class boost_family
, and can be constructed via the function
Family
of the mboost
Package.
Arguments to be passed to Family
: The loss
for every
distribution parameter (contained in objects of class
boost_family
) is the negative log-likelihood of the
corresponding distribution. The ngradient
is the negative
partial derivative of the loss function with respect to the
distribution parameter. For a two-parameter distribution (e.g. mu and
sigma), the user therefore has to specify two sub-families with
Family
. The loss
is basically the same function
for both paramters, only ngradient
differs. Both sub-families
are passed to the Families
constructor, which returns an object
of the class families
.
To (potentially) stabilize the model estimation by standardizing the
negative gradients one can use the argument stabilization
of
the families. If stabilization = "MAD"
, the negative gradient
is divided by its (weighted) median absolute deviation
median_i
(|u_{k,i} - median_j(u_{k,j})|)
in each boosting step. See Hofner et
al. (2016) for details. An alternative is stabilization = "L2"
, where the gradient is divided by its (weighted) mean L2 norm. This results in negative gradient vectors (and hence also updates) of similar size for each distribution parameter, but also for every boosting iteration.
Value
An object of class families
.
Author(s)
BetaLSS
for boosting beta regression was implmented by Florian
Wickler.
References
B. Hofner, A. Mayr, M. Schmid (2016). gamboostLSS: An R Package for Model Building and Variable Selection in the GAMLSS Framework. Journal of Statistical Software, 74(1), 1-31.
Available as vignette("gamboostLSS_Tutorial")
.
Mayr, A., Fenske, N., Hofner, B., Kneib, T. and Schmid, M. (2012): Generalized additive models for location, scale and shape for high-dimensional data - a flexible approach based on boosting. Journal of the Royal Statistical Society, Series C (Applied Statistics) 61(3): 403-427.
Rigby, R. A. and D. M. Stasinopoulos (2005). Generalized additive models for location, scale and shape (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics), 54, 507-554.
See Also
as.families
for applying GAMLSS distributions provided
in the framework of the gamlss
package.
The functions gamboostLSS
and glmboostLSS
can be used for model fitting.
See also the corresponding constructor function
Family
in mboost
.
Examples
## Example to define a new distribution:
## Students t-distribution with two parameters, df and mu:
## sub-Family for mu
## -> generate object of the class family from the package mboost
newStudentTMu <- function(mu, df){
# loss is negative log-Likelihood, f is the parameter to be fitted with
# id link -> f = mu
loss <- function(df, y, f) {
-1 * (lgamma((df + 1)/2) - lgamma(1/2) -
lgamma(df/2) - 0.5 * log(df) -
(df + 1)/2 * log(1 + (y - f)^2/(df )))
}
# risk is sum of loss
risk <- function(y, f, w = 1) {
sum(w * loss(y = y, f = f, df = df))
}
# ngradient is the negative derivate w.r.t. mu (=f)
ngradient <- function(y, f, w = 1) {
(df + 1) * (y - f)/(df + (y - f)^2)
}
# use the Family constructor of mboost
mboost::Family(ngradient = ngradient, risk = risk, loss = loss,
response = function(f) f,
name = "new Student's t-distribution: mu (id link)")
}
## sub-Family for df
newStudentTDf <- function(mu, df){
# loss is negative log-Likelihood, f is the parameter to be fitted with
# log-link: exp(f) = df
loss <- function( mu, y, f) {
-1 * (lgamma((exp(f) + 1)/2) - lgamma(1/2) -
lgamma(exp(f)/2) - 0.5 * f -
(exp(f) + 1)/2 * log(1 + (y - mu)^2/(exp(f) )))
}
# risk is sum of loss
risk <- function(y, f, w = 1) {
sum(w * loss(y = y, f = f, mu = mu))
}
# ngradient is the negative derivate of the loss w.r.t. f
# in this case, just the derivative of the log-likelihood
ngradient <- function(y, f, w = 1) {
exp(f)/2 * (digamma((exp(f) + 1)/2) - digamma(exp(f)/2)) -
0.5 - (exp(f)/2 * log(1 + (y - mu)^2 / (exp(f) )) -
(y - mu)^2 / (1 + (y - mu)^2 / exp(f)) * (exp(-f) + 1)/2)
}
# use the Family constructor of mboost
mboost::Family(ngradient = ngradient, risk = risk, loss = loss,
response = function(f) exp(f),
name = "Student's t-distribution: df (log link)")
}
## families object for new distribution
newStudentT <- Families(mu= newStudentTMu(mu=mu, df=df),
df=newStudentTDf(mu=mu, df=df))
### Do not test the following code per default on CRAN as it takes some time to run:
### usage of the new Student's t distribution:
library(gamlss) ## required for rTF
set.seed(1907)
n <- 5000
x1 <- runif(n)
x2 <- runif(n)
mu <- 2 -1*x1 - 3*x2
df <- exp(1 + 0.5*x1 )
y <- rTF(n = n, mu = mu, nu = df)
## model fitting
model <- glmboostLSS(y ~ x1 + x2, families = newStudentT,
control = boost_control(mstop = 100),
center = TRUE)
## shrinked effect estimates
coef(model, off2int = TRUE)
## compare to pre-defined three parametric t-distribution:
model2 <- glmboostLSS(y ~ x1 + x2, families = StudentTLSS(),
control = boost_control(mstop = 100),
center = TRUE)
coef(model2, off2int = TRUE)
## with effect on sigma:
sigma <- 3+ 1*x2
y <- rTF(n = n, mu = mu, nu = df, sigma=sigma)
model3 <- glmboostLSS(y ~ x1 + x2, families = StudentTLSS(),
control = boost_control(mstop = 100),
center = TRUE)
coef(model3, off2int = TRUE)