| family.mgcv {mgcv} | R Documentation |
Distribution families in mgcv
Description
As well as the standard families (of class family) documented in family (see also glm) which can be used with functions gam, bam and gamm, mgcv also supplies some extra families, most of which are currently only usable with gam, although some can also be used with bam. These are described here.
Details
The following families (class family) are in the exponential family given the value of a single parameter. They are usable with all modelling functions.
- Tweedie: An exponential family distribution for which the variance of the response is given by the mean response to the power p. p is in (1,2) and must be supplied. Alternatively, see tw to estimate p (gam/bam only).
- negbin: The negative binomial. Alternatively see nb to estimate the theta parameter of the negative binomial (gam/bam only).
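As a rough illustration of the two fixed-parameter families above, the following sketch fits each to simulated data. The data, the variable names, and the chosen values theta = 3 and p = 1.5 are assumptions made for the example only; in practice these parameters would have to be known in advance.

```r
library(mgcv)

set.seed(1)
n  <- 200
x  <- runif(n)
mu <- exp(1 + sin(2 * pi * x))

# Simulated overdispersed counts; theta (size) = 3 is an assumed known value
dat <- data.frame(x = x, y_nb = rnbinom(n, size = 3, mu = mu))

# Simulated Tweedie responses with an assumed known power p = 1.5
dat$y_tw <- rTweedie(mu, p = 1.5, phi = 1)

# negbin() requires theta to be supplied
m_nb <- gam(y_nb ~ s(x), family = negbin(3), data = dat, method = "REML")

# Tweedie() requires p in (1,2) to be supplied
m_tw <- gam(y_tw ~ s(x), family = Tweedie(p = 1.5), data = dat, method = "REML")

summary(m_nb)
summary(m_tw)
```

When theta or p is not known, the nb and tw families estimate the parameter as part of fitting.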
The following families (class extended.family) are for regression type models dependent on a single linear predictor, and with a log likelihood which is a sum of independent terms, each corresponding to a single response observation. Usable with gam, with smoothing parameter estimation by "NCV", "REML" or "ML" (the latter does not integrate the unpenalized and parametric effects out of the marginal likelihood optimized for the smoothing parameters). Also usable with bam.
- betar: for proportions data on (0,1) when the binomial is not appropriate.
- cnorm: censored normal distribution, for log normal accelerated failure time models, Tobit regression and rounded data, for example.
- nb: for negative binomial data when the theta parameter is to be estimated.
- ocat: for ordered categorical data.
- scat: scaled t for heavy tailed data that would otherwise be modelled as Gaussian.
- tw: for Tweedie distributed data, when the power parameter relating the variance to the mean is to be estimated.
- ziP: for zero inflated Poisson data, when the zero inflation rate depends simply on the Poisson mean.
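A minimal sketch of fitting one of the extended families listed above, here nb with theta estimated during fitting. The simulated data and variable names are assumptions for illustration; the other families in the list are used in the same way through the family argument.

```r
library(mgcv)

set.seed(2)
n <- 300
x <- runif(n)
# Simulated counts; the true theta (size = 2) is treated as unknown when fitting
dat <- data.frame(x = x,
                  y = rnbinom(n, size = 2, mu = exp(1 + sin(2 * pi * x))))

# nb() with no theta supplied: theta is estimated alongside the smoothing parameters
m <- gam(y ~ s(x), family = nb(), data = dat, method = "REML")

# Extract the estimated theta on its original scale
theta_hat <- m$family$getTheta(TRUE)
theta_hat
```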
The above families of class family and extended.family can be combined to model data where different response observations come from different distributions. For example, when modelling the combination of presence-absence and abundance data, binomial and nb families might be used.
- gfam: creates a 'grouped family' (or 'family group') from a list of families. The response is supplied as a two column matrix, the first containing the response observations, and the second an index of the family to which each observation relates.
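The two column response layout described above might be set up as in the sketch below, which combines simulated presence-absence data (binomial) with simulated abundance data (nb). It assumes an mgcv version recent enough to provide gfam. The simulation, including the linear predictor shared by both data types, and all variable names are assumptions made for the example.

```r
library(mgcv)

set.seed(3)
n   <- 400
x   <- runif(n)
eta <- sin(2 * pi * x)            # linear predictor shared by both data types (assumed)
fam <- rep(1:2, each = n / 2)     # 1 = binomial observation, 2 = nb observation

y <- numeric(n)
y[fam == 1] <- rbinom(sum(fam == 1), 1, plogis(eta[fam == 1]))           # presence-absence
y[fam == 2] <- rnbinom(sum(fam == 2), size = 3, mu = exp(eta[fam == 2])) # abundance

dat <- data.frame(x = x)
# Column 1: response observations; column 2: index into the list of families
dat$ym <- cbind(y, fam)

m_g <- gam(ym ~ s(x), family = gfam(list(binomial, nb)),
           data = dat, method = "REML")
summary(m_g)
```

The index values in the second column refer to positions in the list passed to gfam, so 1 selects binomial and 2 selects nb here.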
The following families (class general.family) implement more general model classes. Usable only with gam and only with REML or NCV smoothing parameter estimation.
- cox.ph: the Cox Proportional Hazards model for survival data (no NCV).
- gammals: a gamma location-scale model, where the mean and standard deviation are modelled with separate linear predictors.
- gaulss: a Gaussian location-scale model where the mean and the standard deviation are both modelled using smooth linear predictors.
- gevlss: a generalized extreme value (GEV) model where the location, scale and shape parameters are each modelled using a linear predictor.
- gumbls: a Gumbel location-scale model (2 linear predictors).
- multinom: multinomial logistic regression, for unordered categorical responses.
- mvn: multivariate normal additive models (no NCV).
- shash: Sinh-arcsinh location scale and shape model family (4 linear predictors).
- twlss: Tweedie location scale and variance power model family (3 linear predictors). Can only be fitted using the EFS method.
- ziplss: a 'two-stage' zero inflated Poisson model, in which 'potential-presence' is modelled with one linear predictor, and Poisson mean abundance given potential presence is modelled with a second linear predictor.
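Families with several linear predictors take a list of formulae, one per linear predictor. The sketch below shows this for gaulss on simulated data in which both the mean and the standard deviation vary with x. The data and variable names are assumptions for illustration.

```r
library(mgcv)

set.seed(4)
n <- 400
x <- runif(n)
# Simulated Gaussian data whose mean and standard deviation both depend on x
dat <- data.frame(x = x,
                  y = rnorm(n, mean = sin(2 * pi * x), sd = 0.2 + 0.5 * x))

# First formula: mean; second (one-sided) formula: the scale linear predictor
m_ls <- gam(list(y ~ s(x), ~ s(x)),
            family = gaulss(), data = dat, method = "REML")

# Fitted values form a two column matrix. Per the gaulss help page the first
# column is the mean and the second is the inverse of the standard deviation.
fv     <- fitted(m_ls)
sd_hat <- 1 / fv[, 2]
head(cbind(mean = fv[, 1], sd = sd_hat))
```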
Author(s)
Simon N. Wood (s.wood@r-project.org) & Natalya Pya
References
Wood, S.N., N. Pya and B. Saefken (2016), Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association 111, 1548-1575 doi:10.1080/01621459.2016.1180986