glmmTMB {glmmTMB} | R Documentation |
Fit Models with TMB
Description
Fit a generalized linear mixed model (GLMM) using Template Model Builder (TMB).
Usage
glmmTMB(
formula,
data = NULL,
family = gaussian(),
ziformula = ~0,
dispformula = ~1,
weights = NULL,
offset = NULL,
contrasts = NULL,
na.action,
se = TRUE,
verbose = FALSE,
doFit = TRUE,
control = glmmTMBControl(),
REML = FALSE,
start = NULL,
map = NULL,
sparseX = NULL,
priors = NULL
)
Arguments
formula |
combined fixed and random effects formula, following lme4 syntax. |
data |
data frame (tibbles are OK) containing model variables. Not required, but strongly recommended; if |
family |
a family function, a character string naming a family function, or the result of a call to a family function (variance/link function) information. See |
ziformula |
a one-sided (i.e., no response variable) formula for zero-inflation combining fixed and random effects: the default |
dispformula |
a one-sided formula for dispersion containing only fixed effects: the default |
weights |
weights, as in |
offset |
offset for conditional model (only). |
contrasts |
an optional list, e.g., |
na.action |
a function that specifies how to handle observations
containing |
se |
whether to return standard errors. |
verbose |
whether progress indication should be printed to the console. |
doFit |
whether to fit the full model, or (if FALSE) return the preprocessed data and parameter objects, without fitting the model. |
control |
control parameters, see |
REML |
whether to use REML estimation rather than maximum likelihood. |
start |
starting values, expressed as a list with possible components |
map |
a list specifying which parameter values should be fixed to a constant value rather than estimated. |
sparseX |
a named logical vector containing (possibly) elements named "cond", "zi", "disp" to indicate whether fixed-effect model matrices for particular model components should be generated as sparse matrices, e.g. |
priors |
a data frame of priors, in a similar format to that accepted by the |
Details
Binomial models with more than one trial (i.e., not binary/Bernoulli) can either be specified in the form
prob ~ ..., weights = N
, or in the more typical two-column matrixcbind(successes,failures)~...
form.Behavior of
REML=TRUE
for Gaussian responses matcheslme4::lmer
. It may also be useful in some cases with non-Gaussian responses (Millar 2011). Simulations should be done first to verify.Because the
df.residual
method forglmmTMB
currently counts the dispersion parameter, users should multiply this value bysqrt(nobs(fit) / (1+df.residual(fit)))
when comparing withlm
.Although models can be fitted without specifying a
data
argument, its use is strongly recommended; drawing model components from the global environment, or usingdf$var
notation within model formulae, can lead to confusing (and sometimes hard-to-detect) errors.By default, vector-valued random effects are fitted with unstructured (general symmetric positive definite) variance-covariance matrices. Structured variance-covariance matrices can be specified in the form
struc(terms|group)
, wherestruc
is one of-
diag
(diagonal, heterogeneous variance) -
ar1
(autoregressive order-1, homogeneous variance) -
cs
(compound symmetric, heterogeneous variance) -
ou
(* Ornstein-Uhlenbeck, homogeneous variance) -
exp
(* exponential autocorrelation) -
gau
(* Gaussian autocorrelation) -
mat
(* Matérn process correlation) -
toep
(* Toeplitz) -
rr
(reduced rank/factor-analytic model) -
homdiag
(diagonal, homogeneous variance)
Structures marked with * are experimental/untested. See
vignette("covstruct", package = "glmmTMB")
for more information.-
For backward compatibility, the
family
argument can also be specified as a list comprising the name of the distribution and the link function (e.g.list(family="binomial", link="logit")
). However, this alternative is now deprecated; it produces a warning and will be removed at some point in the future. Furthermore, certain capabilities such as Pearson residuals or predictions on the data scale will only be possible if components such asvariance
andlinkfun
are present, seefamily
.Smooths taken from the
mgcv
package can be included inglmmTMB
formulas usings
; these terms will appear as additional components in both the fixed and the random-effects terms. This functionality is experimental for now. We recommend usingREML=TRUE
. Sees
for details of specifying smooths (andsmooth2random
and the appendix of Wood (2004) for technical details).
Note
For more information about the glmmTMB package, see Brooks et al. (2017) and the vignette(package="glmmTMB")
collection. For the underlying TMB package that performs the model estimation, see Kristensen et al. (2016).
References
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J., Mächler, M. and Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R Journal, 9(2), 378–400.
Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H. and Bell, B. (2016). TMB: Automatic differentiation and Laplace approximation. Journal of Statistical Software, 70, 1–21.
Millar, R. B. (2011). Maximum Likelihood Estimation and Inference: With Examples in R, SAS and ADMB. Wiley, New York. Wood, S. N. (2004) Stable and Efficient Multiple Smoothing Parameter Estimation for Generalized Additive Models. Journal of the American Statistical Association 99(467): 673–86. doi:10.1198/016214504000000980
Examples
(m1 <- glmmTMB(count ~ mined + (1|site),
zi=~mined,
family=poisson, data=Salamanders))
summary(m1)
##' ## Zero-inflated negative binomial model
(m2 <- glmmTMB(count ~ spp + mined + (1|site),
zi=~spp + mined,
family=nbinom2, data=Salamanders))
## Hurdle Poisson model
(m3 <- glmmTMB(count ~ spp + mined + (1|site),
zi=~spp + mined,
family=truncated_poisson, data=Salamanders))
## Binomial model
data(cbpp, package="lme4")
(bovine <- glmmTMB(cbind(incidence, size-incidence) ~ period + (1|herd),
family=binomial, data=cbpp))
## Dispersion model
sim1 <- function(nfac=40, nt=100, facsd=0.1, tsd=0.15, mu=0, residsd=1)
{
dat <- expand.grid(fac=factor(letters[1:nfac]), t=1:nt)
n <- nrow(dat)
dat$REfac <- rnorm(nfac, sd=facsd)[dat$fac]
dat$REt <- rnorm(nt, sd=tsd)[dat$t]
dat$x <- rnorm(n, mean=mu, sd=residsd) + dat$REfac + dat$REt
dat
}
set.seed(101)
d1 <- sim1(mu=100, residsd=10)
d2 <- sim1(mu=200, residsd=5)
d1$sd <- "ten"
d2$sd <- "five"
dat <- rbind(d1, d2)
m0 <- glmmTMB(x ~ sd + (1|t), dispformula=~sd, data=dat)
fixef(m0)$disp
c(log(5), log(10)-log(5)) # expected dispersion model coefficients
## Using 'map' to fix random-effects SD to 10
m1_map <- update(m1, map=list(theta=factor(NA)),
start=list(theta=log(10)))
VarCorr(m1_map)
## smooth terms
data("Nile")
ndat <- data.frame(time = c(time(Nile)), val = c(Nile))
sm1 <- glmmTMB(val ~ s(time), data = ndat,
REML = TRUE, start = list(theta = 5))
plot(val ~ time, data = ndat)
lines(ndat$time, predict(sm1))