sdmTMB {sdmTMB} | R Documentation |
Fit a spatial or spatiotemporal GLMM with TMB
Description
Fit a spatial or spatiotemporal generalized linear mixed effects model (GLMM) with the TMB (Template Model Builder) R package and the SPDE (stochastic partial differential equation) approximation to Gaussian random fields.
Usage
sdmTMB(
formula,
data,
mesh,
time = NULL,
family = gaussian(link = "identity"),
spatial = c("on", "off"),
spatiotemporal = c("iid", "ar1", "rw", "off"),
share_range = TRUE,
time_varying = NULL,
time_varying_type = c("rw", "rw0", "ar1"),
spatial_varying = NULL,
weights = NULL,
offset = NULL,
extra_time = NULL,
reml = FALSE,
silent = TRUE,
anisotropy = FALSE,
control = sdmTMBcontrol(),
priors = sdmTMBpriors(),
knots = NULL,
bayesian = FALSE,
previous_fit = NULL,
do_fit = TRUE,
do_index = FALSE,
predict_args = NULL,
index_args = NULL,
experimental = NULL
)
Arguments
formula |
Model formula. IID random intercepts are possible using
lme4 syntax, e.g., |
data |
A data frame. |
mesh |
An object from |
time |
An optional time column name (as character). Can be left as
|
family |
The family and link. Supports |
spatial |
Estimate spatial random fields? Options are |
spatiotemporal |
Estimate the spatiotemporal random fields as |
share_range |
Logical: estimate a shared spatial and spatiotemporal
range parameter ( |
time_varying |
An optional one-sided formula describing covariates
that should be modelled as a time-varying process. Set the type of
process with |
time_varying_type |
Type of time-varying process to apply to
|
spatial_varying |
An optional one-sided formula of coefficients that
should vary in space as random fields. Note that you likely want to include
a fixed effect for the same variable to improve interpretability since the
random field is assumed to have a mean of 0. If a (scaled) time column is
used, it will represent a local-time-trend model. See
doi:10.1111/ecog.05176 and the spatial trends vignette.
Note this predictor should usually be centered to have mean zero and have a
standard deviation of approximately 1.
The spatial intercept is controlled by the |
weights |
A numeric vector representing optional likelihood weights for
the conditional model. Implemented as in glmmTMB: weights do not have
to sum to one and are not internally modified. Can also be used for trials
with the binomial family; the |
offset |
A numeric vector representing the model offset or a character value representing the column name of the offset. In delta/hurdle models, this applies only to the positive component. Usually a log transformed variable. |
extra_time |
Optional extra time slices (e.g., years) to include for interpolation or forecasting with the predict function. See the Details section below. |
reml |
Logical: use REML (restricted maximum likelihood) estimation rather than maximum likelihood? Internally, this adds the fixed effects to the list of random effects to integrate over. |
silent |
Silent or include optimization details? Helpful to set to
|
anisotropy |
Logical: allow for anisotropy (spatial correlation that is
directionally dependent)? See |
control |
Optimization control options via |
priors |
Optional penalties/priors via |
knots |
Optional named list containing knot values to be used for basis
construction of smoothing terms. See |
bayesian |
Logical indicating if the model will be passed to
tmbstan. If |
previous_fit |
A previously fitted sdmTMB model to initialize the
optimization with. Can greatly speed up fitting. Note that the model must
be set up exactly the same way. However, the data and |
do_fit |
Fit the model ( |
do_index |
Do index standardization calculations while fitting? Saves
memory and time when working with large datasets or projection grids since
the TMB object doesn't have to be rebuilt with |
predict_args |
A list of arguments to pass to |
index_args |
A list of arguments to pass to |
experimental |
A named list for esoteric or in-development options. Here be dragons. |
Details
Model description
See the model description vignette or the relevant appendix of the preprint on sdmTMB: doi:10.1101/2022.03.24.485545
Binomial families
Following the structure of stats::glm()
and glmmTMB, a binomial
family can be specified in one of 4 ways: (1) the response may be a factor
(and the model classifies the first level versus all others), (2) the
response may be binomial (0/1), (3) the response can be a matrix of form
cbind(success, failure)
, and (4) the response may be the observed
proportions, and the 'weights' argument is used to specify the Binomial size
(N) parameter (prob ~ ..., weights = N
).
Smooth terms
Smooth terms can be included following GAMs (generalized additive models)
using + s(x)
, which implements a smooth from mgcv::s()
. sdmTMB uses
penalized smooths, constructed via mgcv::smooth2random()
. This is a similar
approach implemented in gamm4 and brms, among other packages.
Within these smooths, the same syntax commonly used in mgcv::s()
or
mgcv::t2()
can be applied, e.g. 2-dimensional smooths may be constructed
with + s(x, y)
or + t2(x, y)
; smooths can be specific to various factor
levels, + s(x, by = group)
; the basis function dimensions may be specified,
e.g. + s(x, k = 4)
; and various types of splines may be constructed such as
cyclic splines to model seasonality (perhaps with the knots
argument also
be supplied).
Threshold models
A linear break-point relationship for a covariate can be included via
+ breakpt(variable)
in the formula, where variable
is a single covariate
corresponding to a column in data
. In this case, the relationship is linear
up to a point and then constant (hockey-stick shaped).
Similarly, a logistic-function threshold model can be included via
+ logistic(variable)
. This option models the relationship as a logistic
function of the 50% and 95% values. This is similar to length- or size-based
selectivity in fisheries, and is parameterized by the points at which f(x) =
0.5 or 0.95. See the
threshold vignette.
Note that only a single threshold covariate can be included and the same covariate is included in both components for the delta families.
Extra time: forecasting or interpolating
Extra time slices (e.g., years) can be included for interpolation or
forecasting with the predict function via the extra_time
argument. The
predict function requires all time slices to be defined when fitting the
model to ensure the various time indices are set up correctly. Be careful if
including extra time slices that the model remains identifiable. For example,
including + as.factor(year)
in formula
will render a model with no data
to inform the expected value in a missing year. sdmTMB()
makes no attempt
to determine if the model makes sense for forecasting or interpolation. The
options time_varying
, spatiotemporal = "rw"
, spatiotemporal = "ar1"
,
or a smoother on the time column provide mechanisms to predict over missing
time slices with process error.
extra_time
can also be used to fill in missing time steps for the purposes
of a random walk or AR(1) process if the gaps between time steps are uneven.
extra_time
can include only extra time steps or all time steps including
those found in the fitted data. This latter option may be simpler.
Regularization and priors
You can achieve regularization via penalties (priors) on the fixed effect
parameters. See sdmTMBpriors()
. You can fit the model once without
penalties and look at the output of print(your_model)
or tidy(your_model)
or fit the model with do_fit = FALSE
and inspect
head(your_model$tmb_data$X_ij[[1]])
if you want to see how the formula is
translated to the fixed effect model matrix. Also see the
Bayesian vignette.
Delta/hurdle models
Delta models (also known as hurdle models) can be fit as two separate models
or at the same time by using an appropriate delta family. E.g.:
delta_gamma()
,
delta_beta()
,
delta_lognormal()
, and
delta_truncated_nbinom2()
.
If fit with a delta family, by default the formula, spatial, and spatiotemporal
components are shared. Some elements can be specified independently for the two models
using a list format. These include formula
, spatial
, spatiotemporal
,
and share_range
. The first element of the list is for the binomial component
and the second element is for the positive component (e.g., Gamma).
Other elements must be shared for now (e.g., spatially varying coefficients,
time-varying coefficients). Furthermore, there are currently limitations if
specifying two formulas as a list: the two formulas cannot have smoothers,
threshold effects, or random intercepts. For now, these must be specified
through a single formula that is shared across the two models.
The main advantage of specifying such models using a delta family (compared
to fitting two separate models) is (1) coding simplicity and (2) calculation
of uncertainty on derived quantities such as an index of abundance with
get_index()
using the generalized delta method within TMB. Also, selected
parameters can be shared across the models.
See the delta-model vignette.
Index standardization
For index standardization, you may wish to include 0 + as.factor(year)
(or whatever the time column is called) in the formula. See a basic
example of index standardization in the relevant
package vignette.
You will need to specify the time
argument. See get_index()
.
Value
An object (list) of class sdmTMB
. Useful elements include:
-
sd_report
: output fromTMB::sdreport()
-
gradients
: marginal log likelihood gradients with respect to each fixed effect -
model
: output fromstats::nlminb()
-
data
: the fitted data -
mesh
: the object that was supplied to themesh
argument -
family
: the family object, which includes the inverse link function asfamily$linkinv()
-
tmb_params
: The parameters list passed toTMB::MakeADFun()
-
tmb_map
: The 'map' list passed toTMB::MakeADFun()
-
tmb_data
: The data list passed toTMB::MakeADFun()
-
tmb_obj
: The TMB object created byTMB::MakeADFun()
References
Main reference introducing the package to cite when using sdmTMB:
Anderson, S.C., E.J. Ward, P.A. English, L.A.K. Barnett. 2022. sdmTMB: an R package for fast, flexible, and user-friendly generalized linear mixed effects models with spatial and spatiotemporal random fields. bioRxiv 2022.03.24.485545; doi:10.1101/2022.03.24.485545.
Reference for local trends:
Barnett, L.A.K., E.J. Ward, S.C. Anderson. 2021. Improving estimates of species distribution change by incorporating local trends. Ecography. 44(3):427-439. doi:10.1111/ecog.05176.
Further explanation of the model and application to calculating climate velocities:
English, P., E.J. Ward, C.N. Rooper, R.E. Forrest, L.A. Rogers, K.L. Hunter, A.M. Edwards, B.M. Connors, S.C. Anderson. 2021. Contrasting climate velocity impacts in warm and cool locations show that effects of marine warming are worse in already warmer temperate waters. Fish and Fisheries. 23(1) 239-255. doi:10.1111/faf.12613.
Discussion of and illustration of some decision points when fitting these models:
Commander, C.J.C., L.A.K. Barnett, E.J. Ward, S.C. Anderson, T.E. Essington. 2022. The shadow model: how and why small choices in spatially explicit species distribution models affect predictions. PeerJ 10: e12783. doi:10.7717/peerj.12783.
Application and description of threshold/break-point models:
Essington, T.E., S.C. Anderson, L.A.K. Barnett, H.M. Berger, S.A. Siedlecki, E.J. Ward. 2022. Advancing statistical models to reveal the effect of dissolved oxygen on the spatial distribution of marine taxa using thresholds and a physiologically based index. Ecography. 2022: e06249 doi:10.1111/ecog.06249.
Application to fish body condition:
Lindmark, M., S.C. Anderson, M. Gogina, M. Casini. Evaluating drivers of spatiotemporal individual condition of a bottom-associated marine fish. bioRxiv 2022.04.19.488709. doi:10.1101/2022.04.19.488709.
Several sections of the original TMB model code were adapted from the VAST R package:
Thorson, J.T. 2019. Guidance for decisions using the Vector Autoregressive Spatio-Temporal (VAST) package in stock, ecosystem, habitat and climate assessments. Fish. Res. 210:143–161. doi:10.1016/j.fishres.2018.10.013.
Code for the family
R-to-TMB implementation, selected parameterizations of
the observation likelihoods, general package structure inspiration, and the
idea behind the TMB prediction approach were adapted from the glmmTMB R
package:
Brooks, M.E., K. Kristensen, K.J. van Benthem, A. Magnusson, C.W. Berg, A. Nielsen, H.J. Skaug, M. Maechler, B.M. Bolker. 2017. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2):378-400. doi:10.32614/rj-2017-066.
Implementation of geometric anisotropy with the SPDE and use of random field GLMMs for index standardization:
Thorson, J.T., A.O. Shelton, E.J. Ward, H.J. Skaug. 2015. Geostatistical delta-generalized linear mixed models improve precision for estimated abundance indices for West Coast groundfishes. ICES J. Mar. Sci. 72(5): 1297–1310. doi:10.1093/icesjms/fsu243.
Examples
library(sdmTMB)
# Build a mesh to implement the SPDE approach:
mesh <- make_mesh(pcod_2011, c("X", "Y"), cutoff = 20)
# - this example uses a fairly coarse mesh so these examples run quickly
# - 'cutoff' is the minimum distance between mesh vertices in units of the
# x and y coordinates
# - 'cutoff = 10' might make more sense in applied situations for this dataset
# - or build any mesh in 'fmesher' and pass it to the 'mesh' argument in make_mesh()`
# - the mesh is not needed if you will be turning off all
# spatial/spatiotemporal random fields
# Quick mesh plot:
plot(mesh)
# Fit a Tweedie spatial random field GLMM with a smoother for depth:
fit <- sdmTMB(
density ~ s(depth),
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Extract coefficients:
tidy(fit, conf.int = TRUE)
tidy(fit, effects = "ran_par", conf.int = TRUE)
# Perform several 'sanity' checks:
sanity(fit)
# Predict on the fitted data; see ?predict.sdmTMB
p <- predict(fit)
# Predict on new data:
p <- predict(fit, newdata = qcs_grid)
head(p)
# Visualize the depth effect with ggeffects:
ggeffects::ggpredict(fit, "depth [all]") |> plot()
# Visualize depth effect with visreg: (see ?visreg_delta)
visreg::visreg(fit, xvar = "depth") # link space; randomized quantile residuals
visreg::visreg(fit, xvar = "depth", scale = "response")
visreg::visreg(fit, xvar = "depth", scale = "response", gg = TRUE, rug = FALSE)
# Add spatiotemporal random fields:
fit <- sdmTMB(
density ~ 0 + as.factor(year),
time = "year", #<
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Make the fields AR1:
fit <- sdmTMB(
density ~ s(depth),
time = "year",
spatial = "off",
spatiotemporal = "ar1", #<
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Make the fields a random walk:
fit <- sdmTMB(
density ~ s(depth),
time = "year",
spatial = "off",
spatiotemporal = "rw", #<
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Depth smoothers by year:
fit <- sdmTMB(
density ~ s(depth, by = as.factor(year)), #<
time = "year",
spatial = "off",
spatiotemporal = "rw",
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# 2D depth-year smoother:
fit <- sdmTMB(
density ~ s(depth, year), #<
spatial = "off",
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Turn off spatial random fields:
fit <- sdmTMB(
present ~ poly(log(depth)),
spatial = "off", #<
data = pcod_2011, mesh = mesh,
family = binomial()
)
fit
# Which, matches glm():
fit_glm <- glm(
present ~ poly(log(depth)),
data = pcod_2011,
family = binomial()
)
summary(fit_glm)
AIC(fit, fit_glm)
# Delta/hurdle binomial-Gamma model:
fit_dg <- sdmTMB(
density ~ poly(log(depth), 2),
data = pcod_2011, mesh = mesh,
spatial = "off",
family = delta_gamma() #<
)
fit_dg
# Delta model with different formulas and spatial structure:
fit_dg <- sdmTMB(
list(density ~ depth_scaled, density ~ poly(depth_scaled, 2)), #<
data = pcod_2011, mesh = mesh,
spatial = list("off", "on"), #<
family = delta_gamma()
)
fit_dg
# Delta/hurdle truncated NB2:
pcod_2011$count <- round(pcod_2011$density)
fit_nb2 <- sdmTMB(
count ~ s(depth),
data = pcod_2011, mesh = mesh,
spatial = "off",
family = delta_truncated_nbinom2() #<
)
fit_nb2
# Regular NB2:
fit_nb2 <- sdmTMB(
count ~ s(depth),
data = pcod_2011, mesh = mesh,
spatial = "off",
family = nbinom2() #<
)
fit_nb2
# IID random intercepts by year:
pcod_2011$fyear <- as.factor(pcod_2011$year)
fit <- sdmTMB(
density ~ s(depth) + (1 | fyear), #<
data = pcod_2011, mesh = mesh,
family = tweedie(link = "log")
)
fit
# Spatially varying coefficient of year:
pcod_2011$year_scaled <- as.numeric(scale(pcod_2011$year))
fit <- sdmTMB(
density ~ year_scaled,
spatial_varying = ~ 0 + year_scaled, #<
data = pcod_2011, mesh = mesh, family = tweedie(), time = "year"
)
fit
# Time-varying effects of depth and depth squared:
fit <- sdmTMB(
density ~ 0 + as.factor(year),
time_varying = ~ 0 + depth_scaled + depth_scaled2, #<
data = pcod_2011, time = "year", mesh = mesh,
family = tweedie()
)
print(fit)
# Extract values:
est <- as.list(fit$sd_report, "Estimate")
se <- as.list(fit$sd_report, "Std. Error")
est$b_rw_t[, , 1]
se$b_rw_t[, , 1]
# Linear break-point effect of depth:
fit <- sdmTMB(
density ~ breakpt(depth_scaled), #<
data = pcod_2011,
mesh = mesh,
family = tweedie()
)
fit