svcTMsPGOcc {spOccupancy} | R Documentation |
Function for Fitting Multi-Species Multi-Season Spatially-Varying Coefficient Occupancy Models
Description
The function svcTMsPGOcc
fits multi-species multi-season spatially-varying coefficient occupancy models with species correlations (i.e., a spatially-explicit joint species distribution model with imperfect detection). We use Polya-Gamma latent variables and a spatial factor modeling approach. Models are implemented using a Nearest Neighbor Gaussian Process.
Usage
svcTMsPGOcc(occ.formula, det.formula, data, inits, priors, tuning,
svc.cols = 1, cov.model = 'exponential', NNGP = TRUE,
n.neighbors = 15, search.type = 'cb', std.by.sp = FALSE,
n.factors, svc.by.sp, n.batch, batch.length,
accept.rate = 0.43, n.omp.threads = 1,
verbose = TRUE, ar1 = FALSE, n.report = 100,
n.burn = round(.10 * n.batch * batch.length), n.thin = 1,
n.chains = 1, ...)
Arguments
occ.formula |
a symbolic description of the model to be fit
for the occurrence portion of the model using R's model syntax. Random
intercepts are allowed using lme4 syntax (Bates et al. 2015). Only
right-hand side of formula is specified. See example below.
|
det.formula |
a symbolic description of the model to be fit
for the detection portion of the model using R's model syntax. Only
right-hand side of formula is specified. See example below. Random
intercepts are allowed using lme4 syntax (Bates et al. 2015).
|
data |
a list containing data necessary for model fitting.
Valid tags are y , occ.covs , det.covs ,
coords , range.ind , and grid.index .
y is a four-dimensional array with first dimension equal to the
number of species, second dimension equal to the number of sites, third
dimension equal to the number of primary time periods, and
fourth dimension equal to the maximum number of secondary replicates at a given site.
occ.covs is a list of variables included in the occurrence portion of the model. Each
list element is a different occurrence covariate, which can be site level
or site/primary time period level. Site-level covariates are specified as a vector of
length J while site/primary time period level covariates are specified as a matrix
with rows corresponding to sites and columns correspond to primary time periods.
Similarly, det.covs is a list of variables included in the detection
portion of the model, with each list element corresponding to an individual variable.
In addition to site-level and/or site/primary time period-level,
detection covariates can also be observational-level. Observation-level covariates
are specified as a three-dimensional array with first dimension corresponding to
sites, second dimension corresponding to primary time period, and third
dimension corresponding to replicate. coords is a matrix of the observation coordinates used
to estimate the SVCs for each site. coords has two columns for the
easting and northing coordinate, respectively. Typically, each site in the data
set will have it's own coordinate, such that coords is a J \times 2
matrix and grid.index should not be specified.
If you desire to estimate SVCs at some larger spatial level,
e.g., if points fall within grid cells and you want to estimate an SVC for
each grid cell instead of each point, coords can be specified as the coordinate for
each grid cell. In such a case, grid.index is an indexing vector of length J, where each
value of grid.index indicates the corresponding row in coords that the given
site corresponds to. Note that spOccupancy assumes coordinates are specified
in a projected coordinate system.
range.ind is a matrix with rows corresponding to species and columns
corresponding to sites, with each element taking value 1 if that site is
within the range of the corresponding species and 0 if it is outside of the
range. This matrix is not required, but it can be helpful to restrict the
modeled area for each individual species to be within the realistic range
of locations for that species when estimating the model parameters. This is
applicable when auxiliary data sources are available on the realistic range
of the species.
|
inits |
a list with each tag corresponding to a parameter name.
Valid tags are alpha.comm , beta.comm , beta ,
alpha , tau.sq.beta , tau.sq.alpha ,
sigma.sq.psi , sigma.sq.p , z ,
phi , lambda , nu , sigma.sq.t , and rho .
nu is only specified if cov.model = "matern" ,
sigma.sq.t and rho are only specified if ar1 = TRUE ,
and sigma.sq.psi and sigma.sq.p are
only specified if random effects are included in occ.formula or
det.formula , respectively. The value portion of each tag is
the parameter's initial value. See priors description for definition
of each parameter name. Additionally, the tag fix can be set to TRUE
to fix the starting values across all chains. If fix is not specified
(the default), starting values are varied randomly across chains.
|
priors |
a list with each tag corresponding to a parameter name.
Valid tags are beta.comm.normal , alpha.comm.normal ,
tau.sq.beta.ig , tau.sq.alpha.ig , sigma.sq.psi ,
sigma.sq.p , phi.unif , nu.unif , sigma.sq.t.ig ,
and rho.unif . Community-level occurrence (beta.comm ) and detection
(alpha.comm ) regression coefficients are assumed to follow a
normal distribution. The hyperparameters of the normal distribution
are passed as a list of length two with the first and second elements
corresponding to the mean and variance of the normal distribution,
which are each specified as vectors of length equal to the number of
coefficients to be estimated or of length one if priors are the same for
all coefficients. If not specified, prior means are set
to 0 and prior variances set to 2.73. By default, community-level variance parameters
for occupancy (tau.sq.beta ) and detection (tau.sq.alpha ) are
assumed to follow an inverse Gamma distribution. The hyperparameters of
the inverse gamma distribution are passed as a list of length two with
the first and second elements corresponding to the shape and scale parameters,
which are each specified as vectors of length equal to the number of
coefficients to be estimated or a single value if priors are the same for all
parameters. If not specified, prior shape and scale
parameters are set to 0.1.
If desired, the species-specific occupancy coefficients
(beta ) and/or detection coefficients (alpha ) can also be estimated indepdendently by specifying the
tag independent.betas = TRUE and/or independent.alphas = TRUE , respectively.
If specified, this will not estimate species-specific
coefficients as random effects from a common-community-level distribution, and rather
the values of beta.comm /alpha.comm and tau.sq.beta /tau.sq.alpha will be fixed at the
specified initial values. This is equivalent to specifying a Gaussian, independent
prior for each of the species-specific effects.
The spatial factor model fits n.factors independent
spatial processes for each spatially-varying coefficient specified in svc.cols .
The spatial decay phi and smoothness nu parameters
for each latent factor are assumed to follow Uniform distributions.
The hyperparameters of the Uniform are passed as a list with two elements,
with both elements being vectors of length n.factors * length(svc.cols)
corresponding to the lower and
upper support, respectively, or as a single value if the same value is assigned
for all factor/SVC combinations. The priors for the factor loadings
matrix lambda for each SVC are fixed
following the standard spatial factor model to ensure parameter
identifiability (Christensen and Amemlya 2002). The
upper triangular elements of the N x n.factors matrix are fixed at 0 and the
diagonal elements are fixed at 1 for each SVC. The lower triangular elements are assigned a
standard normal prior (i.e., mean 0 and variance 1).
sigma.sq.psi and sigma.sq.p are the random
effect variances for any occurrence or
detection random effects, respectively, and are assumed to follow an inverse
Gamma distribution. The hyperparameters of the inverse-Gamma distribution
are passed as a list of length two with first and second elements corresponding
to the shape and scale parameters, respectively, which are each specified as
vectors of length equal to the number of random intercepts or of length one
if priors are the same for all random effect variances. sigma.sq.t and
rho are the AR(1) variance and correlation parameters for the AR(1) zero-mean
temporal random effects, respectively. sigma.sq.t is assumed to follow an inverse-Gamma
distribution, where the hyperparameters are specified as a list of length two with the
first and second elements corresponding to the shape and scale parameters, respectively,
which can each be specified as vector equal to the number of species in the model or a single value
if the same prior is used for all species. rho is assumed to follow a
uniform distribution, where the hyperparameters are specified similarly as a list of length two
with the first and second elements corresponding to the lower and upper bounds of the
uniform prior, which can each be specified as vector equal to the number of species in the
model or a single value if the same prior is used for all species.
|
tuning |
a list with each tag corresponding to a parameter
name. Valid tags are phi , nu , and rho . The value portion of each
tag defines the initial variance of the adaptive sampler. We assume the
initial variance of the adaptive sampler is the same for each species,
although the adaptive sampler will adjust the tuning variances separately
for each species. See Roberts and Rosenthal (2009) for details.
|
svc.cols |
a vector indicating the variables whose effects will be
estimated as spatially-varying coefficients. svc.cols can be an
integer vector with values indicating the order of covariates specified
in the model formula (with 1 being the intercept if specified), or it can
be specified as a character vector with names corresponding to variable
names in occ.covs (for the intercept, use '(Intercept)'). svc.cols
default argument of 1 results in a spatial occupancy model analogous to
sfMsPGOcc (assuming an intercept is included in the model).
|
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
"exponential" , "matern" , "spherical" , and
"gaussian" .
|
NNGP |
if TRUE , model is fit with an NNGP. If FALSE ,
a full Gaussian process is used. See Datta et al. (2016) and
Finley et al. (2019) for more information. Only
NNGP = TRUE is currently supported.
|
n.neighbors |
number of neighbors used in the NNGP. Only used if
NNGP = TRUE . Datta et al. (2016) showed that 15 neighbors is usually
sufficient, but that as few as 5 neighbors can be adequate for certain data
sets, which can lead to even greater decreases in run time. We recommend
starting with 15 neighbors (the default) and if additional gains in computation
time are desired, subsequently compare the results with a smaller number
of neighbors using WAIC.
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: "cb" and
"brute" . The "cb" should generally be much
faster. If locations do not have identical coordinate values on
the axis used for the nearest neighbor ordering then "cb"
and "brute" should produce identical neighbor sets.
However, if there are identical coordinate values on the axis used
for nearest neighbor ordering, then "cb" and "brute"
might produce different, but equally valid, neighbor sets,
e.g., if data are on a grid.
|
std.by.sp |
a logical value indicating whether the covariates are standardized
separately for each species within the corresponding range for each species (TRUE )
or not (FALSE ). Note that if range.ind is specified in data.list ,
this will result in the covariates being standardized differently for each species
based on the sites where range.ind == 1 for that given species. If range.ind is not specified
and std.by.sp = TRUE , this will simply be equivalent to standardizing
the covariates across all locations prior to fitting the model. Note that the covariates
in occ.formula should still be standardized across all locations. This can be done
either outside the function, or can be done by specifying scale() in the model formula
around the continuous covariates.
|
n.factors |
the number of factors to use in the spatial factor model approach.
Note this corresponds to the number of factors used for each spatially-varying
coefficient that is estimated in the model.
Typically, the number of factors is set to be small (e.g., 4-5) relative to the
total number of species in the community, which will lead to substantial
decreases in computation time. However, the value can be anywhere
between 1 and N (the number of species in the community).
|
svc.by.sp |
an optional list with length equal to length(svc.cols) . Each element
of the list should be a logical vector of length N (number of species) where each
element is TRUE, which indicates the SVC should be estimated for that species, or 0, which
indicates the SVC should be set to 0 and no SVC for that parameter will be estimated. Note
the first n.factors SVCs for all spatially-varying coefficients must be set to
TRUE . By default, SVCs are modeled for all species.
|
n.batch |
the number of MCMC batches in each chain to run for the Adaptive MCMC
sampler. See Roberts and Rosenthal (2009) for details.
|
batch.length |
the length of each MCMC batch to run for the Adaptive
MCMC sampler. See Roberts and Rosenthal (2009) for details.
|
accept.rate |
target acceptance rate for Adaptive MCMC. Defaul is
0.43. See Roberts and Rosenthal (2009) for details.
|
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting n.omp.threads up to the number of
hyperthreaded cores. Note, n.omp.threads > 1 might not
work on some systems.
|
verbose |
if TRUE , messages about data preparation,
model specification, and progress of the sampler are printed to the screen.
Otherwise, no messages are printed.
|
ar1 |
logical value indicating whether to include an AR(1) zero-mean
temporal random effect in the model. If FALSE , the model is
fit without an AR(1) temporal autocovariance structure. If TRUE ,
an AR(1) random effect is included in the model to account for temporal
autocorrelation across the primary time periods.
|
n.report |
the interval to report Metropolis sampler acceptance and
MCMC progress. Note this is specified in terms of batches and not overall
samples for spatial models.
|
n.burn |
the number of samples out of the total n.samples to
discard as burn-in for each chain. By default, the first 10% of samples is discarded.
|
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the n.burn samples are discarded. Default
value is set to 1.
|
n.chains |
the number of chains to run in sequence.
|
... |
currently no additional arguments
|
Value
An object of class svcTMsPGOcc
that is a list comprised of:
beta.comm.samples |
a coda object of posterior samples
for the community level occurrence regression coefficients.
|
alpha.comm.samples |
a coda object of posterior samples
for the community level detection regression coefficients.
|
tau.sq.beta.samples |
a coda object of posterior samples
for the occurrence community variance parameters.
|
tau.sq.alpha.samples |
a coda object of posterior samples
for the detection community variance parameters.
|
beta.samples |
a coda object of posterior samples
for the species level occurrence regression coefficients.
|
alpha.samples |
a coda object of posterior samples
for the species level detection regression coefficients.
|
theta.samples |
a coda object of posterior samples
for the species level correlation parameters for each spatially-varying coefficient and
the temporal autocorrelation parameters for each species when ar1 = TRUE .
|
lambda.samples |
a coda object of posterior samples
for the latent spatial factor loadings for each spatially-varying coefficient.
|
z.samples |
a four-dimensional array of posterior samples for
the latent occurrence values for each species. Dimensions corresopnd to
MCMC sample, species, site, and primary time period.
|
psi.samples |
a four-dimensional array of posterior samples for
the latent occupancy probability values for each species. Dimensions correspond
to MCMC sample, species, site, and primary time period.
|
w.samples |
a four-dimensional array of posterior samples for
the latent spatial random effects for each spatial factor within each
spatially-varying coefficient. Dimensions correspond to MCMC sample,
factor, site, and spatially-varying coefficient.
|
sigma.sq.psi.samples |
a coda object of posterior samples
for variances of random intercepts included in the occurrence portion
of the model. Only included if random intercepts are specified in
occ.formula .
|
sigma.sq.p.samples |
a coda object of posterior samples
for variances of random intercpets included in the detection portion
of the model. Only included if random intercepts are specified in
det.formula .
|
beta.star.samples |
a coda object of posterior samples
for the occurrence random effects. Only included if random intercepts
are specified in occ.formula .
|
alpha.star.samples |
a coda object of posterior samples
for the detection random effects. Only included if random intercepts
are specified in det.formula .
|
like.samples |
a four-dimensional array of posterior samples
for the likelihood value used for calculating WAIC. Dimensions correspond
to MCMC sample, species, site, and time period.
|
rhat |
a list of Gelman-Rubin diagnostic values for some of the model
parameters.
|
ESS |
a list of effective sample sizes for some of the model parameters.
|
run.time |
MCMC sampler execution time reported using proc.time() .
|
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation. Note that detection
probability estimated values are not included in the model object, but can
be extracted using fitted()
.
Note
Some of the underlying code used for generating random numbers from the
Polya-Gamma distribution is taken from the pgdraw package written
by Daniel F. Schmidt and Enes Makalic. Their code implements Algorithm
6 in PhD thesis of Jesse Bennett Windle (2013) https://repositories.lib.utexas.edu/handle/2152/21842.
Author(s)
Jeffrey W. Doser doserjef@msu.edu,
Andrew O. Finley finleya@msu.edu
References
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016)
Hierarchical Nearest-Neighbor Gaussian process models for large
geostatistical datasets. Journal of the American Statistical
Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and
S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor
Gaussian Processes. Journal of Computational and Graphical
Statistics, doi:10.1080/10618600.2018.1537924.
Finley, A. O., Datta, A., and Banerjee, S. (2020). spNNGP R package
for nearest neighbor Gaussian process models. arXiv preprint arXiv:2001.09111.
Polson, N.G., J.G. Scott, and J. Windle. (2013) Bayesian Inference for
Logistic Models Using Polya-Gamma Latent Variables.
Journal of the American Statistical Association, 108:1339-1349.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC.
Journal of Computational and Graphical Statistics, 18(2):349-367.
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015).
Fitting Linear Mixed-Effects Models Using lme4. Journal of
Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Hooten, M. B., and Hobbs, N. T. (2015). A guide to Bayesian model
selection for ecologists. Ecological Monographs, 85(1), 3-28.
Christensen, W. F., and Amemiya, Y. (2002). Latent variable analysis
of multivariate spatial data. Journal of the American Statistical Association,
97(457), 302-317.
Examples
# Simulate Data -----------------------------------------------------------
set.seed(500)
J.x <- 8
J.y <- 8
J <- J.x * J.y
# Years sampled
n.time <- sample(3:10, J, replace = TRUE)
# n.time <- rep(10, J)
n.time.max <- max(n.time)
# Replicates
n.rep <- matrix(NA, J, max(n.time))
for (j in 1:J) {
n.rep[j, 1:n.time[j]] <- sample(2:4, n.time[j], replace = TRUE)
}
N <- 7
# Community-level covariate effects
# Occurrence
beta.mean <- c(-3, -0.2, 0.5)
trend <- FALSE
sp.only <- 0
p.occ <- length(beta.mean)
tau.sq.beta <- c(0.6, 1.5, 1.4)
# Detection
alpha.mean <- c(0, 1.2, -1.5)
tau.sq.alpha <- c(1, 0.5, 2.3)
p.det <- length(alpha.mean)
# Random effects
psi.RE <- list()
p.RE <- list()
# Draw species-level effects from community means.
beta <- matrix(NA, nrow = N, ncol = p.occ)
alpha <- matrix(NA, nrow = N, ncol = p.det)
for (i in 1:p.occ) {
beta[, i] <- rnorm(N, beta.mean[i], sqrt(tau.sq.beta[i]))
}
for (i in 1:p.det) {
alpha[, i] <- rnorm(N, alpha.mean[i], sqrt(tau.sq.alpha[i]))
}
sp <- TRUE
svc.cols <- c(1, 2)
p.svc <- length(svc.cols)
n.factors <- 3
phi <- runif(p.svc * n.factors, 3 / .9, 3 / .3)
factor.model <- TRUE
cov.model <- 'exponential'
dat <- simTMsOcc(J.x = J.x, J.y = J.y, n.time = n.time, n.rep = n.rep, N = N,
beta = beta, alpha = alpha, sp.only = sp.only, trend = trend,
psi.RE = psi.RE, p.RE = p.RE, factor.model = factor.model,
svc.cols = svc.cols, n.factors = n.factors, phi = phi, sp = sp,
cov.model = cov.model)
y <- dat$y
X <- dat$X
X.p <- dat$X.p
coords <- dat$coords
X.re <- dat$X.re
X.p.re <- dat$X.p.re
occ.covs <- list(occ.cov.1 = X[, , 2],
occ.cov.2 = X[, , 3])
det.covs <- list(det.cov.1 = X.p[, , , 2],
det.cov.2 = X.p[, , , 3])
data.list <- list(y = y, occ.covs = occ.covs,
det.covs = det.covs,
coords = coords)
# Priors
prior.list <- list(beta.comm.normal = list(mean = 0, var = 2.72),
alpha.comm.normal = list(mean = 0, var = 2.72),
tau.sq.beta.ig = list(a = 0.1, b = 0.1),
tau.sq.alpha.ig = list(a = 0.1, b = 0.1),
phi.unif = list(a = 3 / .9, b = 3 / .1))
z.init <- apply(y, c(1, 2, 3), function(a) as.numeric(sum(a, na.rm = TRUE) > 0))
inits.list <- list(alpha.comm = 0, beta.comm = 0, beta = 0,
alpha = 0, tau.sq.beta = 1, tau.sq.alpha = 1,
phi = 3 / .5, z = z.init)
# Tuning
tuning.list <- list(phi = 1)
# Number of batches
n.batch <- 5
# Batch length
batch.length <- 25
n.burn <- 25
n.thin <- 1
n.samples <- n.batch * batch.length
out <- svcTMsPGOcc(occ.formula = ~ occ.cov.1 + occ.cov.2,
det.formula = ~ det.cov.1 + det.cov.2,
data = data.list,
inits = inits.list,
n.batch = n.batch,
batch.length = batch.length,
accept.rate = 0.43,
NNGP = TRUE,
n.neighbors = 5,
n.factors = n.factors,
svc.cols = svc.cols,
cov.model = 'exponential',
priors = prior.list,
tuning = tuning.list,
n.omp.threads = 1,
verbose = TRUE,
n.report = 1,
n.burn = n.burn,
n.thin = n.thin,
n.chains = 1)
summary(out)
[Package
spOccupancy version 0.7.6
Index]