spAbund {spAbundance} | R Documentation |
Function for Fitting Univariate Spatial Abundance GLMs
Description
The function spAbund
fits univariate spatial abundance GLMs.
Usage
spAbund(formula, data, inits, priors, tuning,
cov.model = 'exponential', NNGP = TRUE,
n.neighbors = 15, search.type = 'cb',
n.batch, batch.length, accept.rate = 0.43, family = 'Poisson',
n.omp.threads = 1, verbose = TRUE, n.report = 100,
n.burn = round(.10 * n.batch * batch.length), n.thin = 1,
n.chains = 1, save.fitted = TRUE, ...)
Arguments
formula |
a symbolic description of the model to be fit
for the model using R's model syntax. Only
right-hand side of formula is specified. See example below. Random intercepts
and slopes are allowed using lme4 syntax (Bates et al. 2015).
|
data |
a list containing data necessary for model fitting.
Valid tags are y , covs , z , coords , and offset . y
is a vector, matrix, or data frame of the observed count values. If a vector,
the values represent the observed counts at each site. If multiple replicate
observations are obtained at the sites (e.g., sub-samples, repeated sampling over
multiple seasons), y can be specified as a matrix or data frame
with first dimension equal to the number of
sites (J ) and second dimension equal to the maximum number of
replicates at a given site. covs is either a data frame or list
containing the variables used in the model. When only fitting a model with site-level
data, covs can be specified as a data frame, with each row corresponding to
site and each column corresponding to a variable. When multiple abundance values
are available at a site, covs is specified as a list, where each list element is a different
covariate, which can be site-level or observation-level. Site-level covariates
are specified as a vector of length J , while observation-level covariates
are specified as a matrix or data frame with the number of rows equal to J
and number of columns equal to the maximum number of replicate observations at a
given site. coords is a J \times 2 matrix of the observation coordinates.
Note that spAbundance assumes coordinates are specified
in a projected coordinate system. For zero-inflated Gaussian models, the tag z is
used to specify the binary component of the zero-inflated model and should have the same
length as y . offset is an offset to use in the abundance model (e.g., an area offset).
This can be either a single value, a vector with an offset for each site (e.g., if survey area differed in size), or a site x replicate matrix if more than one count is available at a given site.
|
inits |
a list with each tag corresponding to a parameter name.
Valid tags are beta , sigma.sq ,
phi , w , nu , kappa , sigma.sq.mu , tau.sq .
nu is only specified if cov.model = "matern" , sigma.sq.mu
is only specified if there are random effects in formula , and
kappa is only specified when family = 'NB' .
tau.sq is only specified when family = 'Gaussian' or family = 'zi-Gaussian' .
The value portion of each tag is the parameter's initial value. See priors
description for definition of each parameter name.
Additionally, the tag fix can be set to TRUE
to fix the starting values across all chains. If fix is not specified
(the default), starting values are varied randomly across chains.
|
priors |
a list with each tag corresponding to a parameter name.
Valid tags are beta.normal , phi.unif ,
sigma.sq.ig , nu.unif , kappa.unif ,
sigma.sq.mu.ig , tau.sq.ig . Abundance (beta ) regression coefficients
are assumed to follow a normal distribution. The hyperparameters of the
normal distribution are passed as a list of length two with the first
and second elements corresponding to the mean and variance of the normal
distribution, which are each specified as vectors of
length equal to the number of coefficients to be estimated or of length
one if priors are the same for all coefficients. If not
specified, prior means are set to 0 and prior variances are set to 100. The
spatial variance parameter, sigma.sq , is assumed to follow an
inverse-Gamma distribution. The spatial decay phi , spatial
smoothness nu , and negative binomial dispersion kappa
parameters are assumed to follow Uniform
distributions. The hyperparameters of the inverse-Gamma for sigma.sq
are passed as a vector of length two, with the first and second
elements corresponding to the shape and scale, respectively.
The hyperparameters of the Uniform are also passed as a vector of
length two with the first and second elements corresponding to
the lower and upper support, respectively. sigma.sq.mu
are the random effect variances for any random effects, and are assumed to follow an
inverse-Gamma distribution. The hyperparameters of
the inverse-Gamma distribution are passed as a list of length two with the
first and second elements corresponding to the shape and scale parameters,
respectively, which are each specified as vectors of length equal to the
number of random effects or of length one if priors are the same for all
random effect variances. tau.sq is the residual variance
for Gaussian (or zero-inflated Gaussian) models, and it is assigned
an inverse-Gamma prior. The hyperparameters of the inverse-Gamma are passed as a vector
of length two, with the first and second element corresponding to the shape and
scale parameters, respectively.
|
cov.model |
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
"exponential" , "matern" , "spherical" , and
"gaussian" .
|
tuning |
a single numeric value representing the initial variance of the
adaptive sampler for beta , alpha , beta.star (the abundance
random effect values), kappa , phi ,
and nu . See Roberts and Rosenthal (2009) for details. Note that only phi
and nu are the only parameters that require tuning for a Gaussian or
zero-inflated Gaussian model.
|
NNGP |
if TRUE , model is fit with an NNGP. See Datta et al. (2016) and
Finley et al. (2019) for more information. Currently only NNGP is supported,
functionality for a full GP may be addded in future package development.
|
n.neighbors |
number of neighbors used in the NNGP. Only used if
NNGP = TRUE . Datta et al. (2016) showed that 15 neighbors is usually
sufficient, but that as few as 5 neighbors can be adequate for certain data
sets, which can lead to even greater decreases in run time. We recommend
starting with 15 neighbors (the default) and if additional gains in computation
time are desired, subsequently compare the results with a smaller number
of neighbors using WAIC.
|
search.type |
a quoted keyword that specifies the type of nearest
neighbor search algorithm. Supported method key words are: "cb" and
"brute" . The "cb" should generally be much
faster. If locations do not have identical coordinate values on
the axis used for the nearest neighbor ordering then "cb"
and "brute" should produce identical neighbor sets.
However, if there are identical coordinate values on the axis used
for nearest neighbor ordering, then "cb" and "brute"
might produce different, but equally valid, neighbor sets,
e.g., if data are on a grid.
|
n.batch |
the number of MCMC batches in each chain to run for the adaptive MCMC
sampler. See Roberts and Rosenthal (2009) for details.
|
batch.length |
the length of each MCMC batch in each chain to run for the adaptive
MCMC sampler. See Roberts and Rosenthal (2009) for details.
|
accept.rate |
target acceptance rate for adaptive MCMC. Default is
0.43. See Roberts and Rosenthal (2009) for details.
|
family |
the distribution to use for the latent abundance process. Currently
supports 'NB' (negative binomial), 'Poisson' , 'Gaussian' ,
and 'zi-Gaussian' .
|
n.omp.threads |
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we
recommend setting n.omp.threads up to the number of
hyperthreaded cores. Note, n.omp.threads > 1 might not
work on some systems.
|
verbose |
if TRUE , messages about data preparation,
model specification, and progress of the sampler are printed to the screen.
Otherwise, no messages are printed.
|
n.report |
the interval to report Metropolis sampler acceptance and
MCMC progress.
|
n.burn |
the number of samples out of the total n.batch * batch.length
samples in each chain to discard as burn-in. By default, the first
10% of samples is discarded.
|
n.thin |
the thinning interval for collection of MCMC samples. The
thinning occurs after the n.burn samples are discarded. Default
value is set to 1.
|
n.chains |
the number of MCMC chains to run in sequence.
|
save.fitted |
logical value indicating whether or not fitted values and likelihood values
should be saved in the resulting model object. If save.fitted = FALSE , the components
y.rep.samples , mu.samples , and like.samples will not be included
in the model object, and subsequent functions for calculating WAIC, fitted values, and
posterior predictive checks will not work, although they all can be calculated manually if
desired. Setting save.fitted = FALSE can be useful when working with very large
data sets to minimize the amount of RAM needed when fitting and storing the model object in
memory.
|
... |
currently no additional arguments
|
Value
An object of class spAbund
that is a list comprised of:
beta.samples |
a coda object of posterior samples
for the abundance regression coefficients.
|
kappa.samples |
a coda object of posterior samples
for the abundance dispersion parameter. Only included when
family = 'NB' .
|
tau.sq.samples |
a coda object of posterior samples
for the Gaussian residual variance parameter. Only included when
family = 'Gaussian' or family = 'zi-Gaussian' .
|
y.rep.samples |
a two or three-dimensional object of posterior samples
for the abundance replicate (fitted) values with dimensions
corresponding to MCMC samples, site, and replicate.
|
mu.samples |
a two or -three-dimensional array of posterior samples
for the expected abundance samples with dimensions corresponding
to MCMC samples, site, and replicate.
|
theta.samples |
a coda object of posterior samples
for spatial covariance parameters.
|
w.samples |
a coda object of posterior samples
for latent spatial random effects.
|
sigma.sq.mu.samples |
a coda object of posterior samples
for variances of random effects included in the model.
Only included if random effects are specified in
formula .
|
beta.star.samples |
a coda object of posterior samples
for the abundance random effects. Only included if random effects
are specified in formula .
|
like.samples |
a coda object of posterior samples
for the likelihood value associated with each site. Used for calculating
WAIC.
|
rhat |
a list of Gelman-Rubin diagnostic values for some of the model
parameters.
|
ESS |
a list of effective sample sizes for some of the model parameters.
|
run.time |
execution time reported using proc.time() .
|
The return object will include additional objects used for
subsequent prediction and/or model fit evaluation.
Author(s)
Jeffrey W. Doser doserjef@msu.edu,
Andrew O. Finley finleya@msu.edu
References
Bates, Douglas, Martin Maechler, Ben Bolker, Steve Walker (2015).
Fitting Linear Mixed-Effects Models Using lme4. Journal of
Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016)
Hierarchical Nearest-Neighbor Gaussian process models for large
geostatistical datasets. Journal of the American Statistical
Association, doi:10.1080/01621459.2015.1044091.
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and
S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor
Gaussian Processes. Journal of Computational and Graphical
Statistics, doi:10.1080/10618600.2018.1537924.
Roberts, G.O. and Rosenthal J.S. (2009) Examples of adaptive MCMC.
Journal of Computational and Graphical Statistics, 18(2):349-367.
Examples
set.seed(888)
J.x <- 8
J.y <- 8
J <- J.x * J.y
n.rep <- sample(3, J, replace = TRUE)
beta <- c(0, -1.5, 0.3, -0.8)
p.abund <- length(beta)
mu.RE <- list(levels = c(50, 45),
sigma.sq.mu = c(1.3, 0.5),
beta.indx = c(1, 2))
phi <- 3/.6
sigma.sq <- 2
kappa <- 0.2
sp <- TRUE
cov.model <- 'exponential'
family <- 'NB'
dat <- simAbund(J.x = J.x, J.y = J.y, n.rep = n.rep, beta = beta,
kappa = kappa, mu.RE = mu.RE, sp = sp, phi = phi,
sigma.sq = sigma.sq, cov.model = cov.model, family = 'NB')
y <- dat$y
X <- dat$X
X.re <- dat$X.re
coords <- dat$coords
covs <- list(int = X[, , 1],
abund.cov.1 = X[, , 2],
abund.cov.2 = X[, , 3],
abund.cov.3 = X[, , 4],
abund.factor.1 = X.re[, , 1],
abund.factor.2 = X.re[, , 2])
data.list <- list(y = y, covs = covs, coords = coords)
# Priors
prior.list <- list(beta.normal = list(mean = 0, var = 100),
phi.unif = c(3 / 1, 3 / .1),
sigma.sq.ig = c(2, 1),
kappa.unif = c(0.001, 10))
# Starting values
inits.list <- list(beta = beta, kappa = kappa, sigma.sq = sigma.sq, phi = phi)
tuning <- list(phi = 0.3, kappa = 0.05, beta = 0.1, beta.star = 0.1, w = 0.1)
n.batch <- 4
batch.length <- 25
n.burn <- 20
n.thin <- 1
n.chains <- 1
out <- spAbund(formula = ~ abund.cov.1 + abund.cov.2 + abund.cov.3 +
(1 | abund.factor.1) + (abund.cov.1 | abund.factor.2),
data = data.list,
n.batch = n.batch,
batch.length = batch.length,
inits = inits.list,
tuning = tuning,
priors = prior.list,
NNGP = TRUE,
cov.model = 'exponential',
search.type = 'cb',
n.neighbors = 5,
accept.rate = 0.43,
n.omp.threads = 1,
verbose = TRUE,
n.report = 1,
n.burn = n.burn,
n.thin = n.thin,
n.chains = n.chains)
summary(out)
[Package
spAbundance version 0.1.3
Index]