| spglm {spmodel} | R Documentation |
Fit spatial generalized linear models
Description
Fit spatial generalized linear models for point-referenced data (i.e., generalized geostatistical models) using a variety of estimation methods, allowing for random effects, anisotropy, partition factors, and big data methods.
Usage
spglm(
formula,
family,
data,
spcov_type,
xcoord,
ycoord,
spcov_initial,
dispersion_initial,
estmethod = "reml",
anisotropy = FALSE,
random,
randcov_initial,
partition_factor,
local,
...
)
Arguments
formula |
A two-sided linear formula describing the fixed effect structure
of the model, with the response to the left of the |
family |
The generalized linear model family describing the distribution
of the response variable to be used. Available options
|
data |
A data frame or |
spcov_type |
The spatial covariance type. Available options include
|
xcoord |
The name of the column in |
ycoord |
The name of the column in |
spcov_initial |
An object from |
dispersion_initial |
An object from |
estmethod |
The estimation method. Available options include
|
anisotropy |
A logical indicating whether (geometric) anisotropy should
be modeled. Not required if |
random |
A one-sided linear formula describing the random effect structure
of the model. Terms are specified to the right of the |
randcov_initial |
An optional object specifying initial and/or known values for the random effect variances. |
partition_factor |
A one-sided linear formula with a single term specifying the partition factor. The partition factor assumes observations from different levels of the partition factor are uncorrelated. |
local |
An optional logical or list controlling the big data approximation.
If omitted,
When |
... |
Other arguments to |
Details
The spatial generalized linear model for point-referenced data
(i.e., generalized geostatistical model) can be written as
g(\mu) = \eta = X \beta + \tau + \epsilon, where \mu is the expectation
of the response (y) given the random errors, g(.) is called
a link function which links together the \mu and \eta,
X is the fixed effects design
matrix, \beta are the fixed effects, \tau is random error that is
spatially dependent, and \epsilon is random error that is spatially
independent.
There are six generalized linear model
families available: poisson assumes y is a Poisson random variable
nbinomial assumes y is a negative binomial random
variable, binomial assumes y is a binomial random variable,
beta assumes y is a beta random variable,
Gamma assumes y is a gamma random
variable, and inverse.gaussian assumes y is an inverse Gaussian
random variable.
The supports for y for each family are given below:
family: support of
ypoisson:
0 \le y;yan integernbinomial:
0 \le y;yan integerbinomial:
0 \le y;yan integerbeta:
0 < y < 1Gamma:
0 < yinverse.gaussian:
0 < y
The generalized linear model families and the parameterizations of their link functions are given below:
family: link function
poisson:
g(\mu) = log(\eta)(log link)nbinomial:
g(\mu) = log(\eta)(log link)binomial:
g(\mu) = log(\eta / (1 - \eta))(logit link)beta:
g(\mu) = log(\eta / (1 - \eta))(logit link)Gamma:
g(\mu) = log(\eta)(log link)inverse.gaussian:
g(\mu) = log(\eta)(log link)
The variance function of an individual y (given \mu)
for each generalized linear model family is given below:
family:
Var(y)poisson:
\mu \phinbinomial:
\mu + \mu^2 / \phibinomial:
n \mu (1 - \mu) \phibeta:
\mu (1 - \mu) / (1 + \phi)Gamma:
\mu^2 / \phiinverse.gaussian:
\mu^2 / \phi
The parameter \phi is a dispersion parameter that influences Var(y).
For the poisson and binomial families, \phi is always
one. Note that this inverse Gaussian parameterization is different than a
standard inverse Gaussian parameterization, which has variance \mu^3 / \lambda.
Setting \phi = \lambda / \mu yields our parameterization, which is
preferred for computational stability. Also note that the dispersion parameter
is often defined in the literature as V(\mu) \phi, where V(\mu) is the variance
function of the mean. We do not use this parameterization, which is important
to recognize while interpreting dispersion estimates.
For more on generalized linear model constructions, see McCullagh and
Nelder (1989).
Together, \tau and \epsilon are modeled using
a spatial covariance function, expressed as
de * R + ie * I, where de is the dependent error variance, R
is a correlation matrix that controls the spatial dependence structure among observations,
ie is the independent error variance, and I is
an identity matrix. Recall that \tau and \epsilon are modeled on the link scale,
not the inverse link (response) scale. Random effects are also modeled on the link scale.
spcov_type Details: Parametric forms for R are given below, where \eta = h / range
for h distance between observations:
exponential:
exp(- \eta )spherical:
(1 - 1.5\eta + 0.5\eta^3) * I(h <= range)gaussian:
exp(- \eta^2 )triangular:
(1 - \eta) * I(h <= range)circular:
(1 - (2 / \pi) * (m * sqrt(1 - m^2) + sin^{-1}(m))) * I(h <= range), m = min(\eta, 1)cubic:
(1 - 7\eta^2 + 8.75\eta^3 - 3.5\eta^5 + 0.75\eta^7) * I(h <= range)pentaspherical:
(1 - 1.875\eta + 1.25\eta^3 - 0.375\eta^5) * I(h <= range)cosine:
cos(\eta)wave:
sin(\eta) / \eta * I(h > 0) + I(h = 0)jbessel:
Bj(h * range), Bj is Bessel-J functiongravity:
(1 + \eta^2)^{-0.5}rquad:
(1 + \eta^2)^{-1}magnetic:
(1 + \eta^2)^{-1.5}matern:
2^{1 - extra}/ \Gamma(extra) * \alpha^{extra} * Bk(\alpha, extra),\alpha = (2extra * \eta)^{0.5}, Bk is Bessel-K function with order1/5 \le extra \le 5cauchy:
(1 + \eta^2)^{-extra},extra > 0pexponential:
exp(h^{extra}/range),0 < extra \le 2none:
0
All spatial covariance functions are valid in one spatial dimension. All
spatial covariance functions except triangular and cosine are
valid in two dimensions.
estmethod Details: The various estimation methods are
-
reml: Maximize the restricted log-likelihood. -
ml: Maximize the log-likelihood.
Note that the likelihood being optimized is obtained using the Laplace approximation.
anisotropy Details: By default, all spatial covariance parameters except rotate
and scale as well as all random effect variance parameters
are assumed unknown, requiring estimation. If either rotate or scale
are given initial values other than 0 and 1 (respectively) or are assumed unknown
in spcov_initial(), anisotropy is implicitly set to TRUE.
(Geometric) Anisotropy is modeled by transforming a covariance function that
decays differently in different directions to one that decays equally in all
directions via rotation and scaling of the original coordinates. The rotation is
controlled by the rotate parameter in [0, \pi] radians. The scaling
is controlled by the scale parameter in [0, 1]. The anisotropy
correction involves first a rotation of the coordinates clockwise by rotate and then a
scaling of the coordinates' minor axis by the reciprocal of scale. The spatial
covariance is then computed using these transformed coordinates.
random Details: If random effects are used, the model
can be written as y = X \beta + Z1u1 + ... Zjuj + \tau + \epsilon,
where each Z is a random effects design matrix and each u is a random effect.
partition_factor Details: The partition factor can be represented in matrix form as P, where
elements of P equal one for observations in the same level of the partition
factor and zero otherwise. The covariance matrix involving only the
spatial and random effects components is then multiplied element-wise
(Hadmard product) by P, yielding the final covariance matrix.
local Details: The big data approximation works by sorting observations into different levels
of an index variable. Observations in different levels of the index variable
are assumed to be uncorrelated for the purposes of model fitting. Sparse matrix methods are then implemented
for significant computational gains. Parallelization generally further speeds up
computations when data sizes are larger than a few thousand. Both the "random" and "kmeans" values of method
in local have random components. That means you may get slightly different
results when using the big data approximation and rerunning spglm() with the same code. For consistent results,
either set a seed via base::set.seed() or specify index to local.
Observations with NA response values are removed for model
fitting, but their values can be predicted afterwards by running
predict(object).
Value
A list with many elements that store information about
the fitted model object. If spcov_type or spcov_initial are
length one, the list has class spglm. Many generic functions that
summarize model fit are available for spglm objects, including
AIC, AICc, anova, augment, AUROC, BIC, coef,
cooks.distance, covmatrix, deviance, fitted, formula,
glance, glances, hatvalues, influence,
labels, logLik, loocv, model.frame, model.matrix,
plot, predict, print, pseudoR2, summary,
terms, tidy, update, varcomp, and vcov. If
spcov_type or spcov_initial are length greater than one, the
list has class spglm_list and each element in the list has class
spglm. glances can be used to summarize spglm_list
objects, and the aforementioned spglm generics can be used on each
individual list element (model fit).
Note
This function does not perform any internal scaling. If optimization is not stable due to large extremely large variances, scale relevant variables so they have variance 1 before optimization.
References
McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.
Examples
spgmod <- spglm(presence ~ elev,
family = "binomial", data = moose,
spcov_type = "exponential"
)
summary(spgmod)