georob {georob} | R Documentation |
Robust Fitting of Spatial Linear Models
Description
The function georob
fits a linear model with spatially correlated
errors to geostatistical data that are possibly contaminated by
independent outliers. The regression coefficients and the parameters of
the variogram model are estimated by robust or Gaussian restricted
maximum likelihood (REML) or by Gaussian maximum likelihood (ML).
Usage
georob(formula, data, subset, weights, na.action, model = TRUE,
x = FALSE, y = FALSE, contrasts = NULL, offset, locations,
variogram.model = c("RMexp", "RMaskey", "RMbessel", "RMcauchy",
"RMcircular", "RMcubic", "RMdagum", "RMdampedcos", "RMdewijsian",
"RMfbm", "RMgauss", "RMgencauchy", "RMgenfbm", "RMgengneiting",
"RMgneiting", "RMlgd", "RMmatern", "RMpenta", "RMqexp",
"RMspheric", "RMstable", "RMwave", "RMwhittle"),
param, fit.param = default.fit.param()[names(param)],
aniso = default.aniso(), fit.aniso = default.fit.aniso(),
variogram.object = NULL,
tuning.psi = 2, control = control.georob(),
verbose = 0, ...)
Arguments
formula |
a symbolic description of the regression model for the
external drift to be fit (mandatory argument). See
|
data |
an optional data frame, a
|
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used in the fitting process, currently ignored. |
na.action |
a function which indicates what should happen when the
data contain |
model , x , y |
logical scalars. If |
contrasts |
an optional list. See the |
offset |
this optional argument can be used to specify an a
priori known component to be included in the linear predictor during
fitting. An |
locations |
a one-sided formula defining the variables that are used as coordinates of the locations were the data was recorded (mandatory argument). |
variogram.model |
a character keyword defining the variogram model
to be fitted. Currently, most basic variogram models provided formerly
by the now archived package RandomFields can be fitted (see
Details and |
param |
a named numeric vector with initial values of the
variogram parameters (mandatory argument). The names of
|
fit.param |
a named logical vector (or a function such as
|
aniso |
a named numeric vector with initial values (or a function such as
|
fit.aniso |
a named logical vector (or a function such as
|
variogram.object |
an optional list that defines a possibly nested variogram model. Each component is itself a list with the following components:
Note that the arguments |
tuning.psi |
positive numeric. The tuning constant |
control |
a list specifying parameters that control the behaviour of
|
verbose |
positive integer controlling logging of diagnostic
messages to the console during model fitting. |
... |
further arguments passed to function (e.g. |
Details
georob
fits a spatial linear model by robust (Künsch et al., 2011, Künsch
et al., in preparation) or Gaussian RE(ML) (Harville, 1977).
georobPackage
describes the employed model and briefly
sketches the robust REML estimation and the robust external drift Kriging
method. Here, we describe further details of georob
.
Implemented variograms
Currently, most basic variogram models provided formerly by the now
archived package RandomFields can be fitted by georob
(see
argument variogram.model
and gencorr
for a list of
implemented models). Some of these models have in addition to
variance
, snugget
, nugget
and scale
further
parameters. Initial values of these parameters (param
) and
fitting flags (fit.param
) must be passed to georob
by the
same names as used for the models RM...
in
gencorr
. Use the function param.names
to
list additional parameters of a given variogram.model.
The arguments fit.param
and fit.aniso
are used to control
what variogram and anisotropy parameters are estimated and which are
kept at the constant initial values. The functions
default.fit.param
and default.fit.aniso
set
reasonable default values for these arguments. Note, as an aside, that
the function default.aniso
sets (default) values of the
anisotropy parameters for an isotropic variogram.
Estimating parameters of power function variogram
The intrinsic variogram model RMfbm
is over-parametrized when
both the variance
(plus possibly snugget
) and the
scale
are estimated. Therefore, to estimate the parameters of
this model, scale
must be kept fixed at an arbitrary value by
using fit.param["scale"] = FALSE
.
Estimating parameters of geometrically anisotropic variograms
The subsection Model of georobPackage
describes
how such models are parametrized and gives definitions the various
elements of aniso
. Some additional remarks might be helpful:
The first semi-principal axis points into the direction with the farthest reaching auto-correlation, which is described by the range parameter
scale
().
The ranges in the direction of the second and third semi-principal axes are given by
and
, with
.
The default values for
aniso
(,
) define an isotropic variogram model.
Valid ranges for the angles characterizing the orientation of the semi-variance ellipsoid are (in degrees):
[0, 180],
[0, 180],
[-90, 90].
Estimating variance of micro-scale variation
Simultaneous estimation of the variance of the micro-scale variation
(snugget
, ), appears seemingly
as spatially uncorrelated with a given sampling design, and of the
variance (
nugget
, ) of the independent errors
requires that for some locations
replicated observations are
available. Locations less or equal than
zero.dist
apart are
thereby considered as being coincident (see
control.georob
).
Constraining estimates of variogram parameters
Parameters of variogram models can vary only within certain bounds (see
param.bounds
and gencorr
for allowed ranges). georob
uses three mechanisms to constrain
parameter estimates to permissible ranges:
-
Parameter transformations: By default, all variance (
variance
,snugget
,nugget
), the rangescale
, the anisotropy parametersf1
andf2
and many of the additional parameters are log-transformed before solving the estimating equations or maximizing the restricted log-likelihood and this warrants that the estimates are always positive (seecontrol.georob
for detailed explanations how to control parameter transformations). -
Checking permissible ranges: The additional parameters of the variogram models such as the smoothness parameter
of the Whittle-Matérn model are forced to stay in the permissible ranges by signalling an error to
nleqslv
,nlminb
oroptim
if the current trial values are invalid. These functions then graciously update the trial values of the parameters and carry their task on. However, it is clear that such a procedure likely gets stuck at a point on the boundary of the parameter space and is therefore just a workaround for avoiding runtime errors due to invalid parameter values. -
Exploiting the functionality of
nlminb
andoptim
: If a spatial model is fitted non-robustly, then the argumentslower
,upper
(andmethod
ofoptim
) can be used to constrain the parameters (seecontrol.optim
how to pass them tooptim
). Foroptim
one has to use the argumentsmethod = "L-BFGS-B"
,lower = l
,upper = u
, where l and u are numeric vectors with the lower and upper bounds of the transformed parameters in the order as they appear in
c(variance, snugget, nugget, scale, ...)[fit.param], aniso[fit.aniso])
,
where...
are additional parameters of isotropic variogram models (use
param.names(variogram.model)
to display the names and the order of the additional parameters forvariogram.model
). Fornlminb
one has to use the argumentslower = l
,upper = u
, where l and u are numeric vectors as described above.
Computing robust initial estimates of parameters for robust REML
To solve the robustified estimating equations for
and
the following initial
estimates are used:
-
if this turns out to be infeasible, initial values can be passed to
georob
by the argumentbhat
ofcontrol.georob
. -
is either estimated robustly by the function
lmrob
,rq
or non-robustly bylm
(see argumentinitial.fixef
ofcontrol.georob
).
Finding the roots of the robustified estimating equations of the
variogram and anisotropy parameters is more sensitive to a good choice
of initial values than maximizing the Gaussian (restricted)
log-likelihood with respect to the same parameters. If the initial
values for param
and aniso
are not sufficiently close to
the roots of the system of nonlinear equations, then
nleqslv
may fail to find them.
Setting initial.param = TRUE
(see control.georob
)
allows one to find initial values that are
often sufficiently close to the roots so that
nleqslv
converges. This is achieved by:
Initial values of the regression parameters are computed by
lmrob
irrespective of the choice forinitial.fixef
(seecontrol.georob
).Observations with “robustness weights” of the
lmrob
fit, satisfying
, are discarded (see
control.georob
).The model is fit to the pruned data set by Gaussian REML using
nlminb
oroptim
.The resulting estimates of the variogram parameters (
param
,aniso
) are used as initial estimates for the subsequent robust fit of the model bynleqslv
.
Note that for step 3 above, initial values of param
and
aniso
must be provided to georob
.
Estimating variance parameters by Gaussian (RE)ML
Unlike robust REML, where robustified estimating equations are solved
for the variance parameters nugget
(),
variance
(), and possibly
snugget
(), for Gaussian (RE)ML the
variances can be re-parametrized to
the signal variance
,
the inverse relative nugget
and
the relative auto-correlated signal variance
.
georob
maximizes then a (restricted) profile
log-likelihood that depends only on ,
,
, ..., and
is estimated by an explicit
expression that depends on these parameters (e.g. Diggle and
Ribeiro, 2006, p. 113). This is usually more efficient than
maximizing the (restricted) log-likelihood with respect to the original
variance parameters
,
and
.
georob
chooses the parametrization automatically, but the user
can control it by the argument reparam
of the function
control.georob
.
Value
An object of class georob
representing a robust (or Gaussian) (RE)ML
fit of a spatial linear model. See
georobObject
for the components of the fit.
Author(s)
Andreas Papritz papritz@retired.ethz.ch
with contributions by Cornelia Schwierz.
References
Diggle, P. J. and Ribeiro, P. J. R. (2006) Model-based Geostatistics. Springer, New York, doi:10.1007/978-0-387-48536-2.
Harville, D. A. (1977) Maximum likelihood approaches to variance component estimation and to related problems, Journal of the American Statistical Association, 72, 320–340, doi:10.1080/01621459.1977.10480998.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. doi:10.3929/ethz-a-009900710
See Also
georobPackage
for a description of the model and a brief summary of the algorithms;
georobObject
for a description of the class georob
;
profilelogLik
for computing profiles of Gaussian likelihoods;
plot.georob
for display of RE(ML) variogram estimates;
control.georob
for controlling the behaviour of georob
;
georobModelBuilding
for stepwise building models of class georob
;
cv.georob
for assessing the goodness of a fit by georob
;
georobMethods
for further methods for the class georob
;
predict.georob
for computing robust Kriging predictions;
lgnpp
for unbiased back-transformation of Kriging prediction
of log-transformed data;
georobSimulation
for simulating realizations of a Gaussian process
from model fitted by georob
; and finally
sample.variogram
and fit.variogram.model
for robust estimation and modelling of sample variograms.
Examples
################
## meuse data ##
################
data(meuse)
## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
variogram.model = "RMexp",
param = c(variance = 0.15, nugget = 0.05, scale = 200),
tuning.psi = 1000)
summary(r.logzn.reml, correlation = TRUE)
plot(r.logzn.reml, lag.dist.def = seq(0, 2000, by = 100))
## robust REML fit
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.logzn.rob <- update(r.logzn.reml, tuning.psi = 1)
summary(r.logzn.rob, correlation = TRUE)
lines(r.logzn.rob, col = "red")
}
###################
## wolfcamp data ##
###################
data(wolfcamp)
## fitting isotropic IRF(0) model
r.irf0.iso <- georob(pressure ~ 1, data = wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 10, nugget = 1500, scale = 1, alpha = 1.5),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
tuning.psi = 1000)
summary(r.irf0.iso)
plot(r.irf0.iso, lag.dist.def = seq(0, 200, by = 7.5))
## fitting anisotropic IRF(0) model
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.irf0.aniso <- georob(pressure ~ 1, data = wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 5.9, nugget = 1450, scale = 1, alpha = 1),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
aniso = default.aniso(f1 = 0.51, omega = 148.),
fit.aniso = default.fit.aniso(f1 = TRUE, omega = TRUE),
tuning.psi = 1000)
summary(r.irf0.aniso)
plot(r.irf0.aniso, lag.dist.def = seq(0, 200, by = 7.5),
xy.angle.def = c(0, 22.5, 67.5, 112.5, 157.5, 180.),
add = TRUE, col = 2:5)
pchisq(2*(r.irf0.aniso[["loglik"]] - r.irf0.iso[["loglik"]]), 2, lower = FALSE)
}