metropolis.krige {krige}R Documentation

Sampling Technique Using Metropolis-Hastings

Description

This function performs Metropolis-Hastings sampling for a linear model specified over point-referenced geospatial data. It returns MCMC iterations, with which results of the geospatial linear model can be summarized.

Usage

metropolis.krige(
  formula,
  coords,
  data,
  n.iter = 100,
  powered.exp = 2,
  n.burnin = 0,
  y,
  X,
  east,
  north,
  na.action = "na.fail",
  spatial.share = 0.5,
  range.share = 0.5,
  beta.var = 10,
  range.tol = 0.05,
  b.tune = 1,
  nugget.tune = 10,
  psill.tune = 1,
  distance.matrix = FALSE,
  progress.bar = "message",
  accept.rate.warning = TRUE
)

Arguments

formula

An object of class formula (or one that can be coerced to that classes). A symbolic description of the model to be fitted. Alternatively, the model can be specified in y (a vector of the outcome variable) and X (a matrix of explanatory variables).

coords

A matrix of coordinates for all observations or a vector of variable names indicating the coordinates variables in the data. Alternatively, the coordinates can also be specified seperately using east and north.

data

An data frame containing the variables in the model.

n.iter

Number of MCMC iterations (defaults to 100).

powered.exp

This exponent, which must be greater than 0 and less than or equal to 2, specifies a powered exponential correlation structure for the data. One widely used specification is setting this to 1, which yields an exponential correlation structure. Another common specification is setting this to 2 (the default), which yields a Gaussian correlation structure.

n.burnin

Number of iterations that will be discarded for burnin (warmup). The number of burnin should not be larger than n.iter and the default is 0.

y

Alternative specification for the outcome variable that is used in the kriging model. If formula is used, this argument will be suppressed.

X

Alternative specification for the matrix of explanatory variables used in the kriging model. Different forms of the variables such as transformations and interactions also need to be specified accordingly beforehand.

east

Alternative specification for the vector of eastings for all observations.

north

Alternative specification for the vector of northing for all observations.

na.action

A function which indicates what should happen when the data contain NAs. The default is "na.fail." Another possible value is "na.omit."

spatial.share

Prior for proportion of unexplained variance that is spatial in nature. Must be greater than 0 and less than 1. Defaults to an even split, valued at 0.5.

range.share

Prior for the effective range term, as a proportion of the maximum distance in the data. Users should choose the proportion of distance at which they think the spatial correlation will become negligible. Must be greater than 0. Values greater than 1 are permitted, but users should recognize that this implies that meaningful spatial correlation would persist outside of the convex hull of data. Defaults to half the maximum distance, valued at 0.5.

beta.var

Prior for the variance on zero-meaned normal priors on the regression coefficients. Must be greater than 0. Defaults to 10.

range.tol

Tolerance term for setting the effective range. At the distance where the spatial correlation drops below this term, it is judged that the effective range has been met. The default value is the commonly-used 0.05. Must be greater than 0 and less than 1.

b.tune

Tuning parameter for candidate generation of regression coefficients that must be greater than 0. A value of 1 means that draws will be based on the variance-covariance matrix of coefficients from OLS. Larger steps are taken for values greater than 1, and smaller steps are taken for values from 0 to 1. Defaults to 1.0.

nugget.tune

Tuning parameter for candidate generation of the nugget term (tau2) that must be greater than 0. A value of 1 means that draws will be based on the typical variance of an inverse gamma distribution. Smaller steps are taken for values greater than 1, and larger steps are taken for decimal values from 0 to 1. Defaults to 10.0.

psill.tune

Tuning parameter for candidate generation of the partial sill term (sigma2) that must be greater than 0. A value of 1 means that draws will be based on the typical variance of an inverse gamma distribution. Smaller steps are taken for values greater than 1, and larger steps are taken for decimal values from 0 to 1. Defaults to 1.0.

distance.matrix

Logical value indicates whether to save the distance matrix in the output object. Saving distance matrix can save time for furthur use such as in update() function but may results in larger file size. Defaults to FALSE.

progress.bar

Types of progress bar. The default is "message" and will report variance terms. Other possible values are "TRUE" (simple percentage) and "FALSE" (suppress the progress bar).

accept.rate.warning

Logical values indicating whether to show the warnings when the acceptance rates are too high or too low. Defaults to TRUE.

Details

Analysts should use this function if they want to estimate a linear regression model in which each observation can be located at points in geographic space. That is, each observation is observed for a set of coordinates in eastings & northings or longitude & latitude.

Researchers must specify their model in the following manner: formula should be a symbolic description of the model to be fitted; it is similar to R model syntax as used in lm(). In addition, a matrix of coordinates must be specified for the geospatial model in coords. coords should be a matrix with two columns that specify west-east and north-south coordinates, respectively (ideally eastings and northings but possibly longitude and latitude). It can also be a vector of strings indicating the variables names of the coordinates in the data. data should be a data frame containing the variables in the model including both the formula and coordinates (if only the names are provided). Alternatively, users can also specify the variables using y, X, east, and north for outcome, explanatory, west-east coordinates, and north-south coordinates variables, respectively. This alternative specification is compatible with the one used in the early version of this package.

n.iter is the number of iterations to sample from the posterior distribution using the Metropolis-Hastings algorithm. This defaults to 100 iterations, but many more iterations would normally be preferred. n.burnin is set to 0 by default to preserve all the iterations since the kriging model usually takes a relatively long time to run. Users can set a number for burnin or use burnin function afterwards to discard the burnin period. The output of the function prints the proportion of candidate values for the coefficients and for the variance terms accepted by the Metropolis-Hastings algorithm. Particularly low or high acceptance rates respectively may indicate slow mixing (requiring more iterations) or a transient state (leading to nonconvergence), so additional messages will print for extreme acceptance rates. Users may want to adjust the tuning parameters b.tune, nugget.tune, or psill.tune, or perhaps the tolerance parameter range.tol if the acceptance rate is too high or too low.

The function returns a "krige" list object including the output MCMC matrix of sampled values from the posterior distribution as well as the record of function arguments, model frame, acceptance rates, and standing parameters. Users can use the generic summary function to summarize the results or extract the elements of the object for further use.

Value

An object of class krige that includes the output MCMC matrix of sampled values from the posterior distribution as well as the record of function arguments, model frame, acceptance rates, and standing parameters.

References

Jeff Gill. 2020. Measuring Constituency Ideology Using Bayesian Universal Kriging. State Politics & Policy Quarterly. doi:10.1177/1532440020930197

Examples

## Not run: 
# Summarize example data
summary(ContrivedData)

# Initial OLS model
contrived.ols<-lm(y~x.1+x.2,data=ContrivedData)
# summary(contrived.ols)

# Set seed
set.seed(1241060320)

#For simple illustration, we set to few iterations.
#In this case, a 10,000-iteration run converges to the true parameters.
#If you have considerable time and hardware, delete the # on the next line.
#10,000 iterations took 39 min. with 8 GB RAM & a 1.5 GHz Quad-Core processor.
M <- 100
#M<-10000

contrived.run <- metropolis.krige(y ~ x.1 + x.2, coords = c("s.1","s.2"), 
   data = ContrivedData, n.iter = M, n.burnin=20, range.tol = 0.05)
# Alternatively, use burnin() after estimation  
#contrived.run <- burnin(contrived.run, n.burnin=20)

# Summarize the results and examine results against true coefficients	
summary(contrived.run)
(TRUTH<-c(0.5,2.5,0.5,0,1,2))

# Extract the MCMC matrix of the posterior distribution
contrived.run.mat <- mcmc.samples(contrived.run)
head(contrived.run.mat)

# Diagnostics
geweke(contrived.run, early.prop=0.5)
heidel.welch(contrived.run)

# Semivariogram
### Semivariance
semivariance(contrived.run)
### Plot semivariogram
semivariogram(contrived.run)
### Alternatively, use generic plot() on a krige object
plot(contrived.run)

## End(Not run)


[Package krige version 0.6.2 Index]