R: Function for Fitting Univariate Bayesian Spatial Regression...

spNNGP {spNNGP}

R Documentation

Function for Fitting Univariate Bayesian Spatial Regression Models

Description

The function spNNGP fits Gaussian and non-Gaussian univariate Bayesian spatial regression models using Nearest Neighbor Gaussian Processes (NNGP).

Usage

spNNGP(formula, data = parent.frame(), coords, method = "response",
      family="gaussian", weights, n.neighbors = 15, 
      starting, tuning, priors, cov.model = "exponential",
      n.samples, n.omp.threads = 1, search.type = "cb", ord,
      return.neighbor.info = FALSE, neighbor.info,
      fit.rep = FALSE, sub.sample, verbose = TRUE, n.report = 100, ...)

Arguments

`formula`	a symbolic description of the regression model to be fit. See example below.
`data`	an optional data frame containing the variables in the model. If not found in data, the variables are taken from `environment(formula)`, typically the environment from which `spNNGP` is called.
`coords`	an `n \times 2` matrix of the observation coordinates in `R^2` (e.g., easting and northing), or if `data` is a data frame then `coords` can be a vector of length two comprising coordinate column names or indices. There can be no duplicate locations.
`method`	a quoted keyword that specifies the NNGP sampling algorithm. Supported method keywords are: `"response"` and `"latent"`. When `n` is large, the `"response"` algorithm should be faster and provide finer control over Metropolis acceptance rate for covariance parameters. In general, unless estimates of spatial random effects are needed, the `"response"` algorithm should be used.
`family`	a quoted keyword that specifies the data likelihood. Choices are "gaussian" for continuous outcome and "binomial" for discrete outcome which assumes a logistic link modeled using Polya-Gamma latent variables.
`weights`	specifies the number of trials for each observation when `family="binomial"`. The default is 1 trial for each observation. Valid arguments are a scalar value that specifies the number of trials used if all observations have the same number of trials, and a vector of length `n` that specifies the number of trials for each observation used there are differences in the number of trials.
`n.neighbors`	number of neighbors used in the NNGP.
`starting`	a list with each tag corresponding to a parameter name. Valid tags are `beta`, `sigma.sq`, `tau.sq`, `phi`, and `nu`. `nu` is only specified if `cov.model="matern"`. The value portion of each tag is the parameter's startingvalue.
`tuning`	a list with each tag corresponding to a parameter name. Valid tags are `sigma.sq`, `tau.sq`, `phi`, and `nu`. If `method="latent"` then only `phi` and `nu` need to be specified. The value portion of each tag defines the variance of the Metropolis sampler Normal proposal distribution.
`priors`	a list with each tag corresponding to a parameter name. Valid tags are `sigma.sq.ig`, `tau.sq.ig`, `phi.unif`, and `nu.unif`. Variance parameters, `simga.sq` and `tau.sq`, are assumed to follow an inverse-Gamma distribution, whereas the spatial decay `phi` and smoothness `nu` parameters are assumed to follow Uniform distributions. The hyperparameters of the inverse-Gamma are passed as a vector of length two, with the first and second elements corresponding to the shape and scale, respectively. The hyperparameters of the Uniform are also passed as a vector of length two with the first and second elements corresponding to the lower and upper support, respectively.
`cov.model`	a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are: `"exponential"`, `"matern"`, `"spherical"`, and `"gaussian"`. See below for details.
`n.samples`	the number of posterior samples to collect.
`n.omp.threads`	a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting `n.omp.threads` up to the number of hyperthreaded cores. Note, `n.omp.threads` > 1 might not work on some systems.
`fit.rep`	if `TRUE`, regression fitted and replicate data will be returned. The argument `sub.sample` controls which MCMC samples are used to generate the fitted and replicated data.
`sub.sample`	an optional list that specifies the samples used for `fit.rep`. Valid tags are `start`, `end`, and `thin`. Given the value associated with the tags, the sample subset is selected using `seq(as.integer(start), as.integer(end), by=as.integer(thin))`. The default values are `start=floor(0.5*n.samples)`, `end=n.samples` and `thin=1`.
`verbose`	if `TRUE`, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.
`search.type`	a quoted keyword that specifies type of nearest neighbor search algorithm. Supported method key words are: `"cb"` and `"brute"`. The `"cb"` should generally be much faster. If locations do not have identical coordinate values on the axis used for the nearest neighbor ordering (see `ord` argument) then `"cb"` and `"brute"` should produce identical neighbor sets. However, if there are identical coordinate values on the axis used for nearest neighbor ordering, then `"cb"` and `"brute"` might produce different, but equally valid, neighbor sets, e.g., if data are on a grid.
`ord`	an index vector of length `n` used for the nearest neighbor search. Internally, this vector is used to order `coords`, i.e., `coords[ord,]`, and associated data. Nearest neighbor candidates for the i-th row in the ordered `coords` are rows `1:(i-1)`, with the `n.neighbors` nearest neighbors being those with the minimum euclidean distance to the location defined by ordered `coords[i,]`. The default used when `ord` is not specified is x-axis ordering, i.e., `order(coords[,1])`. This argument should typically be left blank. This argument will be ignored if the `neighbor.info` argument is used.
`return.neighbor.info`	if `TRUE`, a list called `neighbor.info` containing several data structures used for fitting the NNGP model is returned. If there is no change in input data or `n.neighbors`, this list can be passed to subsequent `spNNGP` calls via the `neighbor.info` argument to avoid the neighbor search, which can be time consuming if `n` is large. In addition to the several cryptic data structures in `neighbor.info` there is a list called `n.indx` that is of length `n`. The i-th element in `n.indx` corresponds to the i-th row in `coords[ord,]` and holds the vector of that location's nearest neighbor indices. This list can be useful for plotting the neighbor graph if desired.
`neighbor.info`	see the `return.neighbor.info` argument description above.
`n.report`	the interval to report Metropolis sampler acceptance and MCMC progress.
`...`	currently no additional arguments.

Details

Model parameters can be fixed at their starting values by setting their tuning values to zero.

The no nugget model is specified by setting tau.sq to zero in the starting and tuning lists.

Value

An object of class NNGP with additional class designations for method and family. The return object is a list comprising:

`p.beta.samples`	a `coda` object of posterior samples for the regression coefficients.
`p.theta.samples`	a `coda` object of posterior samples for covariance parameters.
`p.w.samples`	is a matrix of posterior samples for the spatial random effects, where rows correspond to locations in `coords` and columns hold the `n.samples` posterior samples. This is only returned if `method="latent"`.
`y.hat.samples`	if `fit.rep=TRUE`, regression fitted values from posterior samples specified using `sub.sample`. See additional details below.
`y.hat.quants`	if `fit.rep=TRUE`, 0.5, 0.025, and 0.975 quantiles of the `y.hat.samples`.
`y.rep.samples`	if `fit.rep=TRUE`, replicated outcome from posterior samples specified using `sub.sample`. See additional details below.
`y.rep.quants`	if `fit.rep=TRUE`, 0.5, 0.025, and 0.975 quantiles of the `y.rep.samples`.
`s.indx`	if `fit.rep=TRUE`, the subset index specified with `sub.sample`.
`neighbor.info`	returned if `return.neighbor.info=TRUE` see the `return.neighbor.info` argument description above.
`run.time`	execution time for parameter estimation reported using `proc.time()`. This time does not include nearest neighbor search time for building the neighbor set.

The return object will include additional objects used for subsequent prediction and/or model fit evaluation.

Author(s)

Andrew O. Finley finleya@msu.edu,
Abhirup Datta abhidatta@jhu.edu,
Sudipto Banerjee sudipto@ucla.edu

References

Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, doi: 10.18637/jss.v103.i05.

Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, doi: 10.1080/01621459.2015.1044091.

Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, doi: 10.1080/10618600.2018.1537924.

Examples


rmvn <- function(n, mu=0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}

##Make some data
set.seed(1)
n <- 100
coords <- cbind(runif(n,0,1), runif(n,0,1))

x <- cbind(1, rnorm(n))

B <- as.matrix(c(1,5))

sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5

D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))

##Fit a Response and Latent NNGP model
n.samples <- 500

starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1)

tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5)

priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1))

cov.model <- "exponential"

m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=1)

summary(m.s)
plot(apply(m.s$p.w.samples, 1, median), w)

m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=1)

summary(m.r)

##Fit with some user defined neighbor ordering

##ord <- order(coords[,2]) ##y-axis 
ord <- order(coords[,1]+coords[,2]) ##x+y-axis
##ord <- sample(1:n, n) ##random

m.r.xy <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              ord=ord, return.neighbor.info=TRUE,
              n.samples=n.samples, n.omp.threads=1)

summary(m.r.xy)

## Not run: 
##Visualize the neighbor sets and ordering constraint
n.indx <- m.r.xy$neighbor.info$n.indx
ord <- m.r.xy$neighbor.info$ord

##This is how the data are ordered internally for model fitting
coords.ord <- coords[ord,]

for(i in 1:n){

    plot(coords.ord, cex=1, xlab="Easting", ylab="Northing")
    points(coords.ord[i,,drop=FALSE], col="blue", pch=19, cex=1)
    points(coords.ord[n.indx[[i]],,drop=FALSE], col="red", pch=19, cex=1)

    readline(prompt = "Pause. Press <Enter> to continue...")
}

## End(Not run)

[Package spNNGP version 1.0.0 Index]