predict.spmeshed {meshed} | R Documentation |
Posterior predictive sampling for models based on MGPs
Description
Sample from the posterior predictive distribution of the outcomes at new spatial or spatiotemporal locations after MCMC.
Usage
## S3 method for class 'spmeshed'
predict(object, newx, newcoords,
n_threads=4, verbose=FALSE, ...)
Arguments
object |
Object output from |
newx |
matrix of covariate values at the new coordinates. |
newcoords |
matrix of new coordinates. |
n_threads |
integer number of OpenMP threads. This is ineffective if |
verbose |
boolean for progress messagging. |
... |
other arguments (unused). |
Details
While this function can always be used to make predictions, in most cases it is more efficient to just include the prediction locations in the main data as NA
values; spmeshed
will sample from the posterior predictive distribution at those locations while doing MCMC. The predict
method is only recommended when all 4 of the following are true:
(1) spmeshed
was run with settings$forced_grid=FALSE
and
(2) the prediction locations are uniformly scattered on the domain (or rather, they are not clustered as a large empty area) and
(3) the number of prediction locations is a large portion of the number of observed data points and
(4) the prediction locations are not on a grid.
In all other cases the main spmeshed
function is setup to be more efficient in automatically performing predictions during MCMC.
Value
coords_out |
matrix with the prediction location coordinates (order updated after predictions). |
preds_out |
array of dimension ( |
Author(s)
Michele Peruzzi michele.peruzzi@duke.edu
References
Peruzzi, M., Banerjee, S., and Finley, A.O. (2020) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, in press. doi:10.1080/01621459.2020.1833889
Peruzzi, M., Banerjee, S., Dunson, D.B., and Finley, A.O. (2021) Grid-Parametrize-Split (GriPS) for Improved Scalable Inference in Spatial Big Data Analysis. https://arxiv.org/abs/2101.03579
Examples
# toy example with tiny dataset and short MCMC
# on a univariate outcome
library(magrittr)
library(dplyr)
library(meshed)
set.seed(2021)
SS <- 12
n <- SS^2 # total n. locations, including missing ones
coords <- data.frame(Var1=runif(n), Var2=runif(n)) %>%
as.matrix()
# generate data
sigmasq <- 2.3
phi <- 6
tausq <- .1
B <- c(-1,.5,1)
CC <- sigmasq * exp(-phi * as.matrix(dist(coords)))
LC <- t(chol(CC))
w <- LC %*% rnorm(n)
p <- length(B)
X <- rnorm(n * p) %>% matrix(ncol=p)
y_full <- X %*% B + w + tausq^.5 * rnorm(n)
set_missing <- rbinom(n, 1, 0.1)
simdata <- data.frame(coords,
y_full = y_full,
w_latent = w) %>%
mutate(y_observed = ifelse(set_missing==1, NA, y_full))
# MCMC setup
mcmc_keep <- 500
mcmc_burn <- 100
mcmc_thin <- 2
y <- simdata$y_observed
ybar <- mean(y, na.rm=TRUE)
# training set
y_in <- (y-ybar)[!is.na(y)]
X_in <- X[!is.na(y),]
coords_in <- coords[!is.na(y),]
# suppose we dont want to have gridded knots
# i.e. we are fixing the MGP reference set at the observed locations
# (this may be inefficient in big data settings)
meshout <- spmeshed(y_in, X_in, coords_in,
axis_partition=c(4,4),
n_samples = mcmc_keep,
n_burn = mcmc_burn,
n_thin = mcmc_thin,
settings = list(forced_grid=FALSE, cache=FALSE),
prior=list(phi=c(1,15)),
verbose = 0,
n_threads = 1)
# test set
coords_out <- coords[is.na(y),]
X_out <- X[is.na(y),]
df_predict <- predict(meshout, newx=X_out, newcoords=coords_out)
y_posterior_predictive_mean <- df_predict$preds_out[,1,] %>%
apply(1, mean) %>% add(ybar)
df_predicted <- df_predict$coords_out %>% cbind(y_posterior_predictive_mean)