spmeshed {meshed} | R Documentation |
Posterior sampling for models based on MGPs
Description
Fits Bayesian multivariate spatial or spatiotemporal regression models with latent MGPs via Markov chain Monte Carlo.
Usage
spmeshed(y, x, coords, k=NULL,
family = "gaussian",
axis_partition = NULL,
block_size = 30,
grid_size = NULL,
grid_custom = NULL,
n_samples = 1000,
n_burnin = 100,
n_thin = 1,
n_threads = 4,
verbose = 0,
predict_everywhere = FALSE,
settings = list(adapting=TRUE, forced_grid=NULL,
cache=NULL, ps=TRUE, saving=TRUE, low_mem=FALSE, hmc=4),
prior = list(beta=NULL, tausq=NULL, sigmasq = NULL,
phi=NULL, a=NULL, nu = NULL,
toplim = NULL, btmlim = NULL, set_unif_bounds=NULL),
starting = list(beta=NULL, tausq=NULL, theta=NULL, lambda=NULL, v=NULL,
a=NULL, nu = NULL,
mcmcsd=.05,
mcmc_startfrom=0),
debug = list(sample_beta=TRUE, sample_tausq=TRUE,
sample_theta=TRUE, sample_w=TRUE, sample_lambda=TRUE,
verbose=FALSE, debug=FALSE),
indpart=FALSE
)
Arguments
y |
matrix of multivariate outcomes with |
x |
matrix of covariates with |
coords |
matrix of coordinates with |
k |
integer |
family |
a vector with length |
axis_partition |
integer vector of size |
block_size |
integer approximate size of the blocks after domain partitioning. Only used if |
grid_size |
integer vector of size |
grid_custom |
list with elements |
n_samples |
integer number of MCMC samples at which all the unknowns are stored (including the latent effects). |
n_burnin |
integer number of MCMC samples to discard at the beginning of the chain. |
n_thin |
integer thinning parameter for the MCMC chain. Only the chain of latent effects ( |
n_threads |
integer number of OpenMP threads. This is ineffective if |
verbose |
integer. If |
predict_everywhere |
bool used if settings$forced_grid=T. Should predictions be made at the reference grid locations? If not, predictions will be made only at the supplied NA values of Y. |
settings |
list: |
prior |
list: setup for priors of unknown parameters. |
starting |
list: setup for starting values of unknown parameters. |
debug |
list: setup for debugging things. Some parts of MCMC can be turned off here. |
indpart |
bool defaults to |
Details
This function targets the following model:
y(s) = x(s)^\top \beta + \Lambda v(s) + \epsilon(s),
where y(s)
is a q
-dimensional vector of outcomes at spatial location s
, x(s)
is a p
-dimensional vector of covariates with static coefficients \beta
, \Lambda
is a matrix of factor loadings of size (q, k)
, v(s)
is a k
-dimensional vector which collects the realization of independent Gaussian processes v_j \sim spmeshed(0, C_j)
for j=1, \dots, k
and where C_j(s, s')
is a correlation function. s
is a coordinate in space (d=2
) or space plus time (d=3
). The Meshed GP implemented here associates an axis-parallel tessellation of the domain to a cubic directed acyclic graph (mesh).
Value
coordsdata |
data.frame including the original |
savedata |
Available if |
yhat_mcmc |
list of length |
v_mcmc |
list of length |
w_mcmc |
list of length |
lp_mcmc |
list of length |
beta_mcmc |
array of size |
tausq_mcmc |
matrix of size |
theta_mcmc |
array of size |
lambda_mcmc |
array of size |
paramsd |
Cholesky factorization of the proposal covariance for adaptive MCMC, after adaptation. |
mcmc |
Total number of MCMC iterations performed. |
mcmc_time |
Time in seconds taken for MCMC (not including preprocessing). |
Author(s)
Michele Peruzzi michele.peruzzi@duke.edu
References
Peruzzi, M., Banerjee, S., and Finley, A.O. (2022) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, 117(538):969-982. doi:10.1080/01621459.2020.1833889
Peruzzi, M., Banerjee, S., Dunson, D.B., and Finley, A.O. (2021) Grid-Parametrize-Split (GriPS) for Improved Scalable Inference in Spatial Big Data Analysis. https://arxiv.org/abs/2101.03579
Peruzzi, M. and Dunson, D.B. (2022) Spatial meshing for general Bayesian multivariate models. https://arxiv.org/abs/2201.10080
Examples
# toy example with tiny dataset and short MCMC
# on a univariate outcome
library(magrittr)
library(dplyr)
library(ggplot2)
library(meshed)
set.seed(2021)
SS <- 12
n <- SS^2 # total n. locations, including missing ones
coords <- expand.grid(xx <- seq(0,1,length.out=SS), xx) %>%
as.matrix()
# generate data
sigmasq <- 2.3
phi <- 6
tausq <- .1
B <- c(-1,.5,1)
CC <- sigmasq * exp(-phi * as.matrix(dist(coords)))
LC <- t(chol(CC))
w <- LC %*% rnorm(n)
p <- length(B)
X <- rnorm(n * p) %>% matrix(ncol=p)
y_full <- X %*% B + w + tausq^.5 * rnorm(n)
set_missing <- rbinom(n, 1, 0.1)
simdata <- data.frame(coords,
y_full = y_full,
w_latent = w) %>%
mutate(y_observed = ifelse(set_missing==1, NA, y_full))
# MCMC setup
mcmc_keep <- 500
mcmc_burn <- 100
mcmc_thin <- 2
y <- simdata$y_observed
ybar <- mean(y, na.rm=TRUE)
meshout <- spmeshed(y-ybar, X, coords,
axis_partition=c(4,4),
n_samples = mcmc_keep,
n_burn = mcmc_burn,
n_thin = mcmc_thin,
prior=list(phi=c(1,15)),
verbose = 0,
n_threads = 1)
# posterior means
best_post_mean <- meshout$beta_mcmc %>% apply(1:2, mean)
# process means
wmesh <- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean())
# predictions
ymesh <- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean())
outdf <-
meshout$coordsdata %>%
cbind(ymesh, wmesh)
# plot predictions
pred_plot <- outdf %>%
ggplot(aes(Var1, Var2, color=y_mgp)) +
geom_point() +
scale_color_viridis_c()
# plot latent process
latent_plot <- outdf %>%
ggplot(aes(Var1, Var2, color=w_mgp)) +
geom_point() +
scale_color_viridis_c()
# estimation of regression coefficients
plot(density(meshout$beta_mcmc[1,1,]))
abline(v=B[1], col="red")