rmeshedgp {meshed} | R Documentation |
Prior sampling from a Meshed Gaussian Process
Description
Generates samples from a (univariate) MGP assuming a cubic directed acyclic graph and axis-parallel domain partitioning.
Usage
rmeshedgp(coords, theta,
axis_partition = NULL, block_size = 100,
n_threads=1, cache=TRUE, verbose=FALSE, debug=FALSE)
Arguments
coords |
matrix of spatial or spatiotemporal coordinates with |
theta |
vector with covariance parameters. If |
axis_partition |
integer vector of length |
block_size |
integer specifying the (approximate) size of the blocks, i.e. how many spatial or spatiotemporal locations should be included in each block. Note: larger values correspond to an MGP that is closer to a full GP, but require more expensive computations. |
n_threads |
integer number of OpenMP threads. This is ineffective if |
cache |
bool: whether to use cache. Some computational speedup is associated to |
verbose |
bool: print some messages. |
debug |
bool: print more messages. |
Details
Gaussian processes (GPs) lack in scalability to big datasets due to the assumed unrestricted dependence across the spatial or spatiotemporal domain.
Meshed GPs instead use a directed acyclic graph (DAG) with patterns, called mesh, to simplify the dependence structure across the domain. Each DAG node corresponds to a partition of the domain. MGPs can be interpreted as approximating the GP they originate from, or as standalone processes that can be sampled from. This function samples random MGPs and can thus be used to generate big spatial or spatiotemporal data.
The only requirement to sample from a MGP compared to a standard GP is the specification of the domain partitioning strategy. Here, either axis_partition
or block_size
can be used; the default block_size=100
can be used to quickly sample smooth surfaces at millions of locations.
Just like in a standard GP, one needs a covariance function or kernel which can be set as follows.
For spatial data (d=2
), the length of theta
determines which model is used (see above). Letting h = \| s-s' \|
where s
and s'
are locations in the spatial domain, the exponential covariance is defined as:
C(h) = \sigma^2 \exp \{ - \phi h \},
whereas the Matern model is
C(h) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \phi^{\nu} h^{\nu} K_{\nu} ( \phi h ),
where K_{\nu}
is the modified Bessel function of the second kind of order \nu
.
For spatiotemporal data (d=3
) the covariance function between locations (s, t)
and (s', t')
with distance h = \| s-s' \|
and time lag u = \| t-t' \|
is defined as
C(h, u) = \sigma^2 / (a u + 1) \exp \{ -\phi h (a u + 1)^{-b/2} \},
which is a special case of non-separable spacetime covariance as introduced by Gneiting (2002).
Value
data.frame with the (reordered) supplied coordinates in the first d
columns, and the MGP sample in the last column, labeled w
.
Author(s)
Michele Peruzzi <michele.peruzzi@duke.edu>
References
Gneiting, T (2002) Nonseparable, Stationary Covariance Functions for Space-Time Data. Journal of the American Statistical Association. doi:10.1198/016214502760047113
Peruzzi, M., Banerjee, S., and Finley, A.O. (2020) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, in press. doi:10.1080/01621459.2020.1833889
Examples
library(ggplot2)
library(magrittr)
library(meshed)
# spatial domain (we choose a grid to make a nice image later)
# this generates a dataset of size 6400
xx <- seq(0, 1, length.out=80)
coords <- expand.grid(xx, xx) %>%
as.matrix()
raster_plot <- function(df){
ggplot(df, aes(Var1, Var2, fill=w)) +
geom_raster() +
scale_fill_viridis_c() +
theme_minimal() }
# spatial data, exponential covariance
# phi=14, sigma^2=2
simdata <- rmeshedgp(coords, c(14, 2))
raster_plot(simdata)
# spatial data, matern covariance
# phi=14, nu=1, sigma^2=2
simdata <- rmeshedgp(coords, c(14, 1, 2))
raster_plot(simdata)
# spacetime data, gneiting's covariance
# 64000 locations
stcoords <- expand.grid(xx, xx, seq(0, 1, length.out=10))
# it should take less than a couple of seconds
simdata <- rmeshedgp(stcoords, c(1, 14, .5, 2))
# plot data at 7th time period
raster_plot(simdata %>% dplyr::filter(Var3==unique(Var3)[7]))