bigGP {bigGP} | R Documentation |
Package for Calculations with Big Gaussian Processes
Description
bigGP is a collection of functions for doing distributed calculations in the context of various kinds of Gaussian process models, combined with a ReferenceClass, krigeProblem, for doing kriging calculations based on maximum likelihood estimation.
Details
Full details on doing distributed kriging can be found in the help page for krigeProblem. For general calculations with distributed vectors and matrices, including extending the package for additional use cases beyond standard kriging, one first needs to create the needed vectors and matrices in a distributed fashion on the slave processes. To do this, the indices associated with the relevant vectors and matrices need to be found for each slave process; see localGetVectorIndices. Then these indices need to be used by user-created functions to create the pieces of the vectors and matrices residing on each slave process; see localKrigeProblemConstructMean and localKrigeProblemConstructCov for examples. Following this, one can use the various functions for distributed linear algebra.
The functions provided for distributed linear algebra are:

remoteCalcChol: calculates the Cholesky decomposition of a (numerically) positive definite matrix, C = L L^{\top}. Matrices that are not numerically positive definite will cause failure, as pivoting is not implemented.

remoteForwardsolve: does a forwardsolve using an already-calculated Cholesky factor into a vector or matrix, L^{-1}Z.

remoteBacksolve: does a backsolve using an already-calculated Cholesky factor into a vector or matrix, L^{-\top}Z.

remoteMultChol: multiplies an already-calculated Cholesky factor by a vector or matrix, LZ.

remoteCrossProdMatVec: multiplies the transpose of a matrix by a vector, X^{\top}z.

remoteCrossProdMatSelf: does the crossproduct of a matrix, X^{\top}X.

remoteCrossProdMatSelfDiag: finds the diagonal of the crossproduct of a matrix, \mbox{diag}(X^{\top}X).

remoteConstructRnormVector: generates a vector of random standard normal variables.

remoteConstructRnormMatrix: generates a matrix of random standard normal variables.

remoteCalc: does arbitrary calculations on one or two inputs.
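These building blocks can be chained. The following hedged sketch computes L^{-1}y on the slaves and gathers the result; it assumes an initialized cluster and that a distributed n x n covariance matrix 'C' and length-n vector 'y' already exist on the slave processes (argument names should be verified against the individual help pages):

```r
## Sketch: distributed triangular solve after a distributed Cholesky.
n <- 400
remoteCalcChol('C', 'L', n = n)              # C = L L'; fails if C is not
                                             # numerically positive definite
remoteForwardsolve('L', 'y', 'tmp', n1 = n)  # tmp = L^{-1} y on the slaves
tmp <- collectVector('tmp', n = n)           # bring the result to the master
```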
Warnings
Note that the block replication factor, h, needs to be consistent in any given calculation. So if one is doing a forwardsolve, the replication factor used in distributing the original matrix (and therefore its Cholesky factor) should be the same as that used in distributing the vector being solved into (or the rows of the matrix being solved into).
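As a hedged illustration of this consistency requirement (assuming the h-type arguments documented in the individual help pages, with an initialized cluster and a distributed matrix 'C' and vector 'y' already in place):

```r
## Sketch: use the same block replication factor h throughout one calculation.
n <- 400
h <- 3
remoteCalcChol('C', 'L', n = n, h = h)               # matrix distributed with factor h
remoteForwardsolve('L', 'y', 'out',
                   n1 = n, h1 = h)                   # the solve must use the same h
```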
Also note that when carrying out time-intensive calculations on the slave processes, the slaves will not be responsive to additional interaction, so commands such as remoteLs may appear to hang. This may occur because the slave process needs to finish a previous calculation before responding.
Note that distributed vectors and distributed one-column matrices are stored differently, with matrices stored with padded columns. When using remoteForwardsolve, remoteBacksolve, or remoteMultChol, you should use n2 = NULL when the second argument is a vector and n2 = 1 when the second argument is a one-column matrix.
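A hedged sketch of the distinction (assuming an initialized cluster with a distributed Cholesky factor 'L', a distributed vector 'z', and a distributed n x 1 matrix 'Z'; the names here are illustrative):

```r
## Sketch: n2 depends on whether the second argument is a vector or a
## one-column matrix, because the two are stored differently on the slaves.
n <- 400
remoteForwardsolve('L', 'z', 'out1', n1 = n, n2 = NULL)  # 'z' is a vector
remoteForwardsolve('L', 'Z', 'out2', n1 = n, n2 = 1)     # 'Z' is an n x 1 matrix
```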
Note that triangular and symmetric matrices are stored as vectors, in column-major order, of the lower triangular elements. To collect a distributed symmetric matrix on the master process, one uses collectTriangularMatrix. collectTriangularMatrix always fills the upper triangle as the transpose of the lower triangle.
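For instance, a hedged sketch of gathering such a matrix (assuming an initialized cluster on which a distributed n x n symmetric matrix 'C' has been constructed):

```r
## Sketch: gather a distributed symmetric matrix on the master process.
## 'C' is stored on the slaves as its lower-triangular elements only;
## collectTriangularMatrix returns the full symmetric matrix, with the
## upper triangle filled in as the transpose of the lower triangle.
n <- 400
C <- collectTriangularMatrix('C', n = n)
isSymmetric(C)
```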
Author(s)
Christopher Paciorek and Benjamin Lipshitz, in collaboration with Tina Zhuo, Cari Kaufman, Rollin Thomas, and Prabhat.
Maintainer: Christopher Paciorek <paciorek@alumni.cmu.edu>
References
Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2015. Parallelizing Gaussian Process Calculations in R. Journal of Statistical Software, 63(10), 1-23. doi:10.18637/jss.v063.i10.
Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2013. Parallelizing Gaussian Process Calculations in R. arXiv:1305.4886. https://arxiv.org/abs/1305.4886.
See Also
See bigGP.init for the necessary initialization steps, and krigeProblem for doing kriging based on maximum likelihood estimation.
Examples
# this is an example of using the API to do distributed linear algebra
# for Gaussian process calculations; we'll demonstrate generating from
# a Gaussian process with exponential covariance; note that this can
# be done more easily through the krigeProblem ReferenceClass
## Not run:
bigGP.init(3)
params <- c(sigma2 = 1, rho = 0.25)
# for this example, we'll use a modest size problem, but to demo on a
# cluster, increase m to a larger value
m <- 80
gd <- seq(0, 1, length = m)
locns <- expand.grid(x = gd, y = gd)
# indices will be a two column matrix with the index of the first set of
# locations in the first column and of the second set in the second column
covfunc <- function(params, locns, indices) {
dd <- sqrt( (locns$x[indices[,1]] - locns$x[indices[,2]])^2 +
(locns$y[indices[,1]] - locns$y[indices[,2]])^2 )
return(params["sigma2"] * exp(-dd / params["rho"]))
}
mpi.bcast.Robj2slave(params)
mpi.bcast.Robj2slave(covfunc)
mpi.bcast.Robj2slave(locns)
mpi.bcast.cmd(indices <- localGetTriangularMatrixIndices(nrow(locns)))
mpi.bcast.cmd(C <- covfunc(params, locns, indices))
remoteLs() # this may pause before reporting, as slaves are busy doing
# computations above
remoteCalcChol('C', 'L', n = m^2)
remoteConstructRnormVector('z', n = m^2)
remoteMultChol('L', 'z', 'x', n1 = m^2)
x <- collectVector('x', n = m^2)
image(gd, gd, matrix(x, m))
## End(Not run)