bigGP {bigGP} | R Documentation |
Package for Calculations with Big Gaussian Processes
Description
bigGP is a collection of functions for doing distributed calculations in the context of various kinds of Gaussian process models, combined with a ReferenceClass, krigeProblem, for doing kriging calculations based on maximum likelihood estimation.
Details
Full details on doing distributed kriging can be found in the help page for krigeProblem. For general calculations with distributed vectors and matrices, including extending the package for additional use cases beyond standard kriging, one first needs to create the needed vectors and matrices in a distributed fashion on the slave processes. To do this, the indices associated with the relevant vectors and matrices need to be found for each slave process; see localGetVectorIndices. Then these indices need to be used by user-created functions to create the pieces of the vectors and matrices residing on each slave process; see localKrigeProblemConstructMean and localKrigeProblemConstructCov for examples. Following this, one can use the various functions for distributed linear algebra.
The functions provided for distributed linear algebra are:

remoteCalcChol: calculates the Cholesky decomposition of a (numerically) positive definite matrix, C = L L^{\top}. Matrices that are not numerically positive definite will cause failure, as pivoting is not implemented.

remoteForwardsolve: does a forwardsolve using an already-calculated Cholesky factor into a vector or matrix, L^{-1}Z.

remoteBacksolve: does a backsolve using an already-calculated Cholesky factor into a vector or matrix, L^{-\top}Z.

remoteMultChol: multiplies an already-calculated Cholesky factor by a vector or matrix, LZ.

remoteCrossProdMatVec: multiplies the transpose of a matrix by a vector, X^{\top}z.

remoteCrossProdMatSelf: does the crossproduct of a matrix, X^{\top}X.

remoteCrossProdMatSelfDiag: finds the diagonal of the crossproduct of a matrix, \mbox{diag}(X^{\top}X).

remoteConstructRnormVector: generates a vector of random standard normal variables.

remoteConstructRnormMatrix: generates a matrix of random standard normal variables.

remoteCalc: does arbitrary calculations on one or two inputs.
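These building blocks can be chained. The following hedged sketch computes L^{-1}y on the slaves and gathers the result; it assumes an initialized cluster and that a distributed n x n covariance matrix 'C' and length-n vector 'y' already exist on the slave processes (argument names should be verified against the individual help pages):

```r
## Sketch: distributed triangular solve after a distributed Cholesky.
n <- 400
remoteCalcChol('C', 'L', n = n)              # C = L L'; fails if C is not
                                             # numerically positive definite
remoteForwardsolve('L', 'y', 'tmp', n1 = n)  # tmp = L^{-1} y on the slaves
tmp <- collectVector('tmp', n = n)           # bring the result to the master
```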
Warnings
Note that the block replication factor, h, needs to be consistent in any given calculation. So if one is doing a forwardsolve, the replication factor used in distributing the original matrix (and therefore its Cholesky factor) should be the same as that used in distributing the vector being solved into (or the rows of the matrix being solved into).
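As a hedged illustration of this consistency requirement (assuming the h-type arguments documented in the individual help pages, with an initialized cluster and a distributed matrix 'C' and vector 'y' already in place):

```r
## Sketch: use the same block replication factor h throughout one calculation.
n <- 400
h <- 3
remoteCalcChol('C', 'L', n = n, h = h)               # matrix distributed with factor h
remoteForwardsolve('L', 'y', 'out',
                   n1 = n, h1 = h)                   # the solve must use the same h
```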
Also note that when carrying out time-intensive calculations on the slave processes, the slaves will not be responsive to additional interaction, so commands such as remoteLs may appear to hang. This may occur because the slave process needs to finish a previous calculation before responding.
Note that distributed vectors and distributed one-column matrices are stored differently, with matrices stored with padded columns. When using remoteForwardsolve, remoteBacksolve, or remoteMultChol, you should use n2 = NULL when the second argument is a vector and n2 = 1 when the second argument is a one-column matrix.
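A hedged sketch of the distinction (assuming an initialized cluster with a distributed Cholesky factor 'L', a distributed vector 'z', and a distributed n x 1 matrix 'Z'; the names here are illustrative):

```r
## Sketch: n2 depends on whether the second argument is a vector or a
## one-column matrix, because the two are stored differently on the slaves.
n <- 400
remoteForwardsolve('L', 'z', 'out1', n1 = n, n2 = NULL)  # 'z' is a vector
remoteForwardsolve('L', 'Z', 'out2', n1 = n, n2 = 1)     # 'Z' is an n x 1 matrix
```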
Note that triangular and symmetric matrices are stored as vectors, in column-major order, of the lower triangular elements. To collect a distributed symmetric matrix on the master process, one uses collectTriangularMatrix. collectTriangularMatrix always fills the upper triangle as the transpose of the lower triangle.
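For instance, a hedged sketch of gathering such a matrix (assuming an initialized cluster on which a distributed n x n symmetric matrix 'C' has been constructed):

```r
## Sketch: gather a distributed symmetric matrix on the master process.
## 'C' is stored on the slaves as its lower-triangular elements only;
## collectTriangularMatrix returns the full symmetric matrix, with the
## upper triangle filled in as the transpose of the lower triangle.
n <- 400
C <- collectTriangularMatrix('C', n = n)
isSymmetric(C)
```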
Author(s)
Christopher Paciorek and Benjamin Lipshitz, in collaboration with Tina Zhuo, Cari Kaufman, Rollin Thomas, and Prabhat.
Maintainer: Christopher Paciorek <paciorek@alumni.cmu.edu>
References
Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2015. Parallelizing Gaussian Process Calculations in R. Journal of Statistical Software, 63(10), 1-23. doi:10.18637/jss.v063.i10.
Paciorek, C.J., B. Lipshitz, W. Zhuo, Prabhat, C.G. Kaufman, and R.C. Thomas. 2013. Parallelizing Gaussian Process Calculations in R. arXiv:1305.4886. https://arxiv.org/abs/1305.4886.
See Also
See bigGP.init for the necessary initialization steps, and krigeProblem for doing kriging based on maximum likelihood estimation.
Examples
# this is an example of using the API to do distributed linear algebra
# for Gaussian process calculations; we'll demonstrate generating from
# a Gaussian process with exponential covariance; note that this can
# be done more easily through the krigeProblem ReferenceClass
## Not run:
bigGP.init(3)
params <- c(sigma2 = 1, rho = 0.25)
# for this example, we'll use a modest size problem, but to demo on a
# cluster, increase m to a larger value
m <- 80
gd <- seq(0, 1, length = m)
locns <- expand.grid(x = gd, y = gd)
# indices will be a two column matrix with the index of the first set of
# locations in the first column and of the second set in the second column
covfunc <- function(params, locns, indices) {
dd <- sqrt( (locns$x[indices[,1]] - locns$x[indices[,2]])^2 +
(locns$y[indices[,1]] - locns$y[indices[,2]])^2 )
return(params["sigma2"] * exp(-dd / params["rho"]))
}
mpi.bcast.Robj2slave(params)
mpi.bcast.Robj2slave(covfunc)
mpi.bcast.Robj2slave(locns)
mpi.bcast.cmd(indices <- localGetTriangularMatrixIndices(nrow(locns)))
mpi.bcast.cmd(C <- covfunc(params, locns, indices))
remoteLs() # this may pause before reporting, as slaves are busy doing
# computations above
remoteCalcChol('C', 'L', n = m^2)
remoteConstructRnormVector('z', n = m^2)
remoteMultChol('L', 'z', 'x', n1 = m^2)
x <- collectVector('x', n = m^2)
image(gd, gd, matrix(x, m))
## End(Not run)