mleGP {laGP} | R Documentation |
Inference for GP correlation parameters
Description
Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression
Usage
mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine$double.eps),
tmax = -1, ab = c(0, 0), verb = 0)
mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine$double.eps), 2),
tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0)
mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine$double.eps),
tmax=-1, ab=c(0,0), maxit=100, verb=0)
jmleGP(gpi, drange=c(sqrt(.Machine$double.eps),10),
grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine$double.eps),10),
grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGPsep(gpsepi, drange=c(sqrt(.Machine$double.eps),10),
grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0),
maxit=100, verb=0)
jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine$double.eps),10),
grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0),
maxit=100, mleGPsep=mleGPsep.R, verb=0)
Arguments
gpi |
a C-side GP object identifier (positive integer);
e.g., as returned by |
gpsepi |
similar to |
N |
for |
param |
for |
tmin |
for |
tmax |
for |
drange |
for |
grange |
for |
ab |
for |
maxit |
for |
dab |
for |
gab |
for |
mleGPsep |
function for internal MLE calculation of the separable
lengthscale parameter; one of either |
verb |
a verbosity argument indicating how much information about the optimization
steps should be printed to the screen; |
Details
mleGP
and mleGPsep
perform marginal (or profile) inference
for the specified param
, either the lengthscale or the nugget.
mleGPsep
can perform simultaneous lengthscale and nugget inference via
a common gradient with param = "both"
. More details are provided below.
For the lengthscale, mleGP
uses a Newton-like scheme with analytic first
and second derivatives (more below) to find the scalar parameter for the isotropic
Gaussian correlation function, with hard-coded 100-max iterations threshold and a
sqrt(.Machine$double.eps)
tolerance for determining convergence;
mleGPsep.R
uses L-BFGS-B via optim
for the
vectorized parameter of the separable Gaussian correlation, with a user-supplied
maximum number of iterations (maxit
) passed to optim
.
When maxit
is reached the output conv = 1
is returned,
subsequent identical calls to mleGPsep.R
can be used to continue with
further iterations. mleGPsep
is similar, but uses the C
entry point lbfgsb
.
For the nugget, both mleGP
and mleGPsep
utilize a (nearly
identical) Newton-like scheme leveraging first and second derivatives.
jmleGP
and jmleGPsep
provide joint inference by iterating
over the marginals of lengthscale and nugget. The jmleGP.R
function
is an R-only wrapper around
mleGP
(which is primarily in C), whereas jmleGP
is
primarily in C but with reduced output and
with hard-coded N=100
. The same is true for jmleGPsep
.
mleGPsep
provides a param = "both"
alternative to jmleGPsep
leveraging a common gradient. It can be helpful to supply a larger maxit
argument compared to jmleGPsep
as the latter may do up to 100 outer
iterations, cycling over lengthscale and nugget. mleGPsep
usually requires
many fewer total iterations, unless one of the lengthscale or nugget is
already converged. In anticipation of param = "both"
the
mleGPsep
function has longer default values for its bounds and prior
arguments. These longer arguments are
ignored when param != "both"
. At this time mleGP
does not have a
param = "both"
option.
All methods are initialized at the value of the parameter(s) currently
stored by the C-side object referenced by gpi
or gpsepi
. It is
highly recommended that sensible range values (tmin
, tmax
or drange
, grange
) be provided. The defaults provided are
too loose for most applications. As illustrated in the examples below,
the darg
and garg
functions can be used
to set appropriate ranges from the distributions of inputs and output
data respectively.
The Newton-like method implemented for the isotropic lengthscale and for the
nugget offers very fast convergence to local maxima, but sometimes it fails
to converge (for any of the usual reasons). The implementation
detects this, and in such cases it invokes a Brent_fmin
call instead -
this is the method behind the optimize
function.
Note that the gpi
or gpsepi
object(s) must have been allocated with
dK=TRUE
; alternatively, one can call buildKGP
or buildKGPsep
- however, this is not in the NAMESPACE at this time
Value
A self-explanatory data.frame
is returned containing the
values inferred and the number of iterations used. The
jmleGP.R
and jmleGPsep.R
functions will also show progress details (the values
obtained after each iteration over the marginals).
However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.
mleGPsep
and jmleGPsep
provide an output field/column
called conv
indicating convergence (when 0), or alternately
a value agreeing with a non-convergence code provided on
output by optim
Author(s)
Robert B. Gramacy rbg@vt.edu
References
For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or
Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/
See Also
vignette("laGP")
,
newGP
, laGP
, llikGP
, optimize
Examples
## a simple example with estimated nugget
if(require("MASS")) {
## motorcycle data and predictive locations
X <- matrix(mcycle[,1], ncol=1)
Z <- mcycle[,2]
## get sensible ranges
d <- darg(NULL, X)
g <- garg(list(mle=TRUE), Z)
## initialize the model
gpi <- newGP(X, Z, d=d$start, g=g$start, dK=TRUE)
## separate marginal inference (not necessary - for illustration only)
print(mleGP(gpi, "d", d$min, d$max))
print(mleGP(gpi, "g", g$min, g$max))
## joint inference (could skip straight to here)
print(jmleGP(gpi, drange=c(d$min, d$max), grange=c(g$min, g$max)))
## plot the resulting predictive surface
N <- 100
XX <- matrix(seq(min(X), max(X), length=N), ncol=1)
p <- predGP(gpi, XX, lite=TRUE)
plot(X, Z, main="stationary GP fit to motorcycle data")
lines(XX, p$mean, lwd=2)
lines(XX, p$mean+1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)
lines(XX, p$mean-1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)
## clean up
deleteGP(gpi)
}
##
## with a separable correlation function
##
## 2D Example: GoldPrice Function, mimicking GP_fit package
f <- function(x)
{
x1 <- 4*x[,1] - 2
x2 <- 4*x[,2] - 2;
t1 <- 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 6*x1*x2 + 3*x2^2);
t2 <- 30 + (2*x1 -3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 36*x1*x2 + 27*x2^2);
y <- t1*t2;
return(y)
}
## build design
library(tgp)
n <- 50 ## change to 100 or 1000 for more interesting demo
B <- rbind(c(0,1), c(0,1))
X <- dopt.gp(n, Xcand=lhs(10*n, B))$XX
## this differs from GP_fit in that we use the log response
Y <- log(f(X))
## get sensible ranges
d <- darg(NULL, X)
g <- garg(list(mle=TRUE), Y)
## build GP and jointly optimize via profile mehtods
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
jmleGPsep(gpisep, drange=c(d$min, d$max), grange=c(g$min, g$max))
## clean up
deleteGPsep(gpisep)
## alternatively, we can optimize via a combined gradient
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
mleGPsep(gpisep, param="both", tmin=c(d$min, g$min), tmax=c(d$max, g$max))
deleteGPsep(gpisep)