GP_fit {GPfit} | R Documentation |
Gaussian Process Model fitting
Description
For an (n x d) design matrix, X
,
and the corresponding (n x 1) simulator output Y
,
this function fits the GP model and returns the parameter estimates.
The optimization routine assumes that
the inputs are scaled to the unit hypercube .
Usage
GP_fit(X, Y, control = c(200 * d, 80 * d, 2 * d), nug_thres = 20,
trace = FALSE, maxit = 100, corr = list(type = "exponential", power
= 1.95), optim_start = NULL)
Arguments
X |
the ( |
Y |
the ( |
control |
a vector of parameters used in the search for optimal beta (search grid size, percent, number of clusters). See ‘Details’. |
nug_thres |
a parameter used in computing the nugget. See ‘Details’. |
trace |
logical, if |
maxit |
the maximum number of iterations within |
corr |
a list of parameters for the specifing the correlation to be
used. See |
optim_start |
a matrix of potentially likely starting values for
correlation hyperparameters for the |
Details
This function fits the following GP model,
,
, where
is
a GP with mean 0,
, and
. Entries in covariance matrix R are determined by
corr
and parameterized by beta
, a d
-vector of
parameters. For computational stability is replaced with
, where
and
is the nugget parameter described in Ranjan et al.
(2011).
The parameter estimate beta
is obtained by minimizing
the deviance using a multi-start gradient based search (L-BFGS-B)
algorithm. The starting points are selected using the k-means
clustering algorithm on a large maximin LHD for values of
beta
, after discarding beta
vectors
with high deviance. The control
parameter determines the
quality of the starting points of the L-BFGS-B algoritm.
control
is a vector of three tunable parameters used
in the deviance optimization algorithm. The default values
correspond to choosing 2*d clusters (using k-means clustering
algorithm) based on 80*d best points (smallest deviance,
refer to GP_deviance
) from a 200*d - point
random maximin LHD in beta
. One can change these values
to balance the trade-off between computational cost and robustness
of likelihood optimization (or prediction accuracy).
For details see MacDonald et al. (2015).
The nug_thres
parameter is outlined in Ranjan et al. (2011) and is
used in finding the lower bound on the nugget
().
Value
an object of class GP
containing parameter estimates
beta
and sig2
, nugget parameter delta
, the data
(X
and Y
), and a specification of the correlation structure
used.
Author(s)
Blake MacDonald, Hugh Chipman, Pritam Ranjan
References
MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R
Package for Fitting a Gaussian Process Model to Deterministic Simulator
Outputs. Journal of Statistical Software, 64(12), 1-23.
http://www.jstatsoft.org/v64/i12/
Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.
See Also
plot
for plotting in 1 and 2 dimensions;
predict
for predicting the response and error surfaces;
optim
for information on the L-BFGS-B procedure;
GP_deviance
for computing the deviance.
Examples
## 1D Example 1
n = 5
d = 1
computer_simulator <- function(x){
x = 2 * x + 0.5
y = sin(10 * pi * x) / (2 * x) + (x - 1)^4
return(y)
}
set.seed(3)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel)
## 1D Example 2
n = 7
d = 1
computer_simulator <- function(x) {
y <- log(x + 0.1) + sin(5 * pi * x)
return(y)
}
set.seed(1)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel, digits = 4)
## 2D Example: GoldPrice Function
computer_simulator <- function(x) {
x1 = 4 * x[, 1] - 2
x2 = 4 * x[, 2] - 2
t1 = 1 + (x1 + x2 + 1)^2 * (19 - 14 * x1 + 3 * x1^2 - 14 * x2 +
6 * x1 *x2 + 3 * x2^2)
t2 = 30 + (2 * x1 - 3 * x2)^2 * (18 - 32 * x1 + 12 * x1^2 + 48 * x2 -
36 * x1 * x2 + 27 * x2^2)
y = t1 * t2
return(y)
}
n = 30
d = 2
set.seed(1)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel)