R: Gaussian Process Model fitting

GP_fit {GPfit}

R Documentation

Gaussian Process Model fitting

Description

For an (n x d) design matrix, X, and the corresponding (n x 1) simulator output Y, this function fits the GP model and returns the parameter estimates. The optimization routine assumes that the inputs are scaled to the unit hypercube [0,1]^d.

Usage

GP_fit(X, Y, control = c(200 * d, 80 * d, 2 * d), nug_thres = 20,
  trace = FALSE, maxit = 100, corr = list(type = "exponential", power
  = 1.95), optim_start = NULL)

Arguments

`X`	the (`n x d`) design matrix
`Y`	the (`n x 1`) vector of simulator outputs.
`control`	a vector of parameters used in the search for optimal beta (search grid size, percent, number of clusters). See ‘Details’.
`nug_thres`	a parameter used in computing the nugget. See ‘Details’.
`trace`	logical, if `TRUE`, will provide information on the `optim` runs
`maxit`	the maximum number of iterations within `optim`, defaults to 100
`corr`	a list of parameters for the specifing the correlation to be used. See `corr_matrix`.
`optim_start`	a matrix of potentially likely starting values for correlation hyperparameters for the `optim` runs, i.e., initial guess of the d-vector `beta`

Details

This function fits the following GP model, y(x) = \mu + Z(x), x \in [0,1]^{d}, where Z(x) is a GP with mean 0, Var(Z(x_i)) = \sigma^2, and Cov(Z(x_i),Z(x_j)) = \sigma^2R_{ij}. Entries in covariance matrix R are determined by corr and parameterized by beta, a d-vector of parameters. For computational stability R^{-1} is replaced with R_{\delta_{lb}}^{-1}, where R_{\delta{lb}} = R + \delta_{lb}I and \delta_{lb} is the nugget parameter described in Ranjan et al. (2011).

The parameter estimate beta is obtained by minimizing the deviance using a multi-start gradient based search (L-BFGS-B) algorithm. The starting points are selected using the k-means clustering algorithm on a large maximin LHD for values of beta, after discarding beta vectors with high deviance. The control parameter determines the quality of the starting points of the L-BFGS-B algoritm.

control is a vector of three tunable parameters used in the deviance optimization algorithm. The default values correspond to choosing 2*d clusters (using k-means clustering algorithm) based on 80*d best points (smallest deviance, refer to GP_deviance) from a 200*d - point random maximin LHD in beta. One can change these values to balance the trade-off between computational cost and robustness of likelihood optimization (or prediction accuracy). For details see MacDonald et al. (2015).

The nug_thres parameter is outlined in Ranjan et al. (2011) and is used in finding the lower bound on the nugget (\delta_{lb}).

Value

an object of class GP containing parameter estimates beta and sig2, nugget parameter delta, the data (X and Y), and a specification of the correlation structure used.

Author(s)

Blake MacDonald, Hugh Chipman, Pritam Ranjan

References

MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs. Journal of Statistical Software, 64(12), 1-23. http://www.jstatsoft.org/v64/i12/

Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.

Examples


## 1D Example 1
n = 5
d = 1 
computer_simulator <- function(x){
    x = 2 * x + 0.5
    y = sin(10 * pi * x) / (2 * x) + (x - 1)^4
    return(y)
}
set.seed(3)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel)


## 1D Example 2
n = 7
d = 1
computer_simulator <- function(x) {
    y <- log(x + 0.1) + sin(5 * pi * x)
    return(y)
}
set.seed(1)
library(lhs)
x = maximinLHS(n, d)
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel, digits = 4)


## 2D Example: GoldPrice Function
computer_simulator <- function(x) {
    x1 = 4 * x[, 1] - 2
    x2 = 4 * x[, 2] - 2
    t1 = 1 + (x1 + x2 + 1)^2 * (19 - 14 * x1 + 3 * x1^2 - 14 * x2 + 
        6 * x1 *x2 + 3 * x2^2)
    t2 = 30 + (2 * x1 - 3 * x2)^2 * (18 - 32 * x1 + 12 * x1^2 + 48 * x2 - 
        36 * x1 * x2 + 27 * x2^2)
    y = t1 * t2
    return(y)
}
n = 30
d = 2
set.seed(1)
library(lhs)
x = maximinLHS(n, d) 
y = computer_simulator(x)
GPmodel = GP_fit(x, y)
print(GPmodel)

[Package GPfit version 1.0-8 Index]