R: Gaussian process modeling with homoskedastic noise

mleHomGP {hetGP}

R Documentation

Gaussian process modeling with homoskedastic noise

Description

Gaussian process regression under homoskedastic noise based on maximum likelihood estimation of the hyperparameters. This function is enhanced to deal with replicated observations.

Usage

mleHomGP(
  X,
  Z,
  lower = NULL,
  upper = NULL,
  known = NULL,
  noiseControl = list(g_bounds = c(sqrt(.Machine$double.eps), 100)),
  init = NULL,
  covtype = c("Gaussian", "Matern5_2", "Matern3_2"),
  maxit = 100,
  eps = sqrt(.Machine$double.eps),
  settings = list(return.Ki = TRUE, factr = 1e+07)
)

Arguments

`X`	matrix of all designs, one per row, or list with elements: `X0` matrix of unique design locations, one point per row `Z0` vector of averaged observations, of length `nrow(X0)` `mult` number of replicates at designs in `X0`, of length `nrow(X0)`
`Z`	vector of all observations. If using a list with `X`, `Z` has to be ordered with respect to `X0`, and of length `sum(mult)`
`lower`, `upper`	optional bounds for the `theta` parameter (see `cov_gen` for the exact parameterization). In the multivariate case, it is possible to give vectors for bounds (resp. scalars) for anisotropy (resp. isotropy)
`known`	optional list of known parameters, e.g., `beta0`, `theta` or `g`
`noiseControl`	list with element , `g_bounds`, vector providing minimal and maximal noise to signal ratio
`init`	optional list specifying starting values for MLE optimization, with elements: `theta_init` initial value of the theta parameters to be optimized over (default to 10% of the range determined with `lower` and `upper`) `g_init` initial value of the nugget parameter to be optimized over (based on the variance at replicates if there are any, else `0.1`)
`covtype`	covariance kernel type, either 'Gaussian', 'Matern5_2' or 'Matern3_2', see `cov_gen`
`maxit`	maximum number of iteration for L-BFGS-B of `optim`
`eps`	jitter used in the inversion of the covariance matrix for numerical stability
`settings`	list with argument `return.Ki`, to include the inverse covariance matrix in the object for further use (e.g., prediction). Arguments `factr` (default to 1e9) and `pgtol` are available to be passed to `control` for L-BFGS-B in `optim` (for the joint likelihood only).

Details

The global covariance matrix of the model is parameterized as nu_hat * (C + g * diag(1/mult)) = nu_hat * K, with C the correlation matrix between unique designs, depending on the family of kernel used (see cov_gen for available choices) and values of lengthscale parameters. nu_hat is the plugin estimator of the variance of the process.

It is generally recommended to use find_reps to pre-process the data, to rescale the inputs to the unit cube and to normalize the outputs.

Value

a list which is given the S3 class "homGP", with elements:

theta: maximum likelihood estimate of the lengthscale parameter(s),
g: maximum likelihood estimate of the nugget variance,
trendtype: either "SK" if beta0 is given, else "OK"
beta0: estimated trend unless given in input,
nu_hat: plugin estimator of the variance,
ll: log-likelihood value,
X0, Z0, Z, mult, eps, covtype: values given in input,
call: user call of the function
used_args: list with arguments provided in the call
nit_opt, msg: counts and msg returned by optim
Ki: inverse covariance matrix (not scaled by nu_hat) (if return.Ki is TRUE in settings)
time: time to train the model, in seconds.

References

M. Binois, Robert B. Gramacy, M. Ludkovski (2018), Practical heteroskedastic Gaussian process modeling for large simulation experiments, Journal of Computational and Graphical Statistics, 27(4), 808–821.
Preprint available on arXiv:1611.05902.

Examples

##------------------------------------------------------------
## Example 1: Homoskedastic GP modeling on the motorcycle data
##------------------------------------------------------------
set.seed(32)

## motorcycle data
library(MASS)
X <- matrix(mcycle$times, ncol = 1)
Z <- mcycle$accel
plot(X, Z, ylim = c(-160, 90), ylab = 'acceleration', xlab = "time")

 
model <- mleHomGP(X = X, Z = Z, lower = 0.01, upper = 100)
  
## Display averaged observations
points(model$X0, model$Z0, pch = 20) 
xgrid <- matrix(seq(0, 60, length.out = 301), ncol = 1) 
predictions <- predict(x = xgrid, object =  model)

## Display mean prediction
lines(xgrid, predictions$mean, col = 'red', lwd = 2)
## Display 95% confidence intervals
lines(xgrid, qnorm(0.05, predictions$mean, sqrt(predictions$sd2)), col = 2, lty = 2)
lines(xgrid, qnorm(0.95, predictions$mean, sqrt(predictions$sd2)), col = 2, lty = 2)
## Display 95% prediction intervals
lines(xgrid, qnorm(0.05, predictions$mean, sqrt(predictions$sd2 + predictions$nugs)), 
  col = 3, lty = 2)
lines(xgrid, qnorm(0.95, predictions$mean, sqrt(predictions$sd2 + predictions$nugs)), 
  col = 3, lty = 2)

[Package hetGP version 1.1.6 Index]