R: Student-t process modeling with heteroskedastic noise

mleHetTP {hetGP}

R Documentation

Student-t process modeling with heteroskedastic noise

Description

Student-t process regression under input dependent noise based on maximum likelihood estimation of the hyperparameters. A GP is used to model latent (log-) variances. This function is enhanced to deal with replicated observations.

Usage

mleHetTP(
  X,
  Z,
  lower = NULL,
  upper = NULL,
  noiseControl = list(k_theta_g_bounds = c(1, 100), g_max = 10000, g_bounds = c(1e-06,
    0.1), nu_bounds = c(2 + 0.001, 30), sigma2_bounds = c(sqrt(.Machine$double.eps),
    10000)),
  settings = list(linkThetas = "joint", logN = TRUE, initStrategy = "residuals", checkHom
    = TRUE, penalty = TRUE, trace = 0, return.matrices = TRUE, return.hom = FALSE, factr
    = 1e+09),
  covtype = c("Gaussian", "Matern5_2", "Matern3_2"),
  maxit = 100,
  known = list(beta0 = 0),
  init = list(nu = 3),
  eps = sqrt(.Machine$double.eps)
)

Arguments

`X`	matrix of all designs, one per row, or list with elements: `X0` matrix of unique design locations, one point per row `Z0` vector of averaged observations, of length `nrow(X0)` `mult` number of replicates at designs in `X0`, of length `nrow(X0)`
`Z`	vector of all observations. If using a list with `X`, `Z` has to be ordered with respect to `X0`, and of length `sum(mult)`
`lower`, `upper`	bounds for the `theta` parameter (see `cov_gen` for the exact parameterization). In the multivariate case, it is possible to give vectors for bounds (resp. scalars) for anisotropy (resp. isotropy)
`noiseControl`	list with elements related to optimization of the noise process parameters: `g_min`, `g_max` minimal and maximal noise to signal ratio (of the mean process) `lowerDelta`, `upperDelta` optional vectors (or scalars) of bounds on `Delta`, of length `nrow(X0)` (default to `rep(eps, nrow(X0))` and `rep(noiseControl$g_max, nrow(X0))` resp., or their `log`) `lowerTheta_g`, `upperTheta_g` optional vectors of bounds for the lengthscales of the noise process if `linkThetas == 'none'`. Same as for `theta` if not provided. `k_theta_g_bounds` if `linkThetas == 'joint'`, vector with minimal and maximal values for `k_theta_g` (default to `c(1, 100)`). See Details. `g_bounds` vector for minimal and maximal noise to signal ratios for the noise of the noise process, i.e., the smoothing parameter for the noise process. (default to `c(1e-6, 1)`). `sigma2_bounds`, vector providing minimal and maximal signal variance. `nu_bounds`, vector providing minimal and maximal values for the degrees of freedom.
`settings`	list for options about the general modeling procedure, with elements: `linkThetas` defines the relation between lengthscales of the mean and noise processes. Either `'none'`, `'joint'`(default) or `'constr'`, see Details. `logN`, when `TRUE` (default), the log-noise process is modeled. `initStrategy` one of `'simple'`, `'residuals'` (default) and `'smoothed'` to obtain starting values for `Delta`, see Details `penalty` when `TRUE`, the penalized version of the likelihood is used (i.e., the sum of the log-likelihoods of the mean and variance processes, see References). `checkHom` when `TRUE`, if the log-likelihood with a homoskedastic model is better, then return it. `trace` optional scalar (default to `0`). If positive, tracing information on the fitting process. If `1`, information is given about the result of the heterogeneous model optimization. Level `2` gives more details. Level 3 additionaly displays all details about initialization of hyperparameters. `return.matrices` boolean too include the inverse covariance matrix in the object for further use (e.g., prediction). Arguments `factr` (default to 1e9) and `pgtol` are available to be passed to `control` for L-BFGS-B in `optim`.
`covtype`	covariance kernel type, either `'Gaussian'`, `'Matern5_2'` or `'Matern3_2'`, see `cov_gen`
`maxit`	maximum number of iterations for `L-BFGS-B` of `optim` dedicated to maximum likelihood optimization
`init`, `known`	optional lists of starting values for mle optimization or that should not be optimized over, respectively. Values in `known` are not modified, while it can happen to those of `init`, see Details. One can set one or several of the following: `theta` lengthscale parameter(s) for the mean process either one value (isotropic) or a vector (anistropic) `Delta` vector of nuggets corresponding to each design in `X0`, that are smoothed to give `Lambda` (as the global covariance matrix depend on `Delta` and `nu_hat`, it is recommended to also pass values for `theta`) `beta0` constant trend of the mean process `k_theta_g` constant used for link mean and noise processes lengthscales, when `settings$linkThetas == 'joint'` `theta_g` either one value (isotropic) or a vector (anistropic) for lengthscale parameter(s) of the noise process, when `settings$linkThetas != 'joint'` `g` scalar nugget of the noise process `nu` degree of freedom parameter `sigma2` scale variance `g_H` scalar homoskedastic nugget for the initialisation with a `mleHomGP`. See Details.
`eps`	jitter used in the inversion of the covariance matrix for numerical stability

Details

The global covariance matrix of the model is parameterized as K = sigma2 * C + Lambda * diag(1/mult), with C the correlation matrix between unique designs, depending on the family of kernel used (see cov_gen for available choices). Lambda is the prediction on the noise level given by a (homoskedastic) GP:

\Lambda = C_g(C_g + \mathrm{diag}(g/\mathrm{mult}))^{-1} \Delta

with C_g the correlation matrix between unique designs for this second GP, with lengthscales hyperparameters theta_g and nugget g and Delta the variance level at X0 that are estimated.

It is generally recommended to use find_reps to pre-process the data, to rescale the inputs to the unit cube and to normalize the outputs.

The noise process lengthscales can be set in several ways:

using k_theta_g (settings$linkThetas == 'joint'), supposed to be greater than one by default. In this case lengthscales of the noise process are multiples of those of the mean process.
if settings$linkThetas == 'constr', then the lower bound on theta_g correspond to estimated values of an homoskedastic GP fit.
else lengthscales between the mean and noise process are independent (both either anisotropic or not).

When no starting nor fixed parameter values are provided with init or known, the initialization process consists of fitting first an homoskedastic model of the data, called modHom. Unless provided with init$theta, initial lengthscales are taken at 10% of the range determined with lower and upper, while init$g_H may be use to pass an initial nugget value. The resulting lengthscales provide initial values for theta (or update them if given in init).

If necessary, a second homoskedastic model, modNugs, is fitted to the empirical residual variance between the prediction given by modHom at X0 and Z (up to modHom$nu_hat). Note that when specifying settings$linkThetas == 'joint', then this second homoskedastic model has fixed lengthscale parameters. Starting values for theta_g and g are extracted from modNugs.

Finally, three initialization schemes for Delta are available with settings$initStrategy:

for settings$initStrategy == 'simple', Delta is simply initialized to the estimated g value of modHom. Note that this procedure may fail when settings$penalty == TRUE.
for settings$initStrategy == 'residuals', Delta is initialized to the estimated residual variance from the homoskedastic mean prediction.
for settings$initStrategy == 'smoothed', Delta takes the values predicted by modNugs at X0.

Notice that lower and upper bounds cannot be equal for optim.

Value

a list which is given the S3 class "hetTP", with elements:

theta: unless given, maximum likelihood estimate (mle) of the lengthscale parameter(s),
Delta: unless given, mle of the nugget vector (non-smoothed),
Lambda: predicted input noise variance at X0,
sigma2: plugin estimator of the variance,
theta_g: unless given, mle of the lengthscale(s) of the noise/log-noise process,
k_theta_g: if settings$linkThetas == 'joint', mle for the constant by which lengthscale parameters of theta are multiplied to get theta_g,
g: unless given, mle of the nugget of the noise/log-noise process,
trendtype: either "SK" if beta0 is provided, else "OK",
beta0 constant trend of the mean process, plugin-estimator unless given,
nmean: plugin estimator for the constant noise/log-noise process mean,
ll: log-likelihood value, (ll_non_pen) is the value without the penalty,
nit_opt, msg: counts and message returned by optim
modHom: homoskedastic GP model of class homGP used for initialization of the mean process,
modNugs: homoskedastic GP model of class homGP used for initialization of the noise/log-noise process,
nu_hat_var: variance of the noise process,
used_args: list with arguments provided in the call to the function, which is saved in call,
X0, Z0, Z, eps, logN, covtype: values given in input,
time: time to train the model, in seconds.

References

M. Binois, Robert B. Gramacy, M. Ludkovski (2018), Practical heteroskedastic Gaussian process modeling for large simulation experiments, Journal of Computational and Graphical Statistics, 27(4), 808–821.
Preprint available on arXiv:1611.05902.

A. Shah, A. Wilson, Z. Ghahramani (2014), Student-t processes as alternatives to Gaussian processes, Artificial Intelligence and Statistics, 877–885.

Examples

##------------------------------------------------------------
## Example 1: Heteroskedastic TP modeling on the motorcycle data
##------------------------------------------------------------
set.seed(32)

## motorcycle data
library(MASS)
X <- matrix(mcycle$times, ncol = 1)
Z <- mcycle$accel
nvar <- 1
plot(X, Z, ylim = c(-160, 90), ylab = 'acceleration', xlab = "time")


## Model fitting
model <- mleHetTP(X = X, Z = Z, lower = rep(0.1, nvar), upper = rep(50, nvar),
                  covtype = "Matern5_2")
            
## Display averaged observations
points(model$X0, model$Z0, pch = 20)

## A quick view of the fit                  
summary(model)

## Create a prediction grid and obtain predictions
xgrid <- matrix(seq(0, 60, length.out = 301), ncol = 1) 
preds <- predict(x = xgrid, object =  model)

## Display mean predictive surface
lines(xgrid, preds$mean, col = 'red', lwd = 2)
## Display 95% confidence intervals
lines(xgrid, preds$mean + sqrt(preds$sd2) * qt(0.05, df = model$nu + nrow(X)), col = 2, lty = 2)
lines(xgrid, preds$mean + sqrt(preds$sd2) * qt(0.95, df = model$nu + nrow(X)), col = 2, lty = 2)
## Display 95% prediction intervals
lines(xgrid, preds$mean + sqrt(preds$sd2 + preds$nugs) * qt(0.05, df = model$nu + nrow(X)),
  col = 3, lty = 2)
lines(xgrid, preds$mean + sqrt(preds$sd2 + preds$nugs) * qt(0.95, df = model$nu + nrow(X)), 
  col = 3, lty = 2)

##------------------------------------------------------------
## Example 2: 2D Heteroskedastic TP modeling
##------------------------------------------------------------
set.seed(1)
nvar <- 2
  
## Branin redefined in [0,1]^2
branin <- function(x){
  if(is.null(nrow(x)))
    x <- matrix(x, nrow = 1)
    x1 <- x[,1] * 15 - 5
    x2 <- x[,2] * 15
    (x2 - 5/(4 * pi^2) * (x1^2) + 5/pi * x1 - 6)^2 + 10 * (1 - 1/(8 * pi)) * cos(x1) + 10
}

## Noise field via standard deviation
noiseFun <- function(x){
  if(is.null(nrow(x)))
    x <- matrix(x, nrow = 1)
  return(1/5*(3*(2 + 2*sin(x[,1]*pi)*cos(x[,2]*3*pi) + 5*rowSums(x^2))))
}

## data generating function combining mean and noise fields
ftest <- function(x){
  return(branin(x) + rnorm(nrow(x), mean = 0, sd = noiseFun(x)))
}

## Grid of predictive locations
ngrid <- 51
xgrid <- matrix(seq(0, 1, length.out = ngrid), ncol = 1) 
Xgrid <- as.matrix(expand.grid(xgrid, xgrid))

## Unique (randomly chosen) design locations
n <- 100
Xu <- matrix(runif(n * 2), n)

## Select replication sites randomly
X <- Xu[sample(1:n, 20*n, replace = TRUE),]

## obtain training data response at design locations X
Z <- ftest(X)

## Formating of data for model creation (find replicated observations) 
prdata <- find_reps(X, Z, rescale = FALSE, normalize = FALSE)

## Model fitting
model <- mleHetTP(X = list(X0 = prdata$X0, Z0 = prdata$Z0, mult = prdata$mult), Z = prdata$Z, ,
                  lower = rep(0.01, nvar), upper = rep(10, nvar),
                  covtype = "Matern5_2")

## a quick view into the data stored in the "hetTP"-class object
summary(model)                  
             
## prediction from the fit on the grid     
preds <- predict(x = Xgrid, object =  model)

## Visualization of the predictive surface
par(mfrow = c(2, 2))
contour(x = xgrid,  y = xgrid, z = matrix(branin(Xgrid), ngrid), 
  main = "Branin function", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid,  y = xgrid, z = matrix(preds$mean, ngrid), 
  main = "Predicted mean", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid,  y = xgrid, z = matrix(noiseFun(Xgrid), ngrid), 
  main = "Noise standard deviation function", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid,  y= xgrid, z = matrix(sqrt(preds$nugs), ngrid), 
  main = "Predicted noise values", nlevels = 20)
points(X, col = 'blue', pch = 20)
par(mfrow = c(1, 1))

[Package hetGP version 1.1.6 Index]