mleHetTP {hetGP} | R Documentation |
Student-t process modeling with heteroskedastic noise
Description
Student-t process regression under input dependent noise based on maximum likelihood estimation of the hyperparameters. A GP is used to model latent (log-) variances. This function is enhanced to deal with replicated observations.
Usage
mleHetTP(
X,
Z,
lower = NULL,
upper = NULL,
noiseControl = list(k_theta_g_bounds = c(1, 100), g_max = 10000, g_bounds = c(1e-06,
0.1), nu_bounds = c(2 + 0.001, 30), sigma2_bounds = c(sqrt(.Machine$double.eps),
10000)),
settings = list(linkThetas = "joint", logN = TRUE, initStrategy = "residuals", checkHom
= TRUE, penalty = TRUE, trace = 0, return.matrices = TRUE, return.hom = FALSE, factr
= 1e+09),
covtype = c("Gaussian", "Matern5_2", "Matern3_2"),
maxit = 100,
known = list(beta0 = 0),
init = list(nu = 3),
eps = sqrt(.Machine$double.eps)
)
Arguments
X |
matrix of all designs, one per row, or list with elements:
|
Z |
vector of all observations. If using a list with |
lower , upper |
bounds for the |
noiseControl |
list with elements related to optimization of the noise process parameters:
|
settings |
list for options about the general modeling procedure, with elements:
|
covtype |
covariance kernel type, either |
maxit |
maximum number of iterations for |
init , known |
optional lists of starting values for mle optimization or that should not be optimized over, respectively.
Values in
|
eps |
jitter used in the inversion of the covariance matrix for numerical stability |
Details
The global covariance matrix of the model is parameterized as K = sigma2 * C + Lambda * diag(1/mult)
,
with C
the correlation matrix between unique designs, depending on the family of kernel used (see cov_gen
for available choices).
Lambda
is the prediction on the noise level given by a (homoskedastic) GP:
\Lambda = C_g(C_g + \mathrm{diag}(g/\mathrm{mult}))^{-1} \Delta
with C_g
the correlation matrix between unique designs for this second GP, with lengthscales hyperparameters theta_g
and nugget g
and Delta
the variance level at X0
that are estimated.
It is generally recommended to use find_reps
to pre-process the data, to rescale the inputs to the unit cube and to normalize the outputs.
The noise process lengthscales can be set in several ways:
using
k_theta_g
(settings$linkThetas == 'joint'
), supposed to be greater than one by default. In this case lengthscales of the noise process are multiples of those of the mean process.if
settings$linkThetas == 'constr'
, then the lower bound ontheta_g
correspond to estimated values of an homoskedastic GP fit.else lengthscales between the mean and noise process are independent (both either anisotropic or not).
When no starting nor fixed parameter values are provided with init
or known
,
the initialization process consists of fitting first an homoskedastic model of the data, called modHom
.
Unless provided with init$theta
, initial lengthscales are taken at 10% of the range determined with lower
and upper
,
while init$g_H
may be use to pass an initial nugget value.
The resulting lengthscales provide initial values for theta
(or update them if given in init
).
If necessary, a second homoskedastic model, modNugs
, is fitted to the empirical residual variance between the prediction
given by modHom
at X0
and Z
(up to modHom$nu_hat
).
Note that when specifying settings$linkThetas == 'joint'
, then this second homoskedastic model has fixed lengthscale parameters.
Starting values for theta_g
and g
are extracted from modNugs
.
Finally, three initialization schemes for Delta
are available with settings$initStrategy
:
for
settings$initStrategy == 'simple'
,Delta
is simply initialized to the estimatedg
value ofmodHom
. Note that this procedure may fail whensettings$penalty == TRUE
.for
settings$initStrategy == 'residuals'
,Delta
is initialized to the estimated residual variance from the homoskedastic mean prediction.for
settings$initStrategy == 'smoothed'
,Delta
takes the values predicted bymodNugs
atX0
.
Notice that lower
and upper
bounds cannot be equal for optim
.
Value
a list which is given the S3 class "hetTP"
, with elements:
-
theta
: unless given, maximum likelihood estimate (mle) of the lengthscale parameter(s), -
Delta
: unless given, mle of the nugget vector (non-smoothed), -
Lambda
: predicted input noise variance atX0
, -
sigma2
: plugin estimator of the variance, -
theta_g
: unless given, mle of the lengthscale(s) of the noise/log-noise process, -
k_theta_g
: ifsettings$linkThetas == 'joint'
, mle for the constant by which lengthscale parameters oftheta
are multiplied to gettheta_g
, -
g
: unless given, mle of the nugget of the noise/log-noise process, -
trendtype
: either "SK
" ifbeta0
is provided, else "OK
", -
beta0
constant trend of the mean process, plugin-estimator unless given, -
nmean
: plugin estimator for the constant noise/log-noise process mean, -
ll
: log-likelihood value, (ll_non_pen
) is the value without the penalty, -
nit_opt
,msg
:counts
andmessage
returned byoptim
-
modHom
: homoskedastic GP model of classhomGP
used for initialization of the mean process, -
modNugs
: homoskedastic GP model of classhomGP
used for initialization of the noise/log-noise process, -
nu_hat_var
: variance of the noise process, -
used_args
: list with arguments provided in the call to the function, which is saved incall
, -
X0
,Z0
,Z
,eps
,logN
,covtype
: values given in input, -
time
: time to train the model, in seconds.
References
M. Binois, Robert B. Gramacy, M. Ludkovski (2018), Practical heteroskedastic Gaussian process modeling for large simulation experiments,
Journal of Computational and Graphical Statistics, 27(4), 808–821.
Preprint available on arXiv:1611.05902.
A. Shah, A. Wilson, Z. Ghahramani (2014), Student-t processes as alternatives to Gaussian processes, Artificial Intelligence and Statistics, 877–885.
See Also
predict.hetTP
for predictions.
summary
and plot
functions are available as well.
Examples
##------------------------------------------------------------
## Example 1: Heteroskedastic TP modeling on the motorcycle data
##------------------------------------------------------------
set.seed(32)
## motorcycle data
library(MASS)
X <- matrix(mcycle$times, ncol = 1)
Z <- mcycle$accel
nvar <- 1
plot(X, Z, ylim = c(-160, 90), ylab = 'acceleration', xlab = "time")
## Model fitting
model <- mleHetTP(X = X, Z = Z, lower = rep(0.1, nvar), upper = rep(50, nvar),
covtype = "Matern5_2")
## Display averaged observations
points(model$X0, model$Z0, pch = 20)
## A quick view of the fit
summary(model)
## Create a prediction grid and obtain predictions
xgrid <- matrix(seq(0, 60, length.out = 301), ncol = 1)
preds <- predict(x = xgrid, object = model)
## Display mean predictive surface
lines(xgrid, preds$mean, col = 'red', lwd = 2)
## Display 95% confidence intervals
lines(xgrid, preds$mean + sqrt(preds$sd2) * qt(0.05, df = model$nu + nrow(X)), col = 2, lty = 2)
lines(xgrid, preds$mean + sqrt(preds$sd2) * qt(0.95, df = model$nu + nrow(X)), col = 2, lty = 2)
## Display 95% prediction intervals
lines(xgrid, preds$mean + sqrt(preds$sd2 + preds$nugs) * qt(0.05, df = model$nu + nrow(X)),
col = 3, lty = 2)
lines(xgrid, preds$mean + sqrt(preds$sd2 + preds$nugs) * qt(0.95, df = model$nu + nrow(X)),
col = 3, lty = 2)
##------------------------------------------------------------
## Example 2: 2D Heteroskedastic TP modeling
##------------------------------------------------------------
set.seed(1)
nvar <- 2
## Branin redefined in [0,1]^2
branin <- function(x){
if(is.null(nrow(x)))
x <- matrix(x, nrow = 1)
x1 <- x[,1] * 15 - 5
x2 <- x[,2] * 15
(x2 - 5/(4 * pi^2) * (x1^2) + 5/pi * x1 - 6)^2 + 10 * (1 - 1/(8 * pi)) * cos(x1) + 10
}
## Noise field via standard deviation
noiseFun <- function(x){
if(is.null(nrow(x)))
x <- matrix(x, nrow = 1)
return(1/5*(3*(2 + 2*sin(x[,1]*pi)*cos(x[,2]*3*pi) + 5*rowSums(x^2))))
}
## data generating function combining mean and noise fields
ftest <- function(x){
return(branin(x) + rnorm(nrow(x), mean = 0, sd = noiseFun(x)))
}
## Grid of predictive locations
ngrid <- 51
xgrid <- matrix(seq(0, 1, length.out = ngrid), ncol = 1)
Xgrid <- as.matrix(expand.grid(xgrid, xgrid))
## Unique (randomly chosen) design locations
n <- 100
Xu <- matrix(runif(n * 2), n)
## Select replication sites randomly
X <- Xu[sample(1:n, 20*n, replace = TRUE),]
## obtain training data response at design locations X
Z <- ftest(X)
## Formating of data for model creation (find replicated observations)
prdata <- find_reps(X, Z, rescale = FALSE, normalize = FALSE)
## Model fitting
model <- mleHetTP(X = list(X0 = prdata$X0, Z0 = prdata$Z0, mult = prdata$mult), Z = prdata$Z, ,
lower = rep(0.01, nvar), upper = rep(10, nvar),
covtype = "Matern5_2")
## a quick view into the data stored in the "hetTP"-class object
summary(model)
## prediction from the fit on the grid
preds <- predict(x = Xgrid, object = model)
## Visualization of the predictive surface
par(mfrow = c(2, 2))
contour(x = xgrid, y = xgrid, z = matrix(branin(Xgrid), ngrid),
main = "Branin function", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid, y = xgrid, z = matrix(preds$mean, ngrid),
main = "Predicted mean", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid, y = xgrid, z = matrix(noiseFun(Xgrid), ngrid),
main = "Noise standard deviation function", nlevels = 20)
points(X, col = 'blue', pch = 20)
contour(x = xgrid, y= xgrid, z = matrix(sqrt(preds$nugs), ngrid),
main = "Predicted noise values", nlevels = 20)
points(X, col = 'blue', pch = 20)
par(mfrow = c(1, 1))