fit_two_layer {deepgp}    R Documentation
Conducts MCMC sampling of hyperparameters and hidden layer w for a two layer
deep GP. Separate length scale parameters theta_w and theta_y govern the
correlation strength of the hidden layer and outer layer respectively. Nugget
parameter g governs noise on the outer layer. In Matern covariance, v governs
smoothness.
fit_two_layer(
  x,
  y,
  D = ifelse(is.matrix(x), ncol(x), 1),
  nmcmc = 10000,
  verb = TRUE,
  w_0 = NULL,
  g_0 = 0.01,
  theta_y_0 = 0.1,
  theta_w_0 = 0.1,
  true_g = NULL,
  settings = NULL,
  cov = c("matern", "exp2"),
  v = 2.5,
  vecchia = FALSE,
  m = min(25, length(y) - 1)
)
x: vector or matrix of input locations

y: vector of response values

D: integer designating dimension of hidden layer, defaults to dimension of x

nmcmc: number of MCMC iterations

verb: logical indicating whether to print iteration progress

w_0: initial value for hidden layer w

g_0: initial value for g

theta_y_0: initial value for theta_y (length scale of outer layer)

theta_w_0: initial value for theta_w (length scale of inner layer)

true_g: if the true nugget is known it may be specified here (set to a small
value to make the fit deterministic); note that values which are too small may
cause numerical issues in matrix inversions

settings: hyperparameters for proposals and priors (see details)

cov: covariance kernel, either Matern ("matern") or squared exponential
("exp2")

v: Matern smoothness parameter (only used if cov = "matern")

vecchia: logical indicating whether to use the Vecchia approximation

m: size of Vecchia conditioning sets (only used if vecchia = TRUE)
Maps inputs x through hidden layer w to outputs y. Conducts sampling of the
hidden layer using elliptical slice sampling, and Metropolis-Hastings sampling
of the length scale and nugget parameters, with proposals and priors controlled
by settings. When true_g is set to a specific value, the nugget is not
estimated. When vecchia = TRUE, all calculations leverage the Vecchia
approximation with the specified conditioning set size m. The Vecchia
approximation is only implemented for cov = "matern".
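For instance, when the response comes from a deterministic computer model, the
nugget can be pinned rather than sampled. A minimal sketch (the value 1e-4 is
an illustrative choice, not a package default):

fit_det <- fit_two_layer(x, y, nmcmc = 1000, true_g = 1e-4)  # nugget fixed, not estimated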
Proposals for g, theta_y, and theta_w follow a uniform sliding window scheme,
e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and
u = 2 provided in settings. To adjust these, set
settings = list(l = new_l, u = new_u).
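As a standalone illustration of that scheme (plain R, not package internals),
note that the window scales multiplicatively with the current value, so
proposals remain positive and step sizes adapt to the parameter's magnitude:

l <- 1
u <- 2
g_t <- 0.05                                   # current value of g
g_star <- runif(1, l * g_t / u, u * g_t / l)  # proposal drawn from [g_t / 2, 2 * g_t]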
Priors on g, theta_y, and theta_w follow Gamma distributions with shape
parameters (alpha) and rate parameters (beta) controlled within the settings
list object. Defaults are:
settings$alpha$g <- 1.5
settings$beta$g <- 3.9
settings$alpha$theta_w <- 1.5
settings$beta$theta_w <- 3.9 / 4
settings$alpha$theta_y <- 1.5
settings$beta$theta_y <- 3.9 / 6
These priors are designed for x scaled to [0, 1] and y scaled to have mean 0
and variance 1. They may be adjusted using the settings input.
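As a sketch of such an adjustment (assuming, as an illustration, that entries
omitted from settings retain their defaults):

custom <- list(u = 3,                # wider proposal window
               alpha = list(g = 2),  # reshape the prior on g
               beta = list(g = 5))
fit <- fit_two_layer(x, y, nmcmc = 1000, settings = custom)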
When w_0 = NULL, the hidden layer is initialized at x (i.e. the identity
mapping). The default prior mean of the hidden layer is zero, but it may be
adjusted to x using settings = list(w_prior_mean = x). If w_0 is of dimension
nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in
sequential design, when adding a new input location and starting the MCMC
where the previous MCMC left off.
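A sketch of that sequential-design workflow, where fit_old, x_new, and y_new
are hypothetical stand-ins for a previous fit and the design augmented by one
new row:

w_last <- fit_old$w[[fit_old$nmcmc]]  # last hidden layer sample, nrow(x_new) - 1 rows
fit_new <- fit_two_layer(x_new, y_new, nmcmc = 1000, w_0 = w_last)

Since w_last has one fewer row than x_new, the final row of the hidden layer
is filled in by kriging before sampling begins.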
The output object of class dgp2 or dgp2vec is designed for use with continue,
trim, and predict.

Returns a list of the S3 class dgp2 or dgp2vec with elements:
x: copy of input matrix

y: copy of response vector

nmcmc: number of MCMC iterations

settings: copy of proposal/prior settings

v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")

g: vector of MCMC samples for g

theta_y: vector of MCMC samples for theta_y (length scale of outer layer)

theta_w: matrix of MCMC samples for theta_w (length scale of inner layer)

tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)

w: list of MCMC samples for hidden layer w

time: computation time in seconds
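These elements can be inspected directly; for example, after any of the fits
below, trace plots of the nugget and outer length scale are simply:

plot(fit$g, type = "l", ylab = "g")              # nugget samples
plot(fit$theta_y, type = "l", ylab = "theta_y")  # outer length scale samples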
Sauer, A, RB Gramacy, and D Higdon. 2020. "Active Learning for Deep Gaussian
Process Surrogates." Technometrics, to appear; arXiv:2012.08015.

Sauer, A, A Cooper, and RB Gramacy. 2022. "Vecchia-approximated Deep Gaussian
Processes for Computer Experiments." Pre-print on arXiv:2204.02904.

Murray, I, RP Adams, and D MacKay. 2010. "Elliptical Slice Sampling." Journal
of Machine Learning Research 9, 541-548.
# Examples of real-world implementations are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# G function (https://www.sfu.ca/~ssurjano/gfunc.html)
f <- function(xx, a = (seq_along(xx) - 1) / 2) {
  new1 <- abs(4 * xx - 2) + a
  new2 <- 1 + a
  p <- prod(new1 / new2)
  return((p - 1) / 0.86)
}
# Training data
d <- 1
n <- 20
x <- matrix(runif(n * d), ncol = d)
y <- apply(x, 1, f)
# Testing data
n_test <- 100
xx <- matrix(runif(n_test * d), ncol = d)
yy <- apply(xx, 1, f)
plot(xx[order(xx)], yy[order(xx)], type = "l")
points(x, y, col = 2)
# Example 1: full model (nugget estimated, using continue)
fit <- fit_two_layer(x, y, nmcmc = 1000)
plot(fit)
fit <- continue(fit, 1000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit, hidden = TRUE)
# Example 2: Vecchia approximated model
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_two_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit, hidden = TRUE)
# Example 3: Vecchia approximated model (re-approximated after burn-in)
fit <- fit_two_layer(x, y, nmcmc = 1000, vecchia = TRUE, m = 10)
fit <- continue(fit, 1000, re_approx = TRUE)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit, hidden = TRUE)