dgp {dgpsi} | R Documentation |
Deep Gaussian process emulator construction
Description
This function builds and trains a DGP emulator.
Usage
dgp(
X,
Y,
struc = NULL,
depth = 2,
node = ncol(X),
name = "sexp",
lengthscale = 1,
bounds = NULL,
prior = "ga",
share = TRUE,
nugget_est = FALSE,
nugget = ifelse(all(nugget_est), 0.01, 1e-06),
scale_est = TRUE,
scale = 1,
connect = TRUE,
likelihood = NULL,
training = TRUE,
verb = TRUE,
check_rep = TRUE,
rff = FALSE,
M = NULL,
N = 500,
cores = 1,
blocked_gibbs = TRUE,
ess_burn = 10,
burnin = NULL,
B = 10,
internal_input_idx = NULL,
linked_idx = NULL,
id = NULL
)
Arguments
X |
a matrix where each row is an input training data point and each column is an input dimension. |
Y |
a matrix containing observed training output data. Each row of the matrix is an output data point and each column is an output dimension. When |
struc |
a list that specifies a user-defined DGP structure. It should contain L (the number of DGP layers) sub-lists,
each of which represents a layer and contains a number of GP nodes (defined by |
depth |
number of layers (including the likelihood layer) for a DGP structure. |
node |
number of GP nodes in each layer (except for the final layer or the layer feeding the likelihood node) of the DGP. Defaults to ncol(X). |
name |
a character or a vector of characters that indicates the kernel functions (either ...). Defaults to "sexp". |
lengthscale |
initial lengthscales for GP nodes in the DGP emulator. It can be a single numeric value or a vector:
Defaults to a numeric value of 1. |
bounds |
the lower and upper bounds of lengthscales in GP nodes. It can be a vector or a matrix:
Defaults to NULL. |
prior |
prior to be used for Maximum a Posteriori (MAP) estimation of the lengthscales and nuggets of all GP nodes in the DGP hierarchy:
Defaults to "ga". |
share |
a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to TRUE. |
nugget_est |
a bool or a bool vector that indicates if the nuggets of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of
Defaults to FALSE. |
nugget |
the initial nugget value(s) of GP nodes (if any) in each layer:
Set |
scale_est |
a bool or a bool vector that indicates if the variances of GP nodes (if any) in the final layer are to be estimated. If a single bool is
provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of
Defaults to TRUE. |
scale |
the initial variance value(s) of GP nodes (if any) in the final layer. If it is a single numeric value, it will be applied to all GP nodes (if any)
in the final layer. If it is a vector (which must have a length of |
connect |
a bool indicating whether to implement global input connection to the DGP structure. Setting it to |
likelihood |
the likelihood type of a DGP emulator:
When |
training |
a bool indicating if the initialized DGP emulator will be trained.
When set to |
verb |
a bool indicating if the trace information on DGP emulator construction and training will be printed during the function execution.
Defaults to TRUE. |
check_rep |
a bool indicating whether to check for repetitions in the dataset, i.e., whether one input
position has multiple outputs. Defaults to TRUE. |
rff |
a bool indicating whether to use random Fourier features to approximate the correlation matrices in training. Turning on this option can accelerate
training when the training dataset is relatively large, but it may reduce the quality of the resulting emulator. Defaults to FALSE. |
M |
the number of features to be used by random Fourier approximation. It is only used
when |
N |
number of iterations for the training. Defaults to 500. |
cores |
the number of cores/workers to be used to optimize GP components (in the same layer) at each M-step of the training. If set to |
blocked_gibbs |
a bool indicating if the latent variables are imputed layer-wise using ESS-within-Blocked-Gibbs. ESS-within-Blocked-Gibbs can be faster and
more efficient than ESS-within-Gibbs, which imputes latent variables node-wise, because it reduces the number of components to be sampled during Gibbs sampling.
The gain is largest when layers contain many GP nodes because of high input dimensions. Defaults to TRUE. |
ess_burn |
number of burn-in steps for the ESS-within-Gibbs
at each I-step of the training. Defaults to 10. |
burnin |
the number of training iterations to be discarded before computing
point estimates of model parameters. Must be smaller than the number of training iterations N. |
B |
the number of imputations used to produce subsequent predictions. Increase the value to account for
more imputation uncertainty at the cost of slower predictions. Decrease the value for faster predictions that account for less imputation uncertainty.
Defaults to 10. |
internal_input_idx |
column indices of |
linked_idx |
either a vector or a list of vectors:
Set |
id |
an ID to be assigned to the DGP emulator. If an ID is not provided (i.e., |
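The name, lengthscale, share, scale, nugget, rff and M arguments can be easier to follow with a concrete example. The base-R sketch below builds a squared exponential covariance matrix and compares it with a random Fourier feature approximation. It assumes the parameterization k(x, x') = scale * exp(-sum_d (x_d - x'_d)^2 / lengthscale_d^2), with the nugget added to the diagonal. It is an illustration only, not the package's internal code. The helpers sexp_cov() and rff_features() are hypothetical.

```r
# Squared exponential covariance (assumed parameterization):
# k(x, x') = scale * exp(-sum_d (x_d - x'_d)^2 / lengthscale_d^2)
sexp_cov <- function(X, lengthscale, scale = 1, nugget = 1e-6) {
  X <- as.matrix(X)
  # a single lengthscale is recycled over all input dimensions (cf. share = TRUE)
  ls <- rep_len(lengthscale, ncol(X))
  Xs <- sweep(X, 2, ls, "/")          # rescale each input dimension
  D2 <- as.matrix(dist(Xs))^2         # pairwise scaled squared distances
  K <- scale * exp(-D2)
  K + diag(nugget, nrow(X))           # nugget added to the diagonal
}

# Random Fourier features for the same kernel: frequencies are drawn from
# the kernel's spectral density, N(0, 2 / lengthscale_d^2) per dimension
rff_features <- function(X, lengthscale, scale = 1, M = 500) {
  X <- as.matrix(X)
  d <- ncol(X)
  ls <- rep_len(lengthscale, d)
  W <- matrix(rnorm(M * d), M, d) %*% diag(sqrt(2) / ls, d)
  b <- runif(M, 0, 2 * pi)
  # each row is the M-dimensional feature vector of one input point
  sqrt(2 * scale / M) * cos(X %*% t(W) + matrix(b, nrow(X), M, byrow = TRUE))
}

set.seed(999)
X <- matrix(runif(20 * 2), ncol = 2)
K_exact <- sexp_cov(X, lengthscale = c(0.5, 1), scale = 1, nugget = 0)
Z <- rff_features(X, lengthscale = c(0.5, 1), scale = 1, M = 5000)
K_rff <- Z %*% t(Z)                   # low-rank approximation of K_exact
max(abs(K_exact - K_rff))             # small, and shrinks as M grows
```

The trade-off mirrors the description of rff. Working with the n x M feature matrix instead of the n x n covariance is cheaper when n is large, but the approximation error only shrinks as M grows.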
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/ and learn how to customize a DGP structure.
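The blocked_gibbs and ess_burn arguments refer to elliptical slice sampling (ESS) for imputing latent variables. The sketch below implements one textbook ESS update (Murray, Adams and MacKay, 2010) for a vector with a zero-mean Gaussian prior and an arbitrary log-likelihood. It then repeats the update for ess_burn burn-in steps. This is a generic illustration, not the package's sampler. ess_step() and the toy likelihood are hypothetical. In our reading of the blocked variant, f would hold the latent outputs of a whole layer rather than a single node.

```r
# One elliptical slice sampling update for f ~ N(0, Sigma) with log-likelihood loglik()
# Sigma_chol is the upper Cholesky factor R, with t(R) %*% R = Sigma
ess_step <- function(f, Sigma_chol, loglik) {
  nu <- as.vector(t(Sigma_chol) %*% rnorm(length(f)))  # auxiliary prior draw
  log_y <- loglik(f) + log(runif(1))                    # slice height
  theta <- runif(1, 0, 2 * pi)                          # initial angle
  theta_min <- theta - 2 * pi                           # initial bracket
  theta_max <- theta
  repeat {
    f_new <- f * cos(theta) + nu * sin(theta)           # point on the ellipse
    if (loglik(f_new) > log_y) return(f_new)            # accept
    # shrink the bracket towards theta = 0 (the current state) and retry
    if (theta < 0) theta_min <- theta else theta_max <- theta
    theta <- runif(1, theta_min, theta_max)
  }
}

# toy example: latent values observed with Gaussian noise
set.seed(1)
x <- seq(0, 1, length.out = 15)
Sigma <- exp(-as.matrix(dist(x))^2 / 0.1^2) + diag(1e-6, length(x))
R <- chol(Sigma)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.1)
loglik <- function(f) sum(dnorm(y, mean = f, sd = 0.1, log = TRUE))

f <- rep(0, length(x))
ess_burn <- 10                        # burn-in updates before keeping a sample
for (i in seq_len(ess_burn)) f <- ess_step(f, R, loglik)
```

ESS needs no step-size tuning, and each update always returns a new state. This is why a small, fixed number of burn-in updates per I-step is workable.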
Value
An S3 class named dgp
that contains six slots:
- id: a number or character string assigned through the id argument.
- data: a list that contains two elements, X and Y, which are the training input and output data respectively.
- specs: a list that contains
  - L (i.e., the number of layers in the DGP hierarchy) sub-lists named layer1, layer2,..., layerL. Each sub-list contains D (i.e., the number of GP/likelihood nodes in the corresponding layer) sub-lists named node1, node2,..., nodeD. If a sub-list corresponds to a likelihood node, it contains one element called type that gives the name (Hetero, Poisson, or NegBin) of the likelihood node. If a sub-list corresponds to a GP node, it contains four elements:
    - kernel: the type of the kernel function used for the GP node.
    - lengthscales: a vector of lengthscales in the kernel function.
    - scale: the variance value in the kernel function.
    - nugget: the nugget value in the kernel function.
  - internal_dims: the column indices of X that correspond to the linked emulators in the preceding layers of a linked system.
  - external_dims: the column indices of X that correspond to global inputs to the linked system of emulators. It is shown as FALSE if internal_input_idx = NULL.
  - linked_idx: the value passed to argument linked_idx. It is shown as FALSE if the argument linked_idx is NULL.
  - seed: the random seed generated to produce the imputations. This information is stored for reproducibility when a DGP emulator saved by write() with the light option light = TRUE is loaded back into R by read().
  - B: the number of imputations used to generate the emulator.

  internal_dims and external_dims are generated only when struc = NULL.
- constructor_obj: a 'python' object that stores the information of the constructed DGP emulator.
- container_obj: a 'python' object that stores the information for the linked emulation.
- emulator_obj: a 'python' object that stores the information for the predictions from the DGP emulator.
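The nested specs slot described above can be navigated with ordinary list indexing. Building a real emulator requires the package's Python environment, so the sketch below uses a hand-made mock list that only mimics the documented layer/node layout. All values in it are invented. On a trained emulator m, the same pattern (e.g. m$specs$layer1$node1$lengthscales) should apply, assuming the slot names documented here.

```r
# Hypothetical mock of the documented specs layout (all values are invented)
specs <- list(
  layer1 = list(
    node1 = list(kernel = "sexp", lengthscales = c(0.8, 1.2), scale = 1, nugget = 1e-6),
    node2 = list(kernel = "sexp", lengthscales = c(0.5, 2.0), scale = 1, nugget = 1e-6)
  ),
  layer2 = list(
    node1 = list(kernel = "sexp", lengthscales = c(1.1, 0.9), scale = 1.5, nugget = 1e-6)
  ),
  layer3 = list(node1 = list(type = "Poisson")),  # likelihood node
  linked_idx = FALSE,
  seed = 999,
  B = 10
)

# collect the lengthscales of every GP node, skipping likelihood nodes
gp_lengthscales <- list()
for (l in grep("^layer", names(specs), value = TRUE)) {
  for (n in names(specs[[l]])) {
    node <- specs[[l]][[n]]
    if (!is.null(node$kernel)) {      # GP nodes have 'kernel'; likelihood nodes only 'type'
      gp_lengthscales[[paste(l, n, sep = ".")]] <- node$lengthscales
    }
  }
}
str(gp_lengthscales)
```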
The returned dgp object can be used by
- predict() for DGP predictions.
- continue() for additional DGP training iterations.
- validate() for LOO and OOS validations.
- plot() for validation plots.
- lgp() for linked (D)GP emulator constructions.
- window() for model parameter trimming.
- summary() to summarize the trained DGP emulator.
- write() to save the DGP emulator to a .pkl file.
- set_imp() to change the number of imputations.
- set_linked_idx() to add the linking information to the DGP emulator for linked emulations.
- design() for sequential designs.
- update() to update the DGP emulator with new inputs and outputs.
- alm(), mice(), pei(), and vigf() to locate next design points.
Note
Any R vector detected in X and Y will be treated as a column vector and automatically converted into a single-column R matrix. Thus, if X is a single data point with multiple dimensions, it must be given as a matrix.
Examples
## Not run:
# load the package and the Python env
library(dgpsi)
# construct a step function
f <- function(x) {
if (x < 0.5) return(-1)
if (x >= 0.5) return(1)
}
# generate training data
X <- seq(0, 1, length.out = 10)
Y <- sapply(X, f)
# set a random seed
set_seed(999)
# train a DGP emulator
m <- dgp(X, Y)
# continue for further training iterations
m <- continue(m)
# summarize the trained emulator
summary(m)
# trace plot
trace_plot(m)
# trim the traces of model parameters
m <- window(m, 800)
# LOO cross validation
m <- validate(m)
plot(m)
# prediction
test_x <- seq(0, 1, length.out = 200)
m <- predict(m, x = test_x)
# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)
# write and read the constructed emulator
write(m, 'step_dgp')
m <- read('step_dgp')
## End(Not run)
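The burnin argument and window() both discard early training iterations before model parameters are summarized. The base-R sketch below shows the idea on a synthetic trace: drop the first burnin iterations and average the rest. Two things are assumptions made for illustration. First, the point estimate is taken as the mean of the retained iterations. Second, the helper trim_estimate() is hypothetical, and the trace is simulated rather than produced by dgp().

```r
set.seed(42)
N <- 1000                                  # total training iterations
# synthetic lengthscale trace: drifts from about 3 towards 1, then fluctuates around 1
ls_trace <- 1 + 2 * exp(-(1:N) / 50) + rnorm(N, sd = 0.05)

trim_estimate <- function(trace, burnin) {
  stopifnot(burnin < length(trace))        # burnin must be smaller than the iterations
  mean(trace[-seq_len(burnin)])            # average over the retained iterations
}

trim_estimate(ls_trace, burnin = 800)      # close to 1
mean(ls_trace)                             # pulled upwards by the early iterations
```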