| design {dgpsi} | R Documentation |
Sequential design of a (D)GP emulator or a bundle of (D)GP emulators
Description
This function implements the sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
Usage
design(
object,
N,
x_cand,
y_cand,
n_cand,
limits,
int,
f,
reps,
freq,
x_test,
y_test,
reset,
target,
method,
eval,
verb,
autosave,
new_wave,
cores,
...
)
## S3 method for class 'gp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
...
)
## S3 method for class 'dgp'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
train_N = 100,
refit_cores = 1,
pruning = TRUE,
control = list(),
...
)
## S3 method for class 'bundle'
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
train_N = 100,
refit_cores = 1,
...
)
Arguments
object |
can be one of the following:
|
N |
the number of steps for the sequential design. |
x_cand |
a matrix (with each row being a design point and column being an input dimension) that gives a candidate set
in which the next design point is determined. If |
y_cand |
a matrix (with each row being a simulator evaluation and column being an output dimension) that gives the realizations
from the simulator at input positions in |
n_cand |
an integer that gives
Defaults to |
limits |
a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one
input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input
dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. Set
|
int |
a bool or a vector of bools that indicates if an input dimension is an integer type. If a bool is given, it will be applied to
all input dimensions. If a vector is provided, it should have a length equal to the input dimensions and will be applied to individual
input dimensions. Defaults to |
f |
an R function that represents the simulator.
See Note section below for further information. This argument is used when |
reps |
an integer that gives the number of repetitions of the located design points to be created and used for evaluations of |
freq |
a vector of two integers with the first element giving the frequency (in number of steps) to re-fit the
emulator, and the second element giving the frequency to implement the emulator validation (for RMSE). Defaults to |
x_test |
a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing
input data to evaluate the emulator after each step of the sequential design. Set to |
y_test |
the testing output data that correspond to
Set to |
reset |
a bool or a vector of bools indicating whether to reset the hyperparameters of the emulator to the initial values it had when it was first
constructed, after the input-output update and before the re-fit. If a bool is given, it will be applied to
every step of the sequential design. If a vector is provided, its length should be equal to |
target |
a numeric or a vector that gives the target RMSEs at which the sequential design is terminated. Defaults to |
method |
an R function that gives the indices of design points in a candidate set. The function must satisfy the following basic rules:
See |
eval |
an R function that calculates the customized evaluating metric of the emulator. The function must satisfy the following basic rules:
If no customized function is provided, the built-in evaluation metric, RMSE, will be calculated. Defaults to |
verb |
a bool indicating if the trace information will be printed during the sequential design.
Defaults to |
autosave |
a list that contains configuration settings for the automatic saving of the emulator:
|
new_wave |
a bool indicating if the current execution of |
cores |
an integer that gives the number of cores to be used for emulator validations. If set to |
... |
any arguments (with names different from those of arguments used in |
train_N |
an integer or a vector of integers that gives the number of training iterations to be used to re-fit the DGP emulator at each step of the sequential design:
Defaults to |
refit_cores |
the number of cores/workers to be used to re-fit GP components (in the same layer of a DGP emulator)
at each M-step during the re-fitting. If set to |
pruning |
a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of
design points exceeds |
control |
a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
The argument is only used when |
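The layout described for limits can be sketched in base R. This is only an illustration: the numeric ranges are placeholders, and as_limits_matrix is a hypothetical helper that mimics the documented vector-to-row-matrix conversion; it is not a dgpsi function.

```r
# Two input dimensions: one row per dimension, columns are (min, max).
# The ranges below are illustrative placeholders.
lim <- rbind(c(0, 1), c(-5, 5))

# Hypothetical helper (not part of dgpsi) mirroring the documented conversion
# of a length-two vector into a two-column row matrix.
as_limits_matrix <- function(limits) {
  if (is.null(dim(limits))) {
    stopifnot(length(limits) == 2)
    limits <- matrix(limits, nrow = 1)
  }
  # sanity check: two columns, minimum not above maximum
  stopifnot(ncol(limits) == 2, all(limits[, 1] <= limits[, 2]))
  limits
}

lim_1d <- as_limits_matrix(c(0, 10))
dim(lim)     # rows = input dimensions, columns = (min, max)
dim(lim_1d)  # a single input dimension becomes a one-row matrix
```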
Details
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Value
An updated object is returned with a slot called design that contains:
- S slots, named wave1, wave2,..., waveS, that contain information of S waves of sequential designs that have been applied to the emulator. Each slot contains the following elements:
  - N, an integer that gives the number of steps implemented in the corresponding wave;
  - rmse, a matrix that gives the RMSEs of emulators constructed during the corresponding wave, if eval = NULL;
  - metric, a matrix that gives the customized evaluating metric values of emulators constructed during the corresponding wave, if a customized function is supplied to eval;
  - freq, an integer that gives the frequency that the emulator validations are implemented during the corresponding wave;
  - enrichment, a vector of size N that gives the number of new design points added after each step of the sequential design (if object is an instance of the gp or dgp class), or a matrix that gives the number of new design points added to emulators in a bundle after each step of the sequential design (if object is an instance of the bundle class).
  If target is not NULL, the following additional elements are also included:
  - target, the target RMSE(s) to stop the sequential design;
  - reached, a bool (if object is an instance of the gp or dgp class) or a vector of bools (if object is an instance of the bundle class) that indicates if the target RMSEs are reached at the end of the sequential design.
- a slot called type that gives the type of validations:
  - either LOO ('loo') or OOS ('oos') if eval = NULL. See validate() for more information about LOO and OOS.
  - 'customized' if a customized R function is provided to eval.
- two slots called x_test and y_test that contain the data points for the OOS validation if the type slot is 'oos'.
- If y_cand = NULL and there are NAs returned from the supplied f during the sequential design, a slot called exclusion is included that records the located design positions that produced NAs via f. The sequential design will use this information to avoid re-visiting the same locations (if x_cand is supplied) or their neighborhoods (if x_cand is NULL) in later runs of design().
See Note section below for further information.
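The slot layout above can be navigated with ordinary list operations. The sketch below uses a mock list rather than a real emulator: every number is a placeholder, the list holds only a subset of the documented elements, and it is assumed (not confirmed here) that the returned object is list-like and that rmse holds one row per validation.

```r
# Mock of the documented layout of the design slot; all numbers are
# placeholders, not results from a real emulator.
design_slot <- list(
  wave1 = list(N = 3, rmse = matrix(c(0.30, 0.22, 0.15), ncol = 1), freq = 1),
  wave2 = list(N = 2, rmse = matrix(c(0.12, 0.09), ncol = 1), freq = 1),
  type = "oos"
)

# Pick out the wave slots by name, leaving type, x_test, etc. aside.
waves <- grep("^wave[0-9]+$", names(design_slot), value = TRUE)

# Stack the per-wave RMSE matrices into one trace (assumes equal column counts).
rmse_trace <- unname(do.call(rbind, lapply(design_slot[waves], `[[`, "rmse")))

# Number of steps recorded for each wave.
steps_per_wave <- vapply(design_slot[waves], function(w) w$N, numeric(1))
```

With a real emulator m, the same operations would be applied to its design slot in place of the mock list.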
Note
- The validation of an emulator is forced after the final step of a sequential design even if N is not a multiple of the second element in freq.
- Any loo or oos slot that already exists in object will be cleaned, and a new slot called loo or oos will be created in the returned object depending on whether x_test and y_test are provided. The new slot gives the validation information of the emulator constructed in the final step of the sequential design. See validate() for more information about the slots loo and oos.
- If object has previously been used by design() for sequential designs, the information of the current wave of the sequential design will replace those of old waves and be contained in the returned object, unless
  - the validation type (LOO or OOS depending on whether x_test and y_test are supplied or not) of the current wave of the sequential design is the same as the validation types (shown in the type of the design slot of object) in previous waves, and if the validation type is OOS, x_test and y_test in the current wave must also be identical to those in the previous waves;
  - both the current and previous waves of the sequential design supply customized evaluation functions to eval. Users need to ensure the customized evaluation functions are consistent among different waves. Otherwise, the trace plot of RMSEs produced by draw() will show values of different evaluation metrics in different waves.
  In the above two cases, the information of the current wave of the sequential design will be added to the design slot of the returned object under the name waveS.
- If object is an instance of the gp class and eval = NULL, the matrix in the rmse slot is single-columned. If object is an instance of the dgp or bundle class and eval = NULL, the matrix in the rmse slot can have multiple columns that correspond to different output dimensions or different emulators in the bundle.
- If object is an instance of the gp class and eval = NULL, target needs to be a single value giving the RMSE threshold. If object is an instance of the dgp or bundle class and eval = NULL, target can be a vector of values that gives the RMSE thresholds for different output dimensions or different emulators. If a single value is provided, it will be used as the RMSE threshold for all output dimensions (if object is an instance of the dgp class) or all emulators (if object is an instance of the bundle class). If a customized function is supplied to eval, the user needs to ensure that the length of target is equal to that of the output from eval.
- When defining f, it is important to ensure that:
  - the column order of the first argument of f is consistent with the training input used for the emulator;
  - the column order of the output matrix of f is consistent with the order of emulator output dimensions (if object is an instance of the dgp class), or the order of emulators placed in object (if object is an instance of the bundle class).
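A simulator wrapper that keeps a fixed column order and returns NAs instead of failing might look like the sketch below. The names raw_sim and safe_f, the two-output form, and the failure region are all hypothetical and chosen only for illustration; they are not part of dgpsi.

```r
# Hypothetical two-output simulator for a single design point.
raw_sim <- function(x_row) {
  # fail on part of the input space to mimic a crashing simulator
  if (x_row[1] > 0.9) stop("simulator failed")
  c(sin(x_row[1]) + x_row[2], x_row[1] * x_row[2])
}

# Wrapper with the shape expected of f: matrix in, matrix out.
safe_f <- function(x) {
  # x: one design point per row, columns in the order of the training input
  out <- t(apply(x, 1, function(x_row) {
    # turn any simulator error into a row of NAs
    tryCatch(raw_sim(x_row), error = function(e) c(NA_real_, NA_real_))
  }))
  # columns of out follow the order of the emulator output dimensions
  out
}

x_new <- rbind(c(0.2, 0.5), c(0.95, 0.1))
y_new <- safe_f(x_new)  # second row is NA because raw_sim() fails there
```

Catching errors inside the wrapper, rather than around the call to design(), keeps the failure local to one design point.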
- The output matrix produced by f may include NAs. This is especially beneficial as it allows the sequential design process to continue without interruption, even if errors or NA outputs are encountered from f at certain input locations identified by the sequential designs. Users should handle any errors within f by returning NAs.
- When defining eval, the output metric needs to be positive if draw() is used with log = T. One also needs to ensure that a lower metric value indicates a better emulation performance if target is set.
- Any R vector detected in x_test and y_test will be treated as a column vector and automatically converted into a single-column R matrix. Thus, if x_test or y_test is a single testing data point with multiple dimensions, it must be given as a matrix.
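The difference between a bare vector and a one-row matrix for a single testing point can be seen in base R. The values below are placeholders, and the variable names are illustrative only.

```r
# A single testing point with two input dimensions (placeholder values).
pt <- c(0.2, 0.7)

# How a bare vector is interpreted according to the note above: as a column,
# i.e. two one-dimensional points rather than one two-dimensional point.
as_column <- matrix(pt, ncol = 1)

# What is intended: one row (one point), two columns (two input dimensions).
x_test_one <- matrix(pt, nrow = 1)

dim(as_column)   # 2 rows, 1 column
dim(x_test_one)  # 1 row, 2 columns
```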
Examples
## Not run:
# load packages and the Python env
library(lhs)
library(dgpsi)
# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
sin(1/((0.7*x[,1,drop=FALSE]+0.3)*(0.7*x[,2,drop=FALSE]+0.3)))
}
# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)
# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)
# training a 2-layered DGP emulator with the initial design
m <- dgp(X, Y)
# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)
# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# draw the design created by the sequential design
draw(m,'design')
# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')
# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)
# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)
## End(Not run)