R: Locate the next design point for a (D)GP emulator or a bundle...

alm {dgpsi}

R Documentation

Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using ALM

Description

This function searches from a candidate set to locate the next design point(s) to be added to a (D)GP emulator or a bundle of (D)GP emulators using the Active Learning MacKay (ALM), see the reference below.

Usage

alm(object, x_cand, ...)

## S3 method for class 'gp'
alm(object, x_cand, batch_size = 1, workers = 1, ...)

## S3 method for class 'dgp'
alm(
  object,
  x_cand,
  batch_size = 1,
  workers = 1,
  threading = FALSE,
  aggregate = NULL,
  ...
)

## S3 method for class 'bundle'
alm(
  object,
  x_cand,
  batch_size = 1,
  workers = 1,
  threading = FALSE,
  aggregate = NULL,
  ...
)

Arguments

`object`	can be one of the following: the S3 class `gp`. the S3 class `dgp`. the S3 class `bundle`.
`x_cand`	a matrix (with each row being a design point and column being an input dimension) that gives a candidate set from which the next design point(s) are determined. If `object` is an instance of the `bundle` class, `x_cand` could also be a list with the length equal to the number of emulators contained in the `object`. Each slot in `x_cand` is a matrix that gives a candidate set for each emulator included in the bundle. See Note section below for further information.
`...`	any arguments (with names different from those of arguments used in `alm()`) that are used by `aggregate` can be passed here.
`batch_size`	an integer that gives the number of design points to be chosen. Defaults to `1`.
`workers`	the number of workers/cores to be used for the criterion calculation. If set to `NULL`, the number of workers is set to `⁠(max physical cores available - 1)⁠`. Defaults to `1`.
`threading`	a bool indicating whether to use the multi-threading to accelerate the criterion calculation for a DGP emulator. Turning this option on could improve the speed of criterion calculations when the DGP emulator is built with a moderately large number of training data points and the Matérn-2.5 kernel.
`aggregate`	an R function that aggregates scores of the ALM across different output dimensions (if `object` is an instance of the `dgp` class) or across different emulators (if `object` is an instance of the `bundle` class). The function should be specified in the following basic form: the first argument is a matrix representing scores. The rows of the matrix correspond to different design points. The number of columns of the matrix equals to: the emulator output dimension if `object` is an instance of the `dgp` class; or the number of emulators contained in `object` if `object` is an instance of the `bundle` class. the output should be a vector that gives aggregations of scores at different design points. Set to `NULL` to disable the aggregation. Defaults to `NULL`.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

If object is an instance of the gp class, a vector is returned with the length equal to batch_size, giving the positions (i.e., row numbers) of next design points from x_cand.
If object is an instance of the dgp class, a matrix is returned with row number equal to batch_size and column number equal to one (if aggregate is not NULL) or the output dimension (if aggregate is NULL), giving positions (i.e., row numbers) of next design points from x_cand to be added to the DGP emulator across different outputs. If object is a DGP emulator with either Hetero or NegBin likelihood layer, the returned matrix has two columns with the first column giving positions of next design points from x_cand that correspond to the mean parameter of the normal or negative Binomial distribution, and the second column giving positions of next design points from x_cand that correspond to the variance parameter of the normal distribution or the dispersion parameter of the negative Binomial distribution.
If object is an instance of the bundle class, a matrix is returned with row number equal to batch_size and column number equal to the number of emulators in the bundle, giving positions (i.e., row numbers) of next design points from x_cand to be added to individual emulators.

Note

The column order of the first argument of aggregate must be consistent with the order of emulator output dimensions (if object is an instance of the dgp class), or the order of emulators placed in object if object is an instance of the bundle class;
If x_cand is supplied as a list when object is an instance of bundle class and a aggregate function is provided, the matrices in x_cand must have common rows (i.e., the candidate sets of emulators in the bundle have common input locations) so the aggregate function can be applied.
Any R vector detected in x_cand will be treated as a column vector and automatically converted into a single-column R matrix.

References

MacKay, D. J. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590-604.

Examples

## Not run: 

# load packages and the Python env
library(lhs)
library(dgpsi)

# construct a 1D non-stationary function
f <- function(x) {
 sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
}

# generate the initial design
X <- maximinLHS(10,1)
Y <- f(X)

# training a 2-layered DGP emulator with the global connection off
m <- dgp(X, Y, connect = F)

# generate a candidate set
x_cand <- maximinLHS(200,1)

# locate the next design point using ALM
next_point <- alm(m, x_cand = x_cand)
X_new <- x_cand[next_point,,drop = F]

# obtain the corresponding output at the located design point
Y_new <- f(X_new)

# combine the new input-output pair to the existing data
X <- rbind(X, X_new)
Y <- rbind(Y, Y_new)

# update the DGP emulator with the new input and output data and refit with 500 training iterations
m <- update(m, X, Y, refit = TRUE, N = 500)

# plot the LOO validation
plot(m)

## End(Not run)

[Package dgpsi version 2.4.0 Index]