R: Local Approximate SVD-Based GP Models

lasvdGP {DynamicGP}

R Documentation

Local Approximate SVD-Based GP Models

Description

Fits a local approximate SVD-based GP model on a test set X0, given a training/design set design and a response matrix resp. For each test point, the local neighborhood set consists of nn points, of which the first n0 are selected by Euclidean distance to the test point. The remaining nn-n0 neighborhood points are selected sequentially by a greedy algorithm proposed by Zhang et al. (2018). This function supports parallelization via both the R package "parallel" and the OpenMP library.
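The initial neighborhood step described above can be sketched in base R. This is only an illustration of selecting the n0 nearest design points by Euclidean distance; the helper name `nearest_n0` is hypothetical and is not part of DynamicGP, and the package's internal implementation may differ.

```r
# Hypothetical helper (not a DynamicGP function): indices of the n0 design
# points closest to a single test point x0 in Euclidean distance.
nearest_n0 <- function(design, x0, n0) {
  # squared Euclidean distance from every design row to x0
  d2 <- rowSums(sweep(design, 2, x0)^2)
  # the ordering by squared distance equals the ordering by distance
  order(d2)[seq_len(n0)]
}

# Toy design with N = 5 points in d = 2 dimensions
design <- rbind(c(0, 0), c(1, 1), c(0.1, 0), c(0.9, 1), c(0.5, 0.5))
init_idx <- nearest_n0(design, x0 = c(0, 0), n0 = 2)
```

Here `init_idx` holds the row indices of the two design points nearest to the origin. The greedy selection of the remaining nn-n0 points is not reproduced in this sketch.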

Usage

lasvdGP(design, resp, X0=design, n0=10, nn=20,
        nfea = min(1000,nrow(design)),
        nsvd = nn, nadd = 1, frac = .95, gstart = 0.0001,
        resvdThres = min(5, nn-n0), every = min(5,nn-n0),
        nstarts = 5,centralize=FALSE, maxit=100,
        errlog = "", nthread = 1, clutype="PSOCK")

Arguments

`design`	An `N` by `d` matrix of `N` training/design inputs.
`resp`	An `L` by `N` response matrix for `design`, where `L` is the length of the time series outputs and `N` is the number of design points.
`X0`	An `M` by `d` matrix of `M` test inputs. The localized SVD-based GP models will be fitted on every point (row) of `X0`. The default value of `X0` is `design`.
`n0`	The number of points in the initial neighborhood set. The initial neighborhood set is selected by the Euclidean distance. The default value is 10.
`nn`	The total number of neighborhood points. The remaining `nn-n0` points are selected sequentially by the greedy algorithm of Zhang et al. (2018). The default value is 20.
`nfea`	The number of feasible points within which to select the neighborhood points. This function will only consider the `nfea` design points closest to the test point in terms of Euclidean distance when selecting neighborhood points. The default value is the minimum of `N` and 1000.
`nsvd`	The number of design points closest to the test points on whose response matrix to perform the initial singular value decomposition. The default value is `nn`.
`nadd`	The number of neighborhood points selected at one iteration. The default value is 1.
`frac`	The threshold in the cumulative percentage criterion to select the number of SVD bases. The default value is 0.95.
`gstart`	The starting value and upper bound for estimating the nugget parameter. Since `sqrt(.Machine$double.eps)` is the lower bound of the nugget term, setting `gstart = sqrt(.Machine$double.eps)` fixes the nugget parameter at that value. The default value is 0.0001.
`resvdThres`	The threshold for re-performing the SVD. After every `resvdThres` points have been added to the neighborhood set, the SVD of the response matrix will be re-performed and the SVD-based GP model will be refitted. The default value is the minimum of `nn`-`n0` and 5.
`every`	The threshold for refitting the GP models without re-performing the SVD. After every `every` points have been added to the neighborhood set, the GP models will be refitted, but the SVD will not be re-performed. It is suggested that `every` <= `resvdThres`. The default value is the minimum of `nn`-`n0` and 5.
`nstarts`	The number of starting points used in the numerical maximization of the posterior density function. A larger `nstarts` will typically lead to more accurate prediction but longer computational time. The default value is 5.
`centralize`	If `centralize=TRUE` the response matrix will be centralized (subtract the mean) before the start of the algorithm. The mean will be added to the predictive mean at the finish of the algorithm. The default value is `FALSE`.
`maxit`	Maximum number of iterations in the numerical optimization algorithm for maximizing the posterior density function. The default value is 100.
`errlog`	The path of a log file that records the errors that occur in the process of fitting local SVD-based GP models. If an empty string is provided, no log file will be produced.
`nthread`	The number of threads (processes) used in parallel execution of this function. `nthread=1` implies no parallelization. The default value is 1.
`clutype`	The type of parallelization utilized by this function. If `clutype="OMP"`, it will use the OpenMP parallelization. Otherwise, it indicates the type of cluster in the R package "parallel". The default value is "PSOCK". Required only if `nthread`>1.
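To make the role of `frac` more concrete, the following base R sketch picks the number of SVD bases by a cumulative percentage rule. It assumes the criterion is the cumulative share of squared singular values, which is one common reading and is not confirmed by this page; the helper name `n_bases` is hypothetical and not part of DynamicGP.

```r
# Hypothetical helper (not a DynamicGP function): smallest number of
# singular vectors whose cumulative share of squared singular values
# reaches frac. The use of squared singular values is an assumption.
n_bases <- function(resp_local, frac = 0.95) {
  d <- svd(resp_local)$d
  cum_share <- cumsum(d^2) / sum(d^2)
  which(cum_share >= frac)[1]
}

# Toy 4 x 3 local response matrix with singular values 3, 1 and 0.1
resp_local <- rbind(diag(c(3, 1, 0.1)), 0)
p95 <- n_bases(resp_local, frac = 0.95)  # two bases are needed
p50 <- n_bases(resp_local, frac = 0.50)  # the leading basis suffices
```

Under this reading, raising `frac` keeps more bases, so more GP models are fitted per test point.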

Value

`pmean`	An `L` by `M` matrix of posterior predicted mean for the response at the test set `X0`.
`ps2`	An `L` by `M` matrix of posterior predicted variance for the response at the test set `X0`.
`flags`	A vector of integers of length `M` which indicates the status of fitting the local SVD-based GP models for each of the `M` input points in the test set. The value `0` indicates successful fitting; `1` indicates an error in the Cholesky decomposition of the correlation matrices; `2` indicates an error in the SVD of the local response matrix; `3` indicates an error in optimizing the nugget term.
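A short sketch of how the returned components might be inspected follows. The object `ret` is a mock list with made-up numbers that only mimics the documented shape of the return value (`pmean`, `ps2`, `flags`); the label strings and the two-standard-deviation band are illustrative choices, not something the package produces.

```r
# Mock object shaped like a lasvdGP() return value (values are made up):
# L = 2 time points, M = 3 test points.
ret <- list(
  pmean = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2),
  ps2   = matrix(c(0.04, 0.09, 0.01, 0.16, 0.25, 0.36), nrow = 2),
  flags = c(0L, 2L, 0L)
)

# Human-readable labels for the documented flag values 0 to 3
flag_labels <- c("0" = "success",
                 "1" = "Cholesky error",
                 "2" = "SVD error",
                 "3" = "nugget optimization error")
status <- unname(flag_labels[as.character(ret$flags)])

# Keep only the test points whose local model was fitted successfully
ok <- ret$flags == 0
psd <- sqrt(ret$ps2[, ok, drop = FALSE])
# Rough pointwise band: predictive mean plus or minus two standard deviations
lower <- ret$pmean[, ok, drop = FALSE] - 2 * psd
upper <- ret$pmean[, ok, drop = FALSE] + 2 * psd
```

Columns of `lower` and `upper` correspond to the test points with `flags == 0`, in their original order.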

Author(s)

Ru Zhang heavenmarshal@gmail.com,

C. Devon Lin devon.lin@queensu.ca,

Pritam Ranjan pritamr@iimidr.ac.in

References

Zhang, R., Lin, C. D. and Ranjan, P. (2018) Local Gaussian Process Model for Large-scale Dynamic Computer Experiments, Journal of Computational and Graphical Statistics,
DOI: 10.1080/10618600.2018.1473778.

Examples

library("lhs")
forretal <- function(x,t,shift=1)
{
    par1 <- x[1]*6+4
    par2 <- x[2]*16+4
    par3 <- x[3]*6+1
    t <- t+shift
    y <- (par1*t-2)^2*sin(par2*t-par3)
}
timepoints <- seq(0,1,len=200)
design <- lhs::randomLHS(100,3)
test <- lhs::randomLHS(20,3)

## evaluate the response matrix on the design matrix
resp <- apply(design,1,forretal,timepoints)

n0 <- 14
nn <- 15
gs <- sqrt(.Machine$double.eps)

## lasvdGP with multiple (5) start points for GP model estimation.
## It uses the R package "parallel" for parallelization
retlamsp <- lasvdGP(design,resp,test,n0,nn,frac=.95,gstart=gs,
                    centralize=TRUE,nstarts=5,nthread=2,clutype="PSOCK")

## lasvdGP with single start point for GP model estimation,
## It does not use parallel computation
retlass <- lasvdGP(design,resp,test,n0,nn,frac=.95,gstart=gs,
                   centralize=TRUE,nstarts=1,nthread=1)
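One way to assess fits such as those above is to compare the `L` by `M` predictive mean with the true response at the test inputs, which for this example could be obtained with `apply(test,1,forretal,timepoints)`. The helper below is a hypothetical sketch, not part of DynamicGP, and is shown on small made-up matrices so that it runs without the package.

```r
# Hypothetical helper (not a DynamicGP function): root mean squared error
# over the L time points, computed separately for each of the M test points.
rmse_per_test_point <- function(pmean, truth) {
  stopifnot(identical(dim(pmean), dim(truth)))
  sqrt(colMeans((pmean - truth)^2))
}

# Made-up 2 x 2 matrices standing in for a predictive mean and the truth
truth <- matrix(0, nrow = 2, ncol = 2)
pmean <- cbind(c(1, 1), c(0, 0))
err <- rmse_per_test_point(pmean, truth)
```

With real output, `pmean` would be the `pmean` component of the fitted object, and test points with a nonzero entry in `flags` would be excluded first.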

[Package DynamicGP version 1.1-9 Index]