R: Function to fit an emulator to ensemble model output

emulator {stilt}

R Documentation

Function to fit an emulator to ensemble model output

Description

Fits a Gaussian Process emulator to regularly spaced time-series (or 1D spatial) model output. This emulator can later be used to interpolate the model output across model parameter settings. The emulator is fit by maximizing the emulator likelihood using a PORT local optimization routine. The emulator has a flexible structure which can be chosen by the user (see Details).

Usage

emulator(mpars, moutput, par.reg, time.reg, kappa0, zeta0, myrel.tol =
NULL, twice = FALSE, fix.betas = TRUE)

Arguments

`mpars`	Model ensemble parameter settings. A list with components: $par Matrix of ensemble parameter settings. [row, column] = [parameter index, run index]. Columns should correspond to columns of the model output matrix `moutput$out`. $parnames Parameter names for rows of $par (character vector) $parunits Units for parameters in rows of $par (character vector)
`moutput`	Model output. A list with components: $t Time vector for rows of $out (vector) $tunits Time units (character) $out Model output matrix [row, col] = [time, run index]. The run indices should match with the `mpars$par` argument. $outname name (character) $outunits units (character
`par.reg`	Logical vector specifying which parameters to include into the regression matrix. Elements refer to rows of `mpars$par`. Note that if nothing is included into the regression matrix, the mean is still optimized.
`time.reg`	Logical specifying whether to include time into the regression matrix.
`kappa0`	Initial guess for the parameter covariance scaling factor. Larger values lead to higher parameter covariance.
`zeta0`	Initial guess for the nugget.
`myrel.tol`	Relative tolerance for optimization. If you want to use the defaults, assign it to NULL. The optimization stops if it thinks the true likelihood is within this fraction of current likelihood. The higher the number the sooner the optimization stops, but the emulator is 'worse'.
`twice`	Logical. If `TRUE`, the optimization is done twice with different initial parameter values, and then the best of the two is chosen as final emulator. Default is `FALSE`.
`fix.betas`	Logical. If `TRUE`, beta regression coefficients are fixed, and if `FALSE`, they are estimated. Default is `TRUE`.

Details

The emulator parameters are optimized using a local optimization method, to within the specified relative tolerance. The optimization maximizes emulator likelihood. Optionally, if twice=TRUE, the optimization is done twice with different initial parameter settings, and the emulator with the highest likelihood is selected. Depending on the value of fix.betas, the regression beta parameters can either be fixed at their multiple linear regression estimates, or estimated together with other emulator parameters. Mean structure is flexible, and user can choose which parameter and time dimensions to use as regressors via the time.reg and par.reg parameters

Value

Emulator returns an object of class "emul" and is a list with components (see also Reference in the References Section).

`$Theta.mat`	Theta matrix. [row, col] = [run number, parameter number]
`$t.vec`	Time vector
`$Y.mat`	Data matrix Y
`$X.mat`	Regression matrix
`$beta.vec`	A vector of regression parameters
`$kappa`	Parameter covariance scaling factor
`$phi.vec`	A vector of range parameters phi
`$zeta`	Nugget
`$n`	Length of time dimension
`$rho`	Time AR(1) parameter
`$p`	Number of model runs in the ensemble
`$vecC`	Data matrix Y minus the mean vector
`$par.reg`	Logical vector specifying which parameters to include in the regression matrix
`$time.reg`	Logical, specifies whether to include time into the regression matrix
`$Sigma.mats`	List of two precomputed matrices that may be useful: `$Sigma.t.mat` and `$Sigma.theta.mat` (see References)
`$Sigma.theta.inv.mat`	Inverse of `$Sigma.mats$Sigma.theta.mat`

Note

Evaluation of this function (especially for large datasets and many parameters) might take a long time.

Limitations and Caveats

The emulator will not work well for 'jagged' model response surfaces (high nugget)
The emulator is restricted to output at regular intervals in time and space
The code has not been tested under conditions of extreme high / extreme low input parameter range, output time(space) coordinates range, and output range (an example would be model output ranging from -1E20 to 1E20, etc.). In such cases it is recommended to re-scale the time(space) coordinates vector, the input parameters, and/or the model output.
The emulator will not work on scalar model output – it requires multivariate data
The emulator assumes a separable covariance function, and stationarity of the covariance part of the Gaussian process.
The optimization of the emulator parameters degrades dramatically (and increases in time) as a function of number of free parameters. Hence, the emulator might be of limited use for large parameter ensembles
The emulator authors are not responsible for any code errors and/or bugs

Note

If the final emulator performs poorly, try adjusting the settings, especially the initial kappa0 and zeta0 values.

References

R. Olson and W. Chang (2013): Mathematical framework for a separable Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.

Examples

# Fit an emulator to the example 1D parameter ensemble data
# Do not use any covariates, use standard settings
data(Data.1D.model)
data(Data.1D.par)
emul.1D <- emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)

# Take a look at the range and regression parameters
cat("Range parameters are:", emul.1D$phi.vec, "\n")
cat("Regression parameters are:", emul.1D$beta.vec, "\n")

# Predict using the emulator at Theta*=3
pred.1D <- predict(emul.1D, 3)

# Plot the prediction
plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Output")


## Not run: 
    # Fit an emulator to the UVic ESCM 3-parameter ensemble temperature
    # output Use time and aerosol scaling covariates (parameter 3), run
    # the optimization twice with relative tolerance of 0.1, keep
    # regression parameters at their multiple linear regression
    # estimates data(Data.UVic.model) data(Data.UVic.par) UVic.emul <-
    # emulator(mpars=Data.UVic.par, moutput=Data.UVic.model,
    # par.reg=c(FALSE, FALSE, TRUE), time.reg=TRUE, kappa0=1, zeta0=1,
    # myrel.tol=0.1, twice=TRUE, fix.betas=TRUE)
   
## End(Not run)

[Package stilt version 1.3.0 Index]

Function to fit an emulator to ensemble model output

Description

Usage

Arguments

Details

Value

Note

Limitations and Caveats

Note

References

See Also

Examples