Fit {GaSP}    R Documentation

Fit a GaSP model.

Description

Fit (train) a GaSP model.

Usage

Fit(
  x,
  y,
  reg_model,
  sp_model = NULL,
  cor_family = c("PowerExponential", "Matern"),
  cor_par = data.frame(0),
  random_error = c(FALSE, TRUE),
  sp_var = -1,
  error_var = -1,
  nugget = 1e-09,
  tries = 10,
  seed = 500,
  fit_objective = c("Likelihood", "Posterior"),
  theta_standardized_min = 0,
  theta_standardized_max = .Machine$double.xmax,
  alpha_min = 0,
  alpha_max = 1,
  derivatives_min = 0,
  derivatives_max = 3,
  log_obj_tol = 1e-05,
  log_obj_diff = 0,
  lambda_prior = 0.1,
  model_comparison = c("Objective", "CV")
)

Arguments

x

A data frame containing the input (explanatory variable) training data.

y

A vector or a data frame with one column containing the output (response) training data.

reg_model

The regression model, specified as a formula; the left-hand side of the formula is unused (see Examples).

sp_model

An optional stochastic process model, specified as a formula, but note the left-hand side of the formula and the intercept are unused. The default NULL uses all column names in x.

cor_family

A character string specifying the (product, anisotropic) correlation-function family: "PowerExponential" for the power-exponential family or "Matern" for the Matern family.

cor_par

An optional data frame containing the correlation parameters with one row per sp_model term and two columns set up as described in GaSPModel Details; only used to start the first objective optimization (see Details).

random_error

A boolean indicating whether the model includes a random (measurement, white-noise) error term.

sp_var, error_var

Starting values of the stochastic process and error variances for the first optimization try (see Details); valid (i.e., nonnegative) values are used only if random_error = TRUE. The invalid default value of -1 indicates that Fit will choose a starting value.

nugget

For numerical stability, the proportion of the total variance due to random error is fixed at this value (random_error = FALSE) or bounded below by it (random_error = TRUE).

tries

Number of optimizations of the objective from different random starting points.

seed

The random-number seed to generate starting points.

fit_objective

The objective that Fit attempts to optimize: "Likelihood" (maximum likelihood estimation) or "Posterior" (Bayesian maximum a posteriori estimation).

theta_standardized_min, theta_standardized_max

The minimum and maximum of the standardized \theta parameter (see Details).

alpha_min, alpha_max

The minimum and maximum of the \alpha parameter of the power-exponential family.

derivatives_min, derivatives_max

The minimum and maximum of the \delta parameter of the Matern family.

log_obj_tol

An absolute tolerance for terminating the optimization of the log of the objective.

log_obj_diff

The critical value for the change in the log objective for informal tests during optimization of correlation parameters. No testing is done with the default of 0; a larger critical value such as 2 may give a more parsimonious model.

lambda_prior

The rate parameter of an exponential prior for each \theta parameter; used only if fit_objective = "Posterior".

model_comparison

The criterion used to select from multiple solutions when tries > 1: the objective function ("Objective") or leave-one-out cross validation ("CV").
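As a small illustration of how reg_model and sp_model are written, the sketch below builds one-sided formulas in base R. The input names x1 and x2 and the response name y are hypothetical; substitute the column names of your own x.

```r
# Hypothetical training inputs; x1 and x2 are illustrative names only.
x <- data.frame(x1 = c(0, 0.5, 1), x2 = c(2, 4, 8))

reg_constant <- ~ 1           # constant-mean regression model
reg_linear <- ~ 1 + x1 + x2   # first-order regression model
sp_all <- ~ x1 + x2           # stochastic-process terms

# A left-hand side may be written but is documented as unused,
# so both formulas below name the same right-hand-side terms.
with_lhs <- y ~ 1 + x1 + x2
same_terms <- identical(
  attr(terms(with_lhs), "term.labels"),
  attr(terms(reg_linear), "term.labels")
)

# Every variable named in sp_model should be a column of x.
sp_vars_ok <- all(all.vars(sp_all) %in% names(x))
```

Leaving sp_model = NULL would presumably be equivalent to sp_all here, since the default uses all column names in x.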

Details

Fit numerically optimizes the profile objective function with respect to the correlation parameters; the mean and overall variance parameters are estimated in closed form given the correlation parameters.

A cor_par data frame supplied by the user is the starting point for the first optimization try. If random_error = TRUE, then sp_var / (sp_var + error_var) is another correlation parameter to be optimized; sp_var and error_var values supplied by the user will initialize this parameter for the first try.

Set random_error = TRUE to estimate the variance of the random (measurement, white-noise) error. The small nugget error variance serves only to maintain numerical stability.
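The variance proportions described above are simple arithmetic. The sketch below uses made-up starting values for sp_var and error_var; it only illustrates the quantities involved and is not the code Fit runs internally.

```r
# Hypothetical starting values supplied by the user.
sp_var <- 4
error_var <- 1
nugget <- 1e-09

# Proportion of total variance due to the stochastic process;
# with random_error = TRUE this ratio is optimized like a correlation parameter.
sp_prop <- sp_var / (sp_var + error_var)

# Complementary proportion due to random error.
error_prop <- error_var / (sp_var + error_var)

# With random_error = TRUE the error proportion is bounded below by nugget;
# with random_error = FALSE it is fixed at nugget instead.
respects_bound <- error_prop >= nugget
error_prop_fixed <- nugget
```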

For term j in the stochastic-process model, the estimate of \theta_j is constrained between theta_standardized_min / r_j^2 and theta_standardized_max / r_j^2, where r_j is the range of term j. Note that Fit returns unscaled estimates relating to the original, unscaled inputs.
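To make the scaling concrete, the following base-R sketch computes the implied bounds on each \theta_j. It assumes every stochastic-process term is a single input column and that its range r_j is the maximum minus the minimum of that column; the data and the value chosen for theta_standardized_max are hypothetical.

```r
# Hypothetical training inputs and bound settings.
x <- data.frame(x1 = c(0, 2, 4), x2 = c(10, 10.5, 11))
theta_standardized_min <- 0
theta_standardized_max <- 100

# r_j: range (max - min) of each input column (an assumption, see above).
r <- sapply(x, function(col) diff(range(col)))

# Bounds on theta_j for the original, unscaled inputs.
theta_min <- theta_standardized_min / r^2
theta_max <- theta_standardized_max / r^2
```

An input with a wide range (x1, range 4) gets a tighter upper bound on \theta_j than one with a narrow range (x2, range 1), so the same standardized limits act consistently across inputs measured on different scales.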

Value

A GaSPModel object, which is a list with the following components:

x

The data frame containing the input training data.

y

The training output data, now as a vector.

reg_model

The regression model, now in the form of a data frame.

sp_model

The stochastic process model, now in the form of a data frame.

cor_family

The correlation family.

cor_par

A data frame for the estimated correlation parameters.

random_error

The boolean indicating whether the model includes a random error term.

sp_var

The estimated stochastic process variance.

error_var

The estimated random error variance.

beta

A data frame holding the estimated regression-model parameters.

objective

The maximum value found for the objective function: the log likelihood (fit_objective = "Likelihood") or the log posterior (fit_objective = "Posterior").

cond_num

The condition number.

CVRMSE

The leave-one-out cross-validation root mean squared error.
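Because the returned GaSPModel object is a list, its components can be read with the usual $ operator. The sketch below repeats the call from the Examples section and inspects a few of the components listed above; it is guarded so that it does nothing when the GaSP package is not installed.

```r
# Only run when the GaSP package is available.
gasp_available <- requireNamespace("GaSP", quietly = TRUE)

if (gasp_available) {
  library(GaSP)

  # Same call as in the Examples section of this page.
  borehole_fit <- Fit(
    reg_model = ~1, x = borehole$x, y = borehole$y, cor_family = "Matern",
    random_error = FALSE, nugget = 0, fit_objective = "Posterior"
  )

  print(borehole_fit$cor_par)    # estimated correlation parameters
  print(borehole_fit$beta)       # estimated regression-model parameters
  print(borehole_fit$sp_var)     # estimated stochastic process variance
  print(borehole_fit$objective)  # maximized log posterior for this call
  print(borehole_fit$CVRMSE)     # leave-one-out cross-validation RMSE
}
```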

References

Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, 4, pp. 409-423, doi:10.1214/ss/1177012413.

Examples

x <- borehole$x
y <- borehole$y
borehole_fit <- Fit(
  reg_model = ~1, x = x, y = y, cor_family = "Matern",
  random_error = FALSE, nugget = 0, fit_objective = "Posterior"
)
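A further call, offered as a sketch rather than a recommended configuration, uses the power-exponential family with an estimated random error term and selects among several optimization tries by cross validation. All argument names come from the Usage section; the particular values (tries = 5 in particular) are arbitrary choices made for illustration, and the block is guarded so that it is skipped when GaSP is not installed.

```r
gasp_available <- requireNamespace("GaSP", quietly = TRUE)

if (gasp_available) {
  library(GaSP)
  x <- borehole$x
  y <- borehole$y

  # Maximum likelihood with an estimated error variance; the best of
  # five tries is chosen by leave-one-out cross validation.
  borehole_fit_cv <- Fit(
    reg_model = ~1, x = x, y = y, cor_family = "PowerExponential",
    random_error = TRUE, tries = 5, seed = 500,
    fit_objective = "Likelihood", model_comparison = "CV"
  )

  print(borehole_fit_cv$error_var)  # estimated random error variance
  print(borehole_fit_cv$CVRMSE)     # criterion used to compare the tries
}
```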

[Package GaSP version 1.0.6 Index]