Fit {GaSP} | R Documentation |
Fit a GaSP model.
Description
Fit (train) a GaSP model.
Usage
Fit(
  x,
  y,
  reg_model,
  sp_model = NULL,
  cor_family = c("PowerExponential", "Matern"),
  cor_par = data.frame(0),
  random_error = c(FALSE, TRUE),
  sp_var = -1,
  error_var = -1,
  nugget = 1e-09,
  tries = 10,
  seed = 500,
  fit_objective = c("Likelihood", "Posterior"),
  theta_standardized_min = 0,
  theta_standardized_max = .Machine$double.xmax,
  alpha_min = 0,
  alpha_max = 1,
  derivatives_min = 0,
  derivatives_max = 3,
  log_obj_tol = 1e-05,
  log_obj_diff = 0,
  lambda_prior = 0.1,
  model_comparison = c("Objective", "CV")
)
Arguments
x |
A data frame containing the input (explanatory variable) training data. |
y |
A vector or a data frame with one column containing the output (response) training data. |
reg_model |
The regression model, specified as a formula; the left-hand side of the formula is unused (see Examples). |
sp_model |
An optional stochastic process model, specified as a formula,
but note the left-hand side of the formula and the intercept are unused.
The default |
cor_family |
A character string specifying the (product, anisotropic) correlation-function family: "PowerExponential" for the power-exponential family or "Matern" for the Matern family. |
cor_par |
An optional data frame containing the correlation parameters
with one row per |
random_error |
A logical value indicating whether a random (measurement, white-noise) error term is present. |
sp_var , error_var |
Starting values of the stochastic process and error variances
for the first try to optimize the objective (see Details);
valid (i.e., nonnegative) values will only be used if |
nugget |
For numerical stability the proportion of the total variance
due to random error is fixed at this value ( |
tries |
Number of optimizations of the objective from different random starting points. |
seed |
The random-number seed to generate starting points. |
fit_objective |
The objective that |
theta_standardized_min , theta_standardized_max |
The minimum and maximum of the standardized |
alpha_min , alpha_max |
The minimum and maximum
of the |
derivatives_min , derivatives_max |
The minimum and maximum
of the |
log_obj_tol |
An absolute tolerance for terminating the optimization of the log of the objective. |
log_obj_diff |
The critical value for the change in the log objective for informal tests during optimization of correlation parameters. No testing is done with the default of 0; a larger critical value such as 2 may give a more parsimonious model. |
lambda_prior |
The rate parameter of an exponential prior
for each |
model_comparison |
The criterion used to select from multiple solutions
when |
Details
Fit numerically optimizes the profile objective function with respect to the correlation parameters; the mean and overall variance parameters are estimated in closed form given the correlation parameters.
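As a rough illustration of the closed-form step, the following sketch uses base R only and does not call GaSP. It assumes the standard generalized least squares and maximum likelihood formulas from Sacks et al. (1989), a constant regression model, and a squared-exponential correlation with a fixed, arbitrary theta. The data and the names F_mat, R, beta_hat and sigma2_hat are hypothetical, and the computations inside Fit may differ in detail.

```r
# Toy 1-D training data (hypothetical values, for illustration only).
x <- c(0, 0.25, 0.5, 0.75, 1)
y <- c(1.0, 1.3, 1.1, 0.8, 1.2)
n <- length(y)

# Correlation matrix for a fixed theta (assumed squared-exponential form):
# R_ij = exp(-theta * (x_i - x_j)^2).
theta <- 2
R <- exp(-theta * outer(x, x, "-")^2)

# Constant regression model (~1): the design matrix is a column of ones.
F_mat <- matrix(1, nrow = n, ncol = 1)

# Generalized least squares estimate of the mean parameter.
R_inv_F <- solve(R, F_mat)
R_inv_y <- solve(R, y)
beta_hat <- solve(t(F_mat) %*% R_inv_F, t(F_mat) %*% R_inv_y)

# Maximum likelihood estimate of the overall variance given R.
resid <- y - F_mat %*% beta_hat
sigma2_hat <- drop(t(resid) %*% solve(R, resid)) / n
```

Because beta_hat and sigma2_hat are available in closed form for any fixed correlation matrix, only the correlation parameters (here theta) remain for the numerical optimizer.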
A cor_par data frame supplied by the user is the starting point for the first optimization try.
If random_error = TRUE, then sp_var / (sp_var + error_var) is another correlation parameter to be optimized; sp_var and error_var values supplied by the user will initialize this parameter for the first try.
Set random_error = TRUE to estimate the variance of the random (measurement, white-noise) error; a small nugget error variance is for numerical stability.
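A minimal sketch of a fit with a random error term follows. The starting variances 0.9 and 0.1 are arbitrary illustrative values, not recommendations, and the name noisy_fit is hypothetical. The call uses only arguments listed under Usage and the borehole data from the Examples, and it is skipped when the GaSP package is not installed.

```r
# Arbitrary starting values for the two variances (illustration only).
sp_var_start <- 0.9
error_var_start <- 0.1

# Starting value of the variance proportion optimized when random_error = TRUE.
start_ratio <- sp_var_start / (sp_var_start + error_var_start)

if (requireNamespace("GaSP", quietly = TRUE)) {
  library(GaSP)
  noisy_fit <- Fit(
    reg_model = ~1, x = borehole$x, y = borehole$y,
    random_error = TRUE,
    sp_var = sp_var_start, error_var = error_var_start
  )
  # Estimated stochastic process and random error variances.
  print(noisy_fit$sp_var)
  print(noisy_fit$error_var)
}
```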
For term j in the stochastic-process model, the estimate of \theta_j is constrained between theta_standardized_min / r_j^2 and theta_standardized_max / r_j^2, where r_j is the range of term j.
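The scaling of the bounds can be checked with a small base R sketch. It assumes that each stochastic-process term is a single input column and that "range" means maximum minus minimum; the data frame, the non-default bound values, and the name theta_bounds are hypothetical.

```r
# Hypothetical inputs; each column is treated as one stochastic-process term.
x <- data.frame(a = c(0, 5, 10), b = c(100, 150, 300))

# Non-default standardized bounds, chosen only for illustration.
theta_standardized_min <- 0.01
theta_standardized_max <- 100

# r_j: range (max - min) of each input column.
r <- vapply(x, function(col) diff(range(col)), numeric(1))

# Implied bounds on theta_j on the original, unscaled input scale.
theta_bounds <- data.frame(
  term = names(r),
  lower = unname(theta_standardized_min / r^2),
  upper = unname(theta_standardized_max / r^2)
)
print(theta_bounds)
```

An input with a wide range (column b here) gets proportionally tighter bounds on its unscaled theta, so the same standardized limits apply comparably across inputs measured in different units.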
Note that Fit returns unscaled estimates relating to the original, unscaled inputs.
Value
A GaSPModel object, which is a list with the following components:
x |
The data frame containing the input training data. |
y |
The training output data, now as a vector. |
reg_model |
The regression model, now in the form of a data frame. |
sp_model |
The stochastic process model, now in the form of a data frame. |
cor_family |
The correlation family. |
cor_par |
A data frame for the estimated correlation parameters. |
random_error |
The logical value indicating whether a random error term is present. |
sp_var |
The estimated stochastic process variance. |
error_var |
The estimated random error variance. |
beta |
A data frame holding the estimated regression-model parameters. |
objective |
The maximum value found for the objective function: the log likelihood (fit_objective = "Likelihood") or the log posterior (fit_objective = "Posterior"). |
cond_num |
The condition number. |
CVRMSE |
The leave-one-out cross-validation root mean squared error. |
References
Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, 4, pp. 409-423, doi:10.1214/ss/1177012413.
Examples
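library(GaSP)

# Training inputs (data frame) and output from the borehole data set.
x <- borehole$x
y <- borehole$y

# Constant regression model, Matern correlation, no random error term.
borehole_fit <- Fit(
  reg_model = ~1, x = x, y = y, cor_family = "Matern",
  random_error = FALSE, nugget = 0, fit_objective = "Posterior"
)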
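The fitted object can then be inspected through the components listed under Value. The sketch below repeats the fit so that it runs on its own and is skipped when the GaSP package is not installed; no output is shown because the estimates depend on the optimization.

```r
if (requireNamespace("GaSP", quietly = TRUE)) {
  library(GaSP)
  borehole_fit <- Fit(
    reg_model = ~1, x = borehole$x, y = borehole$y, cor_family = "Matern",
    random_error = FALSE, nugget = 0, fit_objective = "Posterior"
  )
  # Estimated correlation parameters (data frame).
  print(borehole_fit$cor_par)
  # Estimated regression-model parameters (data frame).
  print(borehole_fit$beta)
  # Maximized objective; the log posterior for this call.
  print(borehole_fit$objective)
  # Leave-one-out cross-validation root mean squared error.
  print(borehole_fit$CVRMSE)
}
```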