| sk_fit {snapKrig} | R Documentation |
Fit a covariance model to an sk grid by maximum likelihood
Description
This uses stats::optim to minimize the log-likelihood function for a grid dataset
g over the space of unknown parameters for the covariance function specified in pars.
If only one parameter is unknown, the function instead uses stats::optimize.
Usage
sk_fit(
g,
pars = NULL,
X = NA,
iso = TRUE,
n_max = 1000,
quiet = FALSE,
lower = NULL,
initial = NULL,
upper = NULL,
log_scale = TRUE,
method = "L-BFGS-B",
control = list()
)
Arguments
g |
an sk grid (or list with entries 'gdim', 'gres', 'gval') |
pars |
covariance parameter list, with |
X |
numeric (or NA), matrix, or sk grid of linear predictors, passed to |
iso |
logical, indicating to constrain the y and x kernel parameters to be the same |
n_max |
integer, the maximum number of observations allowed |
quiet |
logical, indicating to suppress console output |
lower |
numeric vector, lower bounds for parameters |
initial |
numeric vector, initial values for parameters |
upper |
numeric vector, upper bounds for parameters |
log_scale |
logical, indicating to log-transform parameters for optimization |
method |
character, passed to |
control |
list, passed to |
Details
NA entries in pars are treated as unknown parameters and fitted by the
function, whereas non-NA entries are treated as fixed parameters (and not fitted).
If none of the parameters in pars are NA, the function copies everything as initial
values, then treats all parameters as unknown. pars can also be a character vector
defining a pair of correlograms (see ?sk_pars) in which case all covariance parameters
are treated as unknown.
Bounds and initial values are set automatically using sk_bds, unless they are otherwise
specified in arguments lower, initial, upper. These should be vectors containing only
the unknown parameters, ie. they must exclude fixed parameters. Call
sk_update_pars(pars, iso=iso) to get a template with the expected names and order.
All parameters in the covariance models supported by snapKrig are strictly positive.
Optimization is (by default) done on the parameter log-scale, and users can select a
non-constrained method if they wish (?stats::optim). As the default method 'L-BFGS-B'
is the only one that accepts bounds (lower, initial, upper are otherwise ignored)
method is ignored when log_scale=FALSE.
Note that the 'gxp' and 'mat' correlograms behave strangely with very small or very large shape parameters, so for them we recommended 'L-BFGS-B' only.
When there is only one unknown parameter, the function uses stats::optimize instead of
stats::optim. In this case all entries of control with the exception of 'tol' are
ignored, as are bounds and initial values, and arguments to method.
As a sanity check n_max sets a maximum for the number of observed grid points. This
is to avoid accidental calls with very large datasets that would cause R to hang or crash.
Set n_max=Inf (with caution) to bypass this check. Similarly the maximum number of
iterations is set to 1e3 but this can be changed by manually setting 'maxit' in
control.
Value
A parameter list in the form returned by sk_pars containing both fixed and
fitted parameters. The data-frame of bounds and initial values is also included in the
attribute 'bds'
See Also
sk sk_LL sk_nLL stats::optim stats::optimize
Other parameter managers:
sk_bds(),
sk_kp(),
sk_pars_make(),
sk_pars_update(),
sk_pars(),
sk_to_string()
Examples
# define a grid
gdim = c(50, 51)
g_empty = sk(gdim)
pars_src = sk_pars(g_empty)
pars_src = utils::modifyList(pars_src, list(eps=runif(1, 0, 1e1), psill=runif(1, 0, 1e2)))
pars_src[['y']][['kp']] = pars_src[['x']][['kp']] = runif(1, 1, 50)
# generate example data
g_obs = sk_sim(g_empty, pars_src)
sk_plot(g_obs)
# fit (set maxit low to keep check time short)
fit_result = sk_fit(g_obs, pars='gau', control=list(maxit=25), quiet=TRUE)
# compare estimates with true values
rbind(true=sk_pars_update(pars_src), fitted=sk_pars_update(fit_result))
# extract bounds data frame
attr(fit_result, 'bds')
# non-essential examples skipped to stay below 5s exec time on slower machines
# check a sequence of other psill values
pars_out = fit_result
psill_test = ( 2^(seq(5) - 3) ) * pars_out[['psill']]
LL_test = sapply(psill_test, function(s) sk_LL(utils::modifyList(pars_out, list(psill=s)), g_obs) )
plot(psill_test, LL_test)
lines(psill_test, LL_test)
print(data.frame(psill=psill_test, likelihood=LL_test))
# repeat with most data missing
n = prod(gdim)
n_obs = 25
g_obs = sk_sim(g_empty, pars_src)
idx_miss = sample.int(length(g_empty), length(g_empty) - n_obs)
g_miss = g_obs
g_miss[idx_miss] = NA
sk_plot(g_miss)
# fit (set maxit low to keep check time short) and compare
fit_result = sk_fit(g_miss, pars='gau', control=list(maxit=25), quiet=TRUE)
rbind(true=sk_pars_update(pars_src), fitted=sk_pars_update(fit_result))