R: Fit a covariance model to an sk grid by maximum likelihood

sk_fit {snapKrig}

R Documentation

Fit a covariance model to an sk grid by maximum likelihood

Description

This uses stats::optim to minimize the log-likelihood function for a grid dataset g over the space of unknown parameters for the covariance function specified in pars. If only one parameter is unknown, the function instead uses stats::optimize.

Usage

sk_fit(
  g,
  pars = NULL,
  X = NA,
  iso = TRUE,
  n_max = 1000,
  quiet = FALSE,
  lower = NULL,
  initial = NULL,
  upper = NULL,
  log_scale = TRUE,
  method = "L-BFGS-B",
  control = list()
)

Arguments

`g`	an sk grid (or list with entries 'gdim', 'gres', 'gval')
`pars`	covariance parameter list, with `NA`s indicating parameters to fit
`X`	numeric (or NA), matrix, or sk grid of linear predictors, passed to `sk_LL`
`iso`	logical, indicating to constrain the y and x kernel parameters to be the same
`n_max`	integer, the maximum number of observations allowed
`quiet`	logical, indicating to suppress console output
`lower`	numeric vector, lower bounds for parameters
`initial`	numeric vector, initial values for parameters
`upper`	numeric vector, upper bounds for parameters
`log_scale`	logical, indicating to log-transform parameters for optimization
`method`	character, passed to `stats::optim` (default is 'L-BFGS-B')
`control`	list, passed to `stats::optim`

Details

NA entries in pars are treated as unknown parameters and fitted by the function, whereas non-NA entries are treated as fixed parameters (and not fitted). If none of the parameters in pars are NA, the function copies everything as initial values, then treats all parameters as unknown. pars can also be a character vector defining a pair of correlograms (see ?sk_pars) in which case all covariance parameters are treated as unknown.

Bounds and initial values are set automatically using sk_bds, unless they are otherwise specified in arguments lower, initial, upper. These should be vectors containing only the unknown parameters, ie. they must exclude fixed parameters. Call sk_update_pars(pars, iso=iso) to get a template with the expected names and order.

All parameters in the covariance models supported by snapKrig are strictly positive. Optimization is (by default) done on the parameter log-scale, and users can select a non-constrained method if they wish (?stats::optim). As the default method 'L-BFGS-B' is the only one that accepts bounds (lower, initial, upper are otherwise ignored) method is ignored when log_scale=FALSE.

Note that the 'gxp' and 'mat' correlograms behave strangely with very small or very large shape parameters, so for them we recommended 'L-BFGS-B' only.

When there is only one unknown parameter, the function uses stats::optimize instead of stats::optim. In this case all entries of control with the exception of 'tol' are ignored, as are bounds and initial values, and arguments to method.

As a sanity check n_max sets a maximum for the number of observed grid points. This is to avoid accidental calls with very large datasets that would cause R to hang or crash. Set n_max=Inf (with caution) to bypass this check. Similarly the maximum number of iterations is set to 1e3 but this can be changed by manually setting 'maxit' in control.

Value

A parameter list in the form returned by sk_pars containing both fixed and fitted parameters. The data-frame of bounds and initial values is also included in the attribute 'bds'

Examples


# define a grid
gdim = c(50, 51)
g_empty = sk(gdim)
pars_src = sk_pars(g_empty)
pars_src = utils::modifyList(pars_src, list(eps=runif(1, 0, 1e1), psill=runif(1, 0, 1e2)))
pars_src[['y']][['kp']] = pars_src[['x']][['kp']] = runif(1, 1, 50)

# generate example data
g_obs = sk_sim(g_empty, pars_src)
sk_plot(g_obs)

# fit (set maxit low to keep check time short)
fit_result = sk_fit(g_obs, pars='gau', control=list(maxit=25), quiet=TRUE)

# compare estimates with true values
rbind(true=sk_pars_update(pars_src), fitted=sk_pars_update(fit_result))

# extract bounds data frame
attr(fit_result, 'bds')

# non-essential examples skipped to stay below 5s exec time on slower machines

# check a sequence of other psill values
pars_out = fit_result
psill_test = ( 2^(seq(5) - 3) ) * pars_out[['psill']]
LL_test = sapply(psill_test, function(s) sk_LL(utils::modifyList(pars_out, list(psill=s)), g_obs) )
plot(psill_test, LL_test)
lines(psill_test, LL_test)
print(data.frame(psill=psill_test, likelihood=LL_test))

# repeat with most data missing
n = prod(gdim)
n_obs = 25
g_obs = sk_sim(g_empty, pars_src)
idx_miss = sample.int(length(g_empty), length(g_empty) - n_obs)
g_miss = g_obs
g_miss[idx_miss] = NA
sk_plot(g_miss)

# fit (set maxit low to keep check time short) and compare
fit_result = sk_fit(g_miss, pars='gau', control=list(maxit=25), quiet=TRUE)
rbind(true=sk_pars_update(pars_src), fitted=sk_pars_update(fit_result))

[Package snapKrig version 0.0.2 Index]