R: Profile Loglikelihoods and Confidence Intervals for...

profile_loglikelihood {soilhypfit}

R Documentation

Profile Loglikelihoods and Confidence Intervals for Parametric Modelling of Soil Hydraulic Properties

Description

The function prfloglik_sample computes for a single soil sample a loglikelihood profile as a function of the specified values for subsets of model parameters of soil water retention and/or soil hydraulic conductivity functions. The function confint_prfloglik_sample computes a confidence interval of one model parameter based on the likelihood ratio test for a single soil sample, and the S3 method confint computes confidence intervals of nonlinear model parameters for multiple soil samples.

Usage

prfloglik_sample(object, values, soil_sample,
    ncores = min(detectCores() - 1L, NROW(values)), verbose = 0)

confint_prfloglik_sample(object, parm = names(default_fit_param()),
    soil_sample, level = 0.95, test = c("F", "Chisq"),
    denominator_df = c("nonlinear", "all"), param_bound = NULL,
    root_tol = .Machine$double.eps^0.25, froot_tol = sqrt(root_tol),
    verbose = 0)

## S3 method for class 'fit_wrc_hcc'
confint(object,
    parm = names(object[["control"]][["initial_param"]]), level = 0.95,
    subset = NULL, type = c("loglik", "normal"), test = c("F", "Chisq"),
    denominator_df = c("nonlinear", "all"),
    root_tol = .Machine$double.eps^0.25, froot_tol = sqrt(root_tol),
    ncores = detectCores() - 1L, verbose = 0, ...)

Arguments

`object`	an object of class `fit_wrc_hcc`, see `fit_wrc_hcc`.
`values`	a `data.frame` or a `matrix` with the values of the conditionally linear (`\boldsymbol{\mu}^\mathrm{}` ) and nonlinear parameters (`\boldsymbol{\nu}^\mathrm{}` ) that should be kept fixed to compute the likelihood profile (mandatory argument, see `soilhypfitIntro`, `wc_model` and `hc_model` for information about the parametrization of models for soil water retention curves and/or soil hydraulic conductivity functions.). The names of the columns of `values` must match the names of model parameters.
`soil_sample`	a character scalar with the label of the soil sample for which the loglikelihood profile or the confidence interval should be computed. If `object` contains parameter estimates for a single soil sample then `soil_sample` is ignored, otherwise `soil_sample` is a mandatory argument.
`ncores`	an integer defining the number of cores for parallel computations. `ncores = 1` suppresses parallel computations.
`verbose`	positive integer controlling logging of diagnostic messages to the console and plotting of data and model curves during fitting, see `fit_wrc_hcc`.
`parm`	character scalar (`confint_prfloglik_sample`) or vector (`confint`) with name(s) of parameter(s) for which to compute the confidence interval(s). Note that `confint_prfloglik_sample` allows to compute a confidence interval for all parameters (including the linear ones), whereas the `confint` method computes confidence intervals for the nonlinear parameters (`\boldsymbol{\nu}^\mathrm{}` ) only, see `soilhypfitIntro` for information about the parametrization of models for soil water retention curves and/or soil hydraulic conductivity functions.
`level`	numeric scalar with the confidence level required to compute the confidence interval.
`test`	character keyword specifying whether to use the asymptotic `\chi^2`-distribution or the finite sample approximate F-distribution for the likelihood ratio test statistic when computing the confidence interval, see Details.
`denominator_df`	character keyword specifying whether the denominator degrees of freedom for the F-distribution of the test statistic is equal to the number of estimated nonlinear parameters (`"nonlinear"`, default) or equal to the total number of estimated parameters (`"all"`).
`param_bound`	a numeric vector of length 2 with the allowed range of the parameter for which the confidence interval is computed. The limits of the confidence interval are searched within this range, see Details. When equal to `NULL` (default) `param_bound` is taken from to component `initial_objects` of the selected component `fit` of `object`, see Details.
`root_tol`	a numeric scalar defining the desired accuracy (convergence tolerance) for root finding by `uniroot` when computing the confidence interval.
`froot_tol`	a numeric scalar defining the desired accuracy (function value tolerance) for deciding whether a root has been found.
`subset`	an integer, character or logical vector to the choose the soil samples for which confidence intervals should be computed. Defaults to `NULL` which computes the intervals for all samples contained in `object`.
`type`	character keyword specifying whether to compute confidence intervals based on the likelihood ratio test (`"logik"`, default) by `confint_prfloglik_sample` or based on the asymptotic normal distribution of maximum likelihood estimates (`"normal"`), see Details.
`...`	additional arguments passed to methods, currently not used.

Details

Computing likelihood profiles

The function prfloglik_sample computes the loglikelihood profile for a subset of the nonlinear (\boldsymbol{\nu}^\mathrm{} ) (and linear \boldsymbol{\mu}^\mathrm{} ) model parameters. We denote the profiling parameters of interest by \boldsymbol{\phi} and the nuisance parameters that are estimated when computing the profile by \boldsymbol{\psi}, so that ( \boldsymbol{\phi}^\mathrm{T}, \boldsymbol{\psi}^\mathrm{T} )^\mathrm{T} is a rearranged version of the parameter vector ( \boldsymbol{\mu}^\mathrm{T}, \boldsymbol{\nu}^\mathrm{T} )^\mathrm{T}, see soilhypfitIntro. prfloglik_sample computes the estimates \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}) of \boldsymbol{\psi} and the profile loglikelihood Q(\boldsymbol{\phi}, \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) as a function of \boldsymbol{\phi}.

To compute the estimates, \boldsymbol{\psi} is partitioned into the nonlinear and conditionally linear parameters, see soilhypfitIntro for details. prfloglik_sample uses the packages parallel and snowfall for parallelized computation of loglikelihood profiles.

Computing likelihood based confidence intervals

The function confint_prfloglik_sample computes the confidence interval of a single model parameter \phi \in ( \boldsymbol{\mu}^\mathrm{T}, \boldsymbol{\nu}^\mathrm{T} )^\mathrm{T} based on the likelihood ratio test. The interval is computed by assuming either

that the test statistic T(\phi) = \Delta Q(\phi) = 2( Q(\widehat{\phi}, \widehat{\boldsymbol{\psi}}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) - Q(\phi, \widehat{\boldsymbol{\psi}}(\phi); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} )) follows a \chi^2_1-distribution with 1 degrees of freedom (possible choice when both water retention and/or hydraulic conductivity data were used to estimate the parameters), or
that the transformed test statistic T(\phi) = \left(\exp(\Delta Q(\phi) / n) - 1 \right) (n - p) follows an F(1,n-p)-distribution where n is the number of water content or hydraulic conductivity measurements, and p is the number of estimated parameters (see Uusipaikka, 2008, p. 115, for details). denominator_df allows to control how p is chosen. Note that this test distribution can only be chosen (and is then the default) when only water retention or only hydraulic conductivty data were used to estimate the parameters.

confint_prfloglik_sample computes profile loglikelihoods Q(\phi, \widehat{\boldsymbol{\psi}}(\phi); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) by
prfloglik_sample and then uses uniroot to search the roots of the equation

f(\phi) = q_T(\gamma)- T(\phi)

in the interval defined by param_bound. q_T(\gamma) is the \gamma-quantile of the chosen test distribution for T.

Computing confidence intervals for several soil samples

The confint method computes 2 types of confidence intervals (in dependence of type) of only the nonlinear parameters \boldsymbol{\nu}^\mathrm{} , possibly for multiple soil samples at the same time:

intervals based on the asymptotic normal distribution of maximum likelihood estimates with standard errors computed by the vcov method for class fit_wrc_hcc (see vcov.fit_wrc_hcc,
intervals based on the likelihood ratio test by confint_prfloglik_sample. The intervals for several soil samples are computed in parallel.

Requirements for computing loglikelihood profiles

The parameters contained in object must be estimated by maximum likelihood (method = "ml", see soilhypfitIntro and control_fit_wrc_hcc. Use of other estimation methods results an error.

Preferably an unconstrained local algorithm (settings = "ulocal", see soilhypfitIntro and control_fit_wrc_hcc)) is used for minimizing the negative loglikelihood when generating object. Use of other algorithms generates a warning message.

Value

The function prfloglik_sample returns a data.frame with the columns of values (parameters of interest \boldsymbol{\phi}), a column loglik with the maximized profile loglikelihood Q(\phi, \widehat{\boldsymbol{\psi}}(\phi); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ), columns with the estimated parameters \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}), columns with the gradient of the loglikelihood with respect to the estimated nonlinear parameters (missing if all nonlinear parameters were fixed) and a column converged, indicating whether convergence has occurred (see convergence_message) when estimating the nonlinear parameters (equal to NA when all nonlinear parameters are fixed).

The function confint_prfloglik_sample returns a numeric vector of length 2 with the lower and upper limits of the confidence interval. If no roots were found then NA is returned. The returned result further contains as attribute list prfloglik the parameter estimate \widehat{\phi} (param_estimate), the maximized loglikelihood Q(\widehat{\phi}, \widehat{\boldsymbol{\psi}}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) (loglik), the quantile of the test distribution q_{\mathrm{test}}(\gamma) (qtest), the type of test distribution used (test), the significance level, the number of water content (nobs_wrc) and conductivity measurements (nobs_hcc) and the function values of f(\phi) evaluated at the roots returned by uniroot.

The method confint returns a dataframe with the lower and upper limits of the confidence intervals for the estimated nonlinear parameters.

Author(s)

Andreas Papritz papritz@retired.ethz.ch.

References

Uusipaikka, E. (2008) Confidence Intervals in Generalized Regression Models. Chapman & Hall/CRC Press, Boca Raton doi:10.1201/9781420060386.

Examples


# use of \donttest{} because execution time exceeds 5 seconds

library(lattice)

data(sim_wrc_hcc)

# define number of cores for parallel computations
if(interactive()) ncpu <- parallel::detectCores() - 1L else ncpu <- 1L

# estimate parameters for 3 samples simultaneously by ...

# ... unconstrained, global optimisation algorithm NLOPT_GN_MLSL (default)
rfit_uglob <- fit_wrc_hcc(
  wrc_formula = wc ~ head | id, hcc_formula = hc ~ head | id,
  data = sim_wrc_hcc,
  control = control_fit_wrc_hcc(param_bound = param_boundf(
      alpha = c(0.00001, 50), n = c(1.0001, 7), tau = c(-1, 5)
    ), pcmp = control_pcmp(ncores = ncpu)))

# ... unconstrained, local optimisation algorithm NLOPT_LN_BOBYQA,
rfit_uloc <- update(
  rfit_uglob,
  param = as.matrix(coef(rfit_uglob, what = "nonlinear")),
  control = control_fit_wrc_hcc(
    settings = "ulocal", param_tf = param_transf(
      alpha = "identity", n = "identity", tau = "identity"
    ), param_bound = param_boundf(
      alpha = c(0.00001, 50), n = c(1.0001, 7), tau = c(-1, 5)
    ), pcmp = control_pcmp(ncores = ncpu)))

# extract estimated parameters for sample id == "1"
coef_id_1 <- unlist(coef(rfit_uloc, gof = TRUE, se = TRUE, subset = "1"))

# compute loglikelihood profile of parameter alpha for sample id == "1"
rprfloglik_alpha <- prfloglik_sample(
  rfit_uloc, values = data.frame(alpha = seq(1.5, 3.0, length = 40L)),
  soil_sample = "1", ncores = ncpu)
# plot loglikelihood profile along with 95% confidence intervals
plot(loglik ~ alpha, rprfloglik_alpha, type = "l")
abline(v = coef_id_1["alpha"])
# 95% confidence intervall based on likelihood ratio test
abline(h = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 1), lty = "dashed")
# 95% confidence intervall based on asymptotic normal distribution
segments(
  x0 = coef_id_1["alpha"] + qnorm(0.025) * coef_id_1["se.alpha"],
  x1 = coef_id_1["alpha"] + qnorm(0.975) * coef_id_1["se.alpha"],
  y0 = min(rprfloglik_alpha$loglik)
)

# compute loglikelihood profile of parameter n for sample id == "1"
rprfloglik_n <- prfloglik_sample(
  rfit_uloc, values = data.frame(n = seq(1.7, 2.25, length = 40L)),
  soil_sample = "1", ncores = ncpu
)
# plot loglikelihood profile along with 95% confidence intervals
plot(loglik ~ n, rprfloglik_n, type = "l")
abline(v = coef_id_1["n"])
# 95% confidence intervall based on likelihood ratio test
abline(h = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 1), lty = "dashed")
# 95% confidence intervall based on asymptotic normal distribution
segments(
  x0 = coef_id_1["n"] + qnorm(0.025) * coef_id_1["se.n"],
  x1 = coef_id_1["n"] + qnorm(0.975) * coef_id_1["se.n"],
  y0 = min(rprfloglik_n$loglik)
)

# compute loglikelihood profile of parameters alpha and n for sample id == "1"
rprfloglik_alpha_n <- prfloglik_sample(
  rfit_uloc, values = expand.grid(
    alpha = seq(1.5, 3.0, length = 40L), n = seq(1.7, 2.25, length = 40L)),
  soil_sample = "1", ncores = ncpu
)

# joint confidence region of alpha and n based on likelihood ratio test
levelplot(loglik ~ alpha + n, rprfloglik_alpha_n,
  panel = function(x, y, z, subscripts, ...){
    panel.levelplot(x, y, z, subscripts, ...)
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 2),
      lty = "solid"
    )
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.9, 2),
      lty = "dashed"
    )
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.8, 2),
      lty = "dotted"
    )
    panel.points(coef_id_1["alpha"], coef_id_1["n"], pch = 16)
    panel.lines(
      x = rprfloglik_alpha[, c("alpha", "n")], col = "blue"
    )
    panel.lines(
      x = rprfloglik_n[, c("alpha", "n")], col = "magenta"
    )
  }, key = list(
    corner = c(1, 0.97),
    lines = list(
      lty = c(rep("solid", 3), "dashed", "dotted"),
      col = c("blue", "magenta", rep("black", 3))
    ),
    text = list(c(
        "estimate of n as a function of fixed alpha",
        "estimate of alpha as a function of fixed n",
        "95% joint confidence region",
        "90% joint confidence region",
        "80% joint confidence region"
      ))
  ))

# compute 95%-confidence interval
(ci.alpha <- confint_prfloglik_sample(
  rfit_uloc, parm = "alpha", soil_sample = "1"
))
# use limits to draw loglikelihood profile for alpha
rprfloglik_alpha <- prfloglik_sample(
  rfit_uloc, values = data.frame(
    alpha = seq(0.9 * ci.alpha[1], 1.1 * ci.alpha[2], length = 40L)),
  soil_sample = "1", ncores = ncpu)
plot(loglik ~ alpha, rprfloglik_alpha, type = "l")
lines(
  ci.alpha,
  rep(diff(unlist(attr(ci.alpha, "prfloglik")[c("qtest", "loglik")])) , 2)
)
abline(v = attr(ci.alpha, "prfloglik")["estimate"], lty = "dashed")

# 95%-confidence intervals of all nonlinear parameters based for all
# soil samples asymptotic normal distribution of maximum likelihood estimates
confint(rfit_uloc, type = "normal")

# 95%-confidence intervals of all nonlinear parameters based for all
# soil samples based on likelihood ratio test
confint(rfit_uloc, ncores = ncpu)

[Package soilhypfit version 0.1-7 Index]