profile_loglikelihood {soilhypfit}R Documentation

Profile Loglikelihoods and Confidence Intervals for Parametric Modelling of Soil Hydraulic Properties

Description

The function prfloglik_sample computes for a single soil sample a loglikelihood profile as a function of the specified values for subsets of model parameters of soil water retention and/or soil hydraulic conductivity functions. The function confint_prfloglik_sample computes a confidence interval of one model parameter based on the likelihood ratio test for a single soil sample, and the S3 method confint computes confidence intervals of nonlinear model parameters for multiple soil samples.

Usage

prfloglik_sample(object, values, soil_sample,
    ncores = min(detectCores() - 1L, NROW(values)), verbose = 0)

confint_prfloglik_sample(object, parm = names(default_fit_param()),
    soil_sample, level = 0.95, test = c("F", "Chisq"),
    denominator_df = c("nonlinear", "all"), param_bound = NULL,
    root_tol = .Machine$double.eps^0.25, froot_tol = sqrt(root_tol),
    verbose = 0)

## S3 method for class 'fit_wrc_hcc'
confint(object,
    parm = names(object[["control"]][["initial_param"]]), level = 0.95,
    subset = NULL, type = c("loglik", "normal"), test = c("F", "Chisq"),
    denominator_df = c("nonlinear", "all"),
    root_tol = .Machine$double.eps^0.25, froot_tol = sqrt(root_tol),
    ncores = detectCores() - 1L, verbose = 0, ...)

Arguments

object

an object of class fit_wrc_hcc, see fit_wrc_hcc.

values

a data.frame or a matrix with the values of the conditionally linear (\boldsymbol{\mu}^\mathrm{} ) and nonlinear parameters (\boldsymbol{\nu}^\mathrm{} ) that should be kept fixed to compute the likelihood profile (mandatory argument, see soilhypfitIntro, wc_model and hc_model for information about the parametrization of models for soil water retention curves and/or soil hydraulic conductivity functions.). The names of the columns of values must match the names of model parameters.

soil_sample

a character scalar with the label of the soil sample for which the loglikelihood profile or the confidence interval should be computed. If object contains parameter estimates for a single soil sample then soil_sample is ignored, otherwise soil_sample is a mandatory argument.

ncores

an integer defining the number of cores for parallel computations. ncores = 1 suppresses parallel computations.

verbose

positive integer controlling logging of diagnostic messages to the console and plotting of data and model curves during fitting, see fit_wrc_hcc.

parm

character scalar (confint_prfloglik_sample) or vector (confint) with name(s) of parameter(s) for which to compute the confidence interval(s). Note that confint_prfloglik_sample allows to compute a confidence interval for all parameters (including the linear ones), whereas the confint method computes confidence intervals for the nonlinear parameters (\boldsymbol{\nu}^\mathrm{} ) only, see soilhypfitIntro for information about the parametrization of models for soil water retention curves and/or soil hydraulic conductivity functions.

level

numeric scalar with the confidence level required to compute the confidence interval.

test

character keyword specifying whether to use the asymptotic \chi^2-distribution or the finite sample approximate F-distribution for the likelihood ratio test statistic when computing the confidence interval, see Details.

denominator_df

character keyword specifying whether the denominator degrees of freedom for the F-distribution of the test statistic is equal to the number of estimated nonlinear parameters ("nonlinear", default) or equal to the total number of estimated parameters ("all").

param_bound

a numeric vector of length 2 with the allowed range of the parameter for which the confidence interval is computed. The limits of the confidence interval are searched within this range, see Details. When equal to NULL (default) param_bound is taken from to component initial_objects of the selected component fit of object, see Details.

root_tol

a numeric scalar defining the desired accuracy (convergence tolerance) for root finding by uniroot when computing the confidence interval.

froot_tol

a numeric scalar defining the desired accuracy (function value tolerance) for deciding whether a root has been found.

subset

an integer, character or logical vector to the choose the soil samples for which confidence intervals should be computed. Defaults to NULL which computes the intervals for all samples contained in object.

type

character keyword specifying whether to compute confidence intervals based on the likelihood ratio test ("logik", default) by confint_prfloglik_sample or based on the asymptotic normal distribution of maximum likelihood estimates ("normal"), see Details.

...

additional arguments passed to methods, currently not used.

Details

Computing likelihood profiles

The function prfloglik_sample computes the loglikelihood profile for a subset of the nonlinear (\boldsymbol{\nu}^\mathrm{} ) (and linear \boldsymbol{\mu}^\mathrm{} ) model parameters. We denote the profiling parameters of interest by \boldsymbol{\phi} and the nuisance parameters that are estimated when computing the profile by \boldsymbol{\psi}, so that ( \boldsymbol{\phi}^\mathrm{T}, \boldsymbol{\psi}^\mathrm{T} )^\mathrm{T} is a rearranged version of the parameter vector ( \boldsymbol{\mu}^\mathrm{T}, \boldsymbol{\nu}^\mathrm{T} )^\mathrm{T} , see soilhypfitIntro. prfloglik_sample computes the estimates \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}) of \boldsymbol{\psi} and the profile loglikelihood Q(\boldsymbol{\phi}, \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) as a function of \boldsymbol{\phi}.

To compute the estimates, \boldsymbol{\psi} is partitioned into the nonlinear and conditionally linear parameters, see soilhypfitIntro for details. prfloglik_sample uses the packages parallel and snowfall for parallelized computation of loglikelihood profiles.

Computing likelihood based confidence intervals

The function confint_prfloglik_sample computes the confidence interval of a single model parameter \phi \in ( \boldsymbol{\mu}^\mathrm{T}, \boldsymbol{\nu}^\mathrm{T} )^\mathrm{T} based on the likelihood ratio test. The interval is computed by assuming either

confint_prfloglik_sample computes profile loglikelihoods Q(\phi, \widehat{\boldsymbol{\psi}}(\phi); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) by
prfloglik_sample and then uses uniroot to search the roots of the equation

f(\phi) = q_T(\gamma)- T(\phi)

in the interval defined by param_bound. q_T(\gamma) is the \gamma-quantile of the chosen test distribution for T.

Computing confidence intervals for several soil samples

The confint method computes 2 types of confidence intervals (in dependence of type) of only the nonlinear parameters \boldsymbol{\nu}^\mathrm{} , possibly for multiple soil samples at the same time:

Requirements for computing loglikelihood profiles

The parameters contained in object must be estimated by maximum likelihood (method = "ml", see soilhypfitIntro and control_fit_wrc_hcc. Use of other estimation methods results an error.

Preferably an unconstrained local algorithm (settings = "ulocal", see soilhypfitIntro and control_fit_wrc_hcc)) is used for minimizing the negative loglikelihood when generating object. Use of other algorithms generates a warning message.

Value

The function prfloglik_sample returns a data.frame with the columns of values (parameters of interest \boldsymbol{\phi}), a column loglik with the maximized profile loglikelihood Q(\phi, \widehat{\boldsymbol{\psi}}(\phi); \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) , columns with the estimated parameters \widehat{\boldsymbol{\psi}}(\boldsymbol{\phi}) , columns with the gradient of the loglikelihood with respect to the estimated nonlinear parameters (missing if all nonlinear parameters were fixed) and a column converged, indicating whether convergence has occurred (see convergence_message) when estimating the nonlinear parameters (equal to NA when all nonlinear parameters are fixed).

The function confint_prfloglik_sample returns a numeric vector of length 2 with the lower and upper limits of the confidence interval. If no roots were found then NA is returned. The returned result further contains as attribute list prfloglik the parameter estimate \widehat{\phi} (param_estimate), the maximized loglikelihood Q(\widehat{\phi}, \widehat{\boldsymbol{\psi}}; \boldsymbol{\theta}, \boldsymbol{K}, \boldsymbol{h} ) (loglik), the quantile of the test distribution q_{\mathrm{test}}(\gamma) (qtest), the type of test distribution used (test), the significance level, the number of water content (nobs_wrc) and conductivity measurements (nobs_hcc) and the function values of f(\phi) evaluated at the roots returned by uniroot.

The method confint returns a dataframe with the lower and upper limits of the confidence intervals for the estimated nonlinear parameters.

Author(s)

Andreas Papritz papritz@retired.ethz.ch.

References

Uusipaikka, E. (2008) Confidence Intervals in Generalized Regression Models. Chapman & Hall/CRC Press, Boca Raton doi:10.1201/9781420060386.

See Also

soilhypfitIntro for a description of the models and a brief summary of the parameter estimation approach;

fit_wrc_hcc for (constrained) estimation of parameters of models for soil water retention and hydraulic conductivity data;

control_fit_wrc_hcc for options to control fit_wrc_hcc;

soilhypfitmethods for common S3 methods for class fit_wrc_hcc;

vcov for computing (co-)variances of the estimated nonlinear parameters;

wc_model and hc_model for currently implemented models for soil water retention curves and hydraulic conductivity functions;

evaporative-length for physically constraining parameter estimates of soil hydraulic material functions.

Examples


# use of \donttest{} because execution time exceeds 5 seconds

library(lattice)

data(sim_wrc_hcc)

# define number of cores for parallel computations
if(interactive()) ncpu <- parallel::detectCores() - 1L else ncpu <- 1L

# estimate parameters for 3 samples simultaneously by ...

# ... unconstrained, global optimisation algorithm NLOPT_GN_MLSL (default)
rfit_uglob <- fit_wrc_hcc(
  wrc_formula = wc ~ head | id, hcc_formula = hc ~ head | id,
  data = sim_wrc_hcc,
  control = control_fit_wrc_hcc(param_bound = param_boundf(
      alpha = c(0.00001, 50), n = c(1.0001, 7), tau = c(-1, 5)
    ), pcmp = control_pcmp(ncores = ncpu)))

# ... unconstrained, local optimisation algorithm NLOPT_LN_BOBYQA,
rfit_uloc <- update(
  rfit_uglob,
  param = as.matrix(coef(rfit_uglob, what = "nonlinear")),
  control = control_fit_wrc_hcc(
    settings = "ulocal", param_tf = param_transf(
      alpha = "identity", n = "identity", tau = "identity"
    ), param_bound = param_boundf(
      alpha = c(0.00001, 50), n = c(1.0001, 7), tau = c(-1, 5)
    ), pcmp = control_pcmp(ncores = ncpu)))

# extract estimated parameters for sample id == "1"
coef_id_1 <- unlist(coef(rfit_uloc, gof = TRUE, se = TRUE, subset = "1"))

# compute loglikelihood profile of parameter alpha for sample id == "1"
rprfloglik_alpha <- prfloglik_sample(
  rfit_uloc, values = data.frame(alpha = seq(1.5, 3.0, length = 40L)),
  soil_sample = "1", ncores = ncpu)
# plot loglikelihood profile along with 95% confidence intervals
plot(loglik ~ alpha, rprfloglik_alpha, type = "l")
abline(v = coef_id_1["alpha"])
# 95% confidence intervall based on likelihood ratio test
abline(h = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 1), lty = "dashed")
# 95% confidence intervall based on asymptotic normal distribution
segments(
  x0 = coef_id_1["alpha"] + qnorm(0.025) * coef_id_1["se.alpha"],
  x1 = coef_id_1["alpha"] + qnorm(0.975) * coef_id_1["se.alpha"],
  y0 = min(rprfloglik_alpha$loglik)
)

# compute loglikelihood profile of parameter n for sample id == "1"
rprfloglik_n <- prfloglik_sample(
  rfit_uloc, values = data.frame(n = seq(1.7, 2.25, length = 40L)),
  soil_sample = "1", ncores = ncpu
)
# plot loglikelihood profile along with 95% confidence intervals
plot(loglik ~ n, rprfloglik_n, type = "l")
abline(v = coef_id_1["n"])
# 95% confidence intervall based on likelihood ratio test
abline(h = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 1), lty = "dashed")
# 95% confidence intervall based on asymptotic normal distribution
segments(
  x0 = coef_id_1["n"] + qnorm(0.025) * coef_id_1["se.n"],
  x1 = coef_id_1["n"] + qnorm(0.975) * coef_id_1["se.n"],
  y0 = min(rprfloglik_n$loglik)
)

# compute loglikelihood profile of parameters alpha and n for sample id == "1"
rprfloglik_alpha_n <- prfloglik_sample(
  rfit_uloc, values = expand.grid(
    alpha = seq(1.5, 3.0, length = 40L), n = seq(1.7, 2.25, length = 40L)),
  soil_sample = "1", ncores = ncpu
)

# joint confidence region of alpha and n based on likelihood ratio test
levelplot(loglik ~ alpha + n, rprfloglik_alpha_n,
  panel = function(x, y, z, subscripts, ...){
    panel.levelplot(x, y, z, subscripts, ...)
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.95, 2),
      lty = "solid"
    )
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.9, 2),
      lty = "dashed"
    )
    panel.levelplot(x, y, z, subscripts, region = FALSE, contour = TRUE,
      at = -coef_id_1["obj"] - 0.5 * qchisq(0.8, 2),
      lty = "dotted"
    )
    panel.points(coef_id_1["alpha"], coef_id_1["n"], pch = 16)
    panel.lines(
      x = rprfloglik_alpha[, c("alpha", "n")], col = "blue"
    )
    panel.lines(
      x = rprfloglik_n[, c("alpha", "n")], col = "magenta"
    )
  }, key = list(
    corner = c(1, 0.97),
    lines = list(
      lty = c(rep("solid", 3), "dashed", "dotted"),
      col = c("blue", "magenta", rep("black", 3))
    ),
    text = list(c(
        "estimate of n as a function of fixed alpha",
        "estimate of alpha as a function of fixed n",
        "95% joint confidence region",
        "90% joint confidence region",
        "80% joint confidence region"
      ))
  ))

# compute 95%-confidence interval
(ci.alpha <- confint_prfloglik_sample(
  rfit_uloc, parm = "alpha", soil_sample = "1"
))
# use limits to draw loglikelihood profile for alpha
rprfloglik_alpha <- prfloglik_sample(
  rfit_uloc, values = data.frame(
    alpha = seq(0.9 * ci.alpha[1], 1.1 * ci.alpha[2], length = 40L)),
  soil_sample = "1", ncores = ncpu)
plot(loglik ~ alpha, rprfloglik_alpha, type = "l")
lines(
  ci.alpha,
  rep(diff(unlist(attr(ci.alpha, "prfloglik")[c("qtest", "loglik")])) , 2)
)
abline(v = attr(ci.alpha, "prfloglik")["estimate"], lty = "dashed")

# 95%-confidence intervals of all nonlinear parameters based for all
# soil samples asymptotic normal distribution of maximum likelihood estimates
confint(rfit_uloc, type = "normal")

# 95%-confidence intervals of all nonlinear parameters based for all
# soil samples based on likelihood ratio test
confint(rfit_uloc, ncores = ncpu)


[Package soilhypfit version 0.1-7 Index]