R: Calibration for CQO and CAO models

calibrate.qrrvglm {VGAM}

R Documentation

Calibration for CQO and CAO models

Description

Performs maximum likelihood calibration for constrained quadratic and additive ordination models (CQO and CAO models are better known as QRR-VGLMs and RR-VGAMs respectively).

Usage

calibrate.qrrvglm(object, newdata = NULL,
    type = c("latvar", "predictors", "response", "vcov", "everything"),
    lr.confint = FALSE, cf.confint = FALSE,
    level = 0.95, initial.vals = NULL, ...)

Arguments

`object`	The fitted CQO/CAO model.
`newdata`	A data frame with new response data, such as new species data. The default is to use the original data used to fit the model; however, the calibration may take a long time to compute because the computations are expensive. Note that the creation of the model frame associated with `newdata` is fragile. Factors may not be created properly. If a variable is binary then its best for it to be straightforward and have only 0 and 1 as values.
`type`	What type of result to be returned. The first are the calibrated latent variables or site scores. This is always computed. The `"predictors"` are the linear/quadratic or additive predictors evaluated at the calibrated latent variables or site scores. The `"response"` are the fitted values (usually means) evaluated at the calibrated latent variables or site scores. The `"vcov"` are the Wald-type estimated variance-covariance matrices of the calibrated latent variables or site scores. The `"everything"` is for all of them, i.e., all `type`s. Note that for CAO models, the `"vcov"` type is unavailable.
`lr.confint`, `level`	Compute approximate likelihood ratio based confidence intervals? If `TRUE` then `level` is the confidence level required and one should have `type = "latvar"` or `type = "everything"`; and currently only rank-1 models are supported. This option works for CLO and CQO models and not for CAO models. The function `uniroot` is called to solve for the root of a nonlinear equation to obtain each confidence limit, and this is not entirely reliable. It is assumed that the likelihood function is unimodal about its MLE because only one root is returned if there is more than one. One root is found on each side of the MLE. Technically, the default is to find the value of the latent variable whose difference in deviance (or twice the difference in log-likelihoods) from the optimal model is equal to `qchisq(level, df = 1)`. The intervals are not true profile likelihood intervals because it is not possible to estimate the regression coefficients of the QRR-VGLM/RR-VGLM based on one response vector. See `confint` to get the flavour of these two arguments in general.
`cf.confint`	Compute approximate characteristic function based confidence intervals? If `TRUE` then `level` is the confidence level required and one should have `type = "latvar"` or `type = "everything"`; and currently only rank-1 models are supported. This option works for `binomialff` and `poissonff` CLO and CQO models and not for CAO models. The function `uniroot` is called to solve for the root of a nonlinear equation to obtain each confidence limit, and this is not entirely reliable. It is assumed that the likelihood function is unimodal because only one root is returned if there is more than one. Technically, the CDF of a normalized score statistic is obtained by Gauss–Hermite numerical integration of a complex-valued integrand, and this is based on the inversion formula described in Abate and Witt (1992).
`initial.vals`	Initial values for the search. For rank-1 models, this should be a vector having length equal to `nrow(newdata)`, and for rank-2 models this should be a two-column matrix with the number of rows equalling the number of rows in `newdata`. The default is a grid defined by arguments in `calibrate.qrrvglm.control`.
`...`	Arguments that are fed into `calibrate.qrrvglm.control`.

Details

Given a fitted regression CQO/CAO model, maximum likelihood calibration is theoretically easy and elegant. However, the method assumes that all the responses are independent, which is often not true in practice. More details and references are given in Yee (2018) and ch.6 of Yee (2015).

The function optim is used to search for the maximum likelihood solution. Good initial values are needed, and arguments in calibrate.qrrvglm.control allows the user some control over the choice of these.

Value

Several methods are implemented to obtain confidence intervals/regions for the calibration estimates. One method is when lr.confint = TRUE, then a 4-column matrix is returned with the confidence limits being the final 2 columns (if type = "everything" then the matrix is returned in the lr.confint list component). Another similar method is when cf.confint = TRUE. There may be some redundancy in whatever is returned. Other methods are returned by using type and they are described as follows.

The argument type determines what is returned. If type = "everything" then all the type values are returned in a list, with the following components. Each component has length nrow(newdata).

`latvar`	Calibrated latent variables or site scores (the default). This may have the attribute `"objectiveFunction"` which is usually the log-likelihood or the deviance.
`predictors`	linear/quadratic or additive predictors. For example, for Poisson families, this will be on a log scale, and for binomial families, this will be on a logit scale.
`response`	Fitted values of the response, evaluated at the calibrated latent variables.
`vcov`	Wald-type estimated variance-covariance matrices of the calibrated latent variables or site scores. Actually, these are stored in a 3-D array whose dimension is `c(Rank(object), Rank(object), nrow(newdata))`. This type has only been implemented for `binomialff` and `poissonff` models with canonical links and `noRRR = ~ 1` and, for CQOs, `I.tolerances = TRUE` or `eq.tolerances = TRUE`.

Warning

This function is computationally expensive. Setting trace = TRUE to get a running log can be a good idea. This function has been tested but not extensively.

Note

Despite the name of this function, CAO models are handled as well to a certain extent. Some combinations of parameters are not handled, e.g., lr.confint = TRUE only works for rank-1, type = "vcov" only works for binomialff and poissonff models with canonical links and noRRR = ~ 1, and higher-order rank models need eq.tolerances = TRUE or I.tolerances = TRUE as well. For rank-1 objects, lr.confint = TRUE is recommended above type = "vcov" in terms of accuracy and overall generality. For class "qrrvglm" objects it is necessary that all response' tolerance matrices are positive-definite which correspond to bell-shaped response curves/surfaces.

For binomialff and poissonff models the deviance slot is used for the optimization rather than the loglikelihood slot, therefore one can calibrate using real-valued responses. (If the loglikelihood slot were used then functions such as dpois would be used with log = TRUE and then would be restricted to feed in integer-valued response values.)

Maximum likelihood calibration for Gaussian logit regression models may be performed by rioja but this applies to a single environmental variable such as pH in data("SWAP", package = "rioja"). In VGAM calibrate() estimates values of the latent variable rather than individual explanatory variables, hence the setting is more on ordination.

Author(s)

T. W. Yee. Recent work on the standard errors by David Zucker and Sam Oman at HUJI is gratefully acknowledged—these are returned in the vcov component and provided inspiration for lr.confint and cf.confint. A joint publication is being prepared on this subject.

References

Abate, J. and Whitt, W. (1992). The Fourier-series method for inverting transforms of probability distributions. Queueing Systems, 10, 5–88.

ter Braak, C. J. F. (1995). Calibration. In: Data Analysis in Community and Landscape Ecology by Jongman, R. H. G., ter Braak, C. J. F. and van Tongeren, O. F. R. (Eds.) Cambridge University Press, Cambridge.

Examples

## Not run: 
hspider[, 1:6] <- scale(hspider[, 1:6])  # Stdze environmental variables
set.seed(123)
siteNos <- c(1, 5)  # Calibrate these sites
pet1 <- cqo(cbind(Pardlugu, Pardmont, Pardnigr, Pardpull, Zoraspin) ~
        WaterCon + BareSand + FallTwig + CoveMoss + CoveHerb + ReflLux,
        trace = FALSE,
        data = hspider[-siteNos, ],  # Sites not in fitted model
        family = poissonff, I.toler = TRUE, Crow1positive = TRUE)
y0 <- hspider[siteNos, colnames(depvar(pet1))]  # Species counts
(cpet1 <- calibrate(pet1, trace = TRUE, newdata = data.frame(y0)))
(clrpet1 <- calibrate(pet1, lr.confint = TRUE, newdata = data.frame(y0)))
(ccfpet1 <- calibrate(pet1, cf.confint = TRUE, newdata = data.frame(y0)))
(cp1wald <- calibrate(pet1, newdata = y0, type = "everything"))

## End(Not run)

## Not run: 
# Graphically compare the actual site scores with their calibrated
# values. 95 percent likelihood-based confidence intervals in green.
persp(pet1, main = "Site scores: solid=actual, dashed=calibrated",
      label = TRUE, col = "gray50", las = 1)
# Actual site scores:
xvars <- rownames(concoef(pet1))  # Variables comprising the latvar
est.latvar <- as.matrix(hspider[siteNos, xvars]) %*% concoef(pet1)
abline(v = est.latvar, col = seq(siteNos))
abline(v = cpet1, lty = 2, col = seq(siteNos))  # Calibrated values
arrows(clrpet1[,  3], c(60, 60), clrpet1[,  4], c(60, 60),  # Add CIs
       length = 0.08, col = "orange", angle = 90, code = 3, lwd = 2)
arrows(ccfpet1[,  3], c(70, 70), ccfpet1[,  4], c(70, 70),  # Add CIs
       length = 0.08, col = "limegreen", angle = 90, code = 3, lwd = 2)
arrows(cp1wald$latvar - 1.96 * sqrt(cp1wald$vcov), c(65, 65),
       cp1wald$latvar + 1.96 * sqrt(cp1wald$vcov), c(65, 65),  # Wald CIs
       length = 0.08, col = "blue", angle = 90, code = 3, lwd = 2)
legend("topright", lwd = 2,
       leg = c("CF interval", "Wald  interval", "LR interval"),
       col = c("limegreen", "blue", "orange"), lty = 1)

## End(Not run)

[Package VGAM version 1.1-11 Index]