cAIC {cAIC4}    R Documentation

Conditional Akaike Information for 'lme4' and 'lme'


Description

Estimates the conditional Akaike information for models fitted with 'lme4' or with 'lme'. Currently all distributions supported by 'lme4' are covered, based on a parametric conditional bootstrap. For the Gaussian distribution (from an lmer or lme call) and the Poisson distribution, analytic estimators of the degrees of freedom based on Stein-type formulas are available. The conditional Akaike information can also be estimated for generalized additive models fitted via gamm4 or the gamm call from the 'mgcv' package. A hands-on tutorial for the package can be found in Saefken et al. (2021); see References.


Usage

cAIC(object, method = NULL, B = NULL, sigma.penalty = 1, analytic = TRUE)



Arguments

object

An object of class merMod, either fitted by lmer or glmer from the lme4 package, or an lme object from the nlme package. Objects returned by a gamm4 call are also supported.


method

Either "conditionalBootstrap", which estimates the degrees of freedom by a conditional bootstrap, or "steinian", which uses analytic representations based on Stein-type formulas. The default is NULL, in which case the method is chosen automatically based on the family argument of the (g)lmer object: for "gaussian" and "poisson" this is the Steinian-type estimator, for all other families the conditional bootstrap. For models from the nlme package, only lme objects, i.e. with Gaussian response, are supported.


B

Number of bootstrap replications. The default is NULL, in which case B is set to the minimum of 100 and the length of the response vector.


sigma.penalty

An integer giving an additional penalty in the analytic Gaussian calculation to account for estimated variance components in the residual (co-)variance. By default sigma.penalty equals 1, corresponding to a diagonal error covariance matrix with a single estimated parameter (sigma). If all variance components are known, the value should be set to 0. For individual weights (individual variances), it should be set to the number of estimated weights. For lme objects the penalty term is set automatically by extracting the number of estimated variance components.


analytic

TRUE (the default) to use an analytic version of the Hessian that is computed by the function; FALSE to use the numeric Hessian of the (restricted) marginal log-likelihood from the lmer optimization procedure. Only used in the analytic calculation for Gaussian responses.


Details

For method = "steinian" and an object of class merMod, the function computes the analytic representation of the corrected conditional AIC of Greven and Kneib (2010). This is based on the Stein formula and uses implicit differentiation to calculate the derivative of the random effects covariance parameters with respect to the data. The code is adapted from the one provided in the supplementary material of Greven and Kneib (2010). The supplied merMod model is checked for random effects covariance parameters whose optimum lies on the boundary of the parameter space, i.e. is zero; if any are found, the model is refitted with the corresponding random effect terms omitted, and the refitted model is returned as well. Note that the boundary.tol argument in lmerControl has an impact on whether a parameter is estimated to lie on the boundary. For an estimated error variance the degrees of freedom are increased by one by default; sigma.penalty can be set manually for merMod models if no variance component (0) or more than one (> 1) has been estimated. For lme objects this value is determined automatically.
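As a sketch of the boundary handling described above (the data are simulated, and the variable names are illustrative; the returned element names follow the Value section below):

```r
library(lme4)
library(cAIC4)

set.seed(1)
d <- data.frame(x = rnorm(60), id = gl(6, 10))
d$y <- 2 * d$x + rnorm(60)  # data generated without a true random effect

# The random-intercept variance will likely be estimated at the boundary (zero)
m <- lmer(y ~ x + (1 | id), data = d)

res <- cAIC(m)
res$new           # TRUE if a reduced model had to be refitted
res$reducedModel  # the refitted model, or NULL if none was needed
```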

If the object is of class merMod and has family = "poisson", an analytic representation of the conditional AIC based on the Chen-Stein formula is also available; see, for instance, Saefken et al. (2014). The calculation requires one refit of the model per observed response, minus the number of responses that are exactly zero, and therefore takes longer than for models with Gaussian responses. Due to the speed and stability of 'lme4' this remains feasible, also for larger datasets.
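A minimal sketch of the analytic Poisson estimator on simulated data (illustrative names; expect this to run noticeably longer than a Gaussian fit because of the refits):

```r
library(lme4)
library(cAIC4)

set.seed(2)
d <- data.frame(x = rnorm(50), id = gl(5, 10))
d$y <- rpois(50, exp(0.2 + 0.5 * d$x))

m <- glmer(y ~ x + (1 | id), data = d, family = poisson)

# Default method for poisson is the Chen-Stein based analytic estimator;
# one refit per non-zero observed response
res <- cAIC(m)
res$caic
```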

If the model has Bernoulli distributed responses and method = "steinian", cAIC calculates the degrees of freedom based on an estimator proposed by Efron (2004). This estimator is asymptotically unbiased if the estimated conditional mean is consistent. The calculation requires as many model refits as there are data points.
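Sketch of the Efron-based estimator for Bernoulli responses (kept deliberately small, since one refit per data point is needed; data and names are illustrative):

```r
library(lme4)
library(cAIC4)

set.seed(3)
d <- data.frame(x = rnorm(40), id = gl(8, 5))
d$y <- rbinom(40, 1, plogis(d$x))

m <- glmer(y ~ x + (1 | id), data = d, family = binomial)

# method = "steinian" selects the Efron (2004) estimator for Bernoulli
# responses: 40 refits here, one per observation
res <- cAIC(m, method = "steinian")
res$caic
```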

Another, more general method for estimating the degrees of freedom is the conditional bootstrap, also proposed in Efron (2004). For the B bootstrap samples the degrees of freedom are estimated by

\frac{1}{B - 1}\sum_{i=1}^n\theta_i(z_i)(z_i-\bar{z}),

where \theta_i(z_i) is the i-th element of the estimated natural parameter.
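The bootstrap-based degrees of freedom can also be requested explicitly via the method argument; a sketch, with B kept small for speed:

```r
library(lme4)
library(cAIC4)

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

# Conditional bootstrap with B = 20 replications instead of the
# analytic Steinian formula that would be chosen by default
res <- cAIC(m, method = "conditionalBootstrap", B = 20)
res$df    # bootstrap estimate of the degrees of freedom
res$caic
```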

For models with no random effects, i.e. (g)lms, the cAIC function returns the AIC of the model, with the scale parameter estimated by REML.
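A quick sketch of this fallback for a model without random effects:

```r
library(lme4)   # only for the sleepstudy data
library(cAIC4)

m0 <- lm(Reaction ~ Days, sleepstudy)

# For an lm, cAIC returns the AIC with REML-estimated scale parameter
cAIC(m0)$caic
```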


Value

A cAIC object, which is a list consisting of:

1. the conditional log-likelihood, i.e. the log-likelihood with the random effects as penalized parameters;

2. the estimated degrees of freedom;

3. a list element that is NULL if no new model was fitted, and otherwise the new (reduced) model, see Details;

4. a boolean indicating whether a new model was fitted;

5. the estimator of the conditional Akaike information, i.e. minus twice the log-likelihood plus twice the degrees of freedom.


Warnings

Currently the cAIC can only be estimated for the families "gaussian", "poisson" and "binomial". Neither negative binomial nor gamma distributed responses are available. Weighted Gaussian models are not yet implemented.


Author(s)

Benjamin Saefken, David Ruegamer


References

Saefken, B., Ruegamer, D., Kneib, T. and Greven, S. (2021) Conditional Model Selection in Mixed-Effects Models with cAIC4. Journal of Statistical Software, <doi:10.18637/jss.v099.i08>.

Saefken, B., Kneib, T., van Waveren, C.-S. and Greven, S. (2014) A unifying approach to the estimation of the conditional Akaike information in generalized linear mixed models. Electronic Journal of Statistics 8, 201-225.

Greven, S. and Kneib T. (2010) On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 97(4), 773-789.

Efron, B. (2004) The estimation of prediction error. Journal of the American Statistical Association 99(467), 619-632.

See Also

lme4-package, lmer, glmer


Examples

### Three application examples
b <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
cAIC(b)

b2 <- lmer(Reaction ~ (1 | Days) + (1 | Subject), sleepstudy)
cAIC(b2)

b2ML <- lmer(Reaction ~ (1 + Days | Subject), sleepstudy, REML = FALSE)
cAIC(b2ML)

### Demonstration of boundary case
## Not run: 
n <- 50
beta <- 2
x <- rnorm(n)
eta <- x*beta
id <- gl(5,10)
epsvar <- 1
data <- data.frame(x = x, id = id)
y_wo_bi <- eta + rnorm(n, 0, sd = epsvar) 

# use a very small RE variance
ranvar <- 0.05
nrExperiments <- 100

sim <- sapply(1:nrExperiments, function(j){

  b_i <- scale(rnorm(5, 0, ranvar), scale = FALSE)
  y <- y_wo_bi + model.matrix(~ -1 + id) %*% b_i
  data$y <- y

  mixedmod <- lmer(y ~ x + (1 | id), data = data)
  linmod <- lm(y ~ x, data = data)

  c(cAIC(mixedmod)$caic, cAIC(linmod)$caic)

})

rownames(sim) <- c("mixed model", "linear model")


## End(Not run)

[Package cAIC4 version 1.0 Index]