cAIC {cAIC4}  R Documentation 
Estimates the conditional Akaike information for models that were fitted in
'lme4' or with 'lme'. Currently all distributions are supported for 'lme4' models,
based on parametric conditional bootstrap.
For the Gaussian distribution (from an lmer or lme call) and the Poisson distribution, analytical estimators for the degrees of
freedom are available, based on Stein-type formulas. The conditional
Akaike information can also be estimated for generalized additive models based on a fit via
'gamm4' or via gamm calls from the 'mgcv' package.
A hands-on tutorial for the package can be found at https://arxiv.org/abs/1803.05664.
cAIC(object, method = NULL, B = NULL, sigma.penalty = 1, analytic = TRUE)
object 
An object of class merMod either fitted by

method 
Either 
B 
Number of Bootstrap replications. The default is 
sigma.penalty 
An integer value for additional penalization in the analytic
Gaussian calculation to account for estimated variance components in the residual (co)variance.
Per default 
analytic 
FALSE if the numeric Hessian of the (restricted) marginal log-likelihood from the lmer optimization procedure should be used. Otherwise (default) TRUE, i.e. an analytical version of the Hessian is computed and used. Only relevant for the analytical version for Gaussian responses.
For method = "steinian" and an object of class merMod, the function computes
the analytic representation of the corrected conditional AIC in Greven and
Kneib (2010). This is based on the Stein formula and uses implicit
differentiation to calculate the derivative of the random effects covariance
parameters w.r.t. the data. The code is adapted from the one provided in
the supplementary material of the paper by Greven and Kneib (2010). The
supplied merMod model needs to be checked for random
effects covariance parameters that have an optimum on the boundary, i.e. are zero.
If there are any, the model needs to be refitted with the corresponding random effect
terms omitted. This is also done by the function, and the refitted model is
returned as well. Notice that the boundary.tol argument in lmerControl
has an impact on whether a parameter is
estimated to lie on the boundary of the parameter space. For an estimated error
variance the degrees of freedom are increased by one per default.
sigma.penalty can be set manually for merMod models
if no (0) or more than one variance component (>1) has been estimated. For
lme objects this value is automatically defined.
If the object is of class merMod and has family = "poisson",
there is also an analytic representation of the conditional AIC
based on the Chen-Stein formula, see for instance Saefken et al. (2014). For
the calculation the model needs to be refitted once for each observed response
value, except for those that are exactly zero. The
calculation therefore takes longer than for models with Gaussian responses.
Due to the speed and stability of 'lme4' this is still feasible, even for
larger datasets.
If the model has Bernoulli distributed responses and method = "steinian",
cAIC calculates the degrees of freedom based on an
estimator proposed by Efron (2004). This estimator is asymptotically
unbiased if the estimated conditional mean is consistent. The calculation
needs as many model refits as there are data points.
Another, more general method for estimating the degrees of freedom is the conditional bootstrap, proposed in Efron (2004). For the B bootstrap samples the degrees of freedom are estimated by
\frac{1}{B - 1} \sum_{i=1}^n \theta_i(z_i)(z_i - \bar{z}),
where θ_i(z_i) is the i-th element of the estimated natural parameter.
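The bootstrap estimate above can be illustrated with a small base R sketch. It is not the cAIC4 implementation: it uses a plain Gaussian linear model without random effects, treats the error standard deviation as known, and reads the formula as also summing over the B bootstrap replicates (an assumption of this sketch). The names boot_df, fit_theta, Z and Theta are hypothetical.

```r
# Sketch only: covariance-penalty estimate of the degrees of freedom via a
# parametric bootstrap for a Gaussian linear model (no random effects).
# All object names are illustrative and not part of the cAIC4 API.
set.seed(1)
n     <- 40
x     <- rnorm(n)
sigma <- 1                               # error sd, assumed known here
y     <- 1 + 2 * x + rnorm(n, sd = sigma)

fit <- lm(y ~ x)
mu0 <- fitted(fit)                       # plug-in mean generating the bootstrap data

# Estimated natural parameter for a bootstrap response z: mu_hat / sigma^2
fit_theta <- function(z) fitted(lm(z ~ x)) / sigma^2

B     <- 1000
Z     <- replicate(B, rnorm(n, mean = mu0, sd = sigma))  # n x B bootstrap responses
Theta <- apply(Z, 2, fit_theta)                          # n x B natural parameters

# 1 / (B - 1) * sum over i and b of theta_i(z^b) * (z_i^b - zbar_i)
zbar    <- rowMeans(Z)                   # per-observation bootstrap mean
boot_df <- sum(Theta * (Z - zbar)) / (B - 1)
boot_df
```

For this model the estimate should lie close to 2, the number of regression coefficients, since the covariance penalty of a linear smoother equals the trace of its hat matrix.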
For models with no random effects, i.e. (g)lms, the cAIC function returns the AIC of the model with scale parameter estimated by REML.
A cAIC object, which is a list consisting of:
1. the conditional log likelihood, i.e. the log likelihood with the random effects as penalized parameters;
2. the estimated degrees of freedom;
3. a list element that is either NULL if no new model was fitted, or otherwise the new (reduced) model, see details;
4. a boolean variable indicating whether a new model was fitted or not;
5. the estimator of the conditional Akaike information, i.e. minus twice the log likelihood plus twice the degrees of freedom.
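Written as a formula, the last list element combines the first two. The symbols are assumptions introduced here for illustration and do not appear in the package documentation: \ell_c denotes the conditional log likelihood (element 1) and \hat{\rho} the estimated degrees of freedom (element 2).

```latex
\widehat{\mathrm{cAIC}} = -2\,\ell_c + 2\,\hat{\rho}
```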
Currently the cAIC can only be estimated for family equal to "gaussian", "poisson" and "binomial". Neither negative binomial nor gamma distributed responses
are available.
Weighted Gaussian models are not yet implemented.
Benjamin Saefken, David Ruegamer
Saefken, B., Ruegamer, D., Kneib, T. and Greven, S. (2018) Conditional Model Selection in Mixed-Effects Models with cAIC4. https://arxiv.org/abs/1803.05664
Saefken, B., Kneib, T., van Waveren, C.S. and Greven, S. (2014) A unifying approach to the estimation of the conditional Akaike information in generalized linear mixed models. Electronic Journal of Statistics Vol. 8, 201-225.
Greven, S. and Kneib, T. (2010) On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 97(4), 773-789.
Efron, B. (2004) The estimation of prediction error. J. Amer. Statist. Ass. 99(467), 619-632.
### Three application examples
b <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
cAIC(b)

b2 <- lmer(Reaction ~ (1 | Days) + (1 | Subject), sleepstudy)
cAIC(b2)

b2ML <- lmer(Reaction ~ (1 + Days | Subject), sleepstudy, REML = FALSE)
cAIC(b2ML)

### Demonstration of boundary case
## Not run:
set.seed(201711)
n <- 50
beta <- 2
x <- rnorm(n)
eta <- x * beta
id <- gl(5, 10)
epsvar <- 1
data <- data.frame(x = x, id = id)
y_wo_bi <- eta + rnorm(n, 0, sd = epsvar)

# use a very small RE variance
ranvar <- 0.05
nrExperiments <- 100

sim <- sapply(1:nrExperiments, function(j) {
  # centered random intercepts, one per level of id
  b_i <- scale(rnorm(5, 0, ranvar), scale = FALSE)
  # indicator matrix of id (no intercept) maps each b_i to its observations
  y <- y_wo_bi + model.matrix(~ -1 + id) %*% b_i
  data$y <- y

  mixedmod <- lmer(y ~ x + (1 | id), data = data)
  linmod <- lm(y ~ x, data = data)

  c(cAIC(mixedmod)$caic, cAIC(linmod)$caic)
})

rownames(sim) <- c("mixed model", "linear model")

boxplot(t(sim))
## End(Not run)