testDispersion {DHARMa} | R Documentation

Description

This function performs simulation-based tests for over/underdispersion. If type = "DHARMa" (default and recommended), simulation-based dispersion tests are performed. Their behavior differs depending on whether simulations are done with refit = F or refit = T, and on whether data is simulated conditionally or unconditionally on the fitted random effects (controlled e.g. via re.form in lme4; see below). If type = "PearsonChisq", a chi2 test on Pearson residuals is performed.

Usage

testDispersion(simulationOutput, alternative = c("two.sided", "greater", "less"), plot = T, type = c("DHARMa", "PearsonChisq"), ...)

Arguments

`simulationOutput`: an object of class DHARMa, created via `simulateResiduals`

`alternative`: a character string specifying whether the test should test if observations are "greater", "less" or "two.sided" compared to the simulated null hypothesis. "greater" corresponds to testing only for overdispersion. It is recommended to keep the default setting (testing for both over- and underdispersion).

`plot`: whether to provide a plot of the results

`type`: which test to run. Default is "DHARMa"; the other option is "PearsonChisq" (see Details)

`...`: arguments to pass on to `testGeneric`

Details

Over/underdispersion means that the observed data is more/less dispersed than expected under the fitted model. There are a number of common ways to test for dispersion problems, from the classical dispersion/df idea used for GLMs to other tests implemented in various R packages. This function implements several dispersion tests.

type == "DHARMa"

If type = "DHARMa" (default and recommended), simulation-based dispersion tests are performed. Their behavior differs depending on whether simulations are done with refit = F or refit = T, and on whether data is simulated conditionally or unconditionally on the fitted random effects (controlled e.g. via re.form in lme4).

If refit = F, the function uses `testGeneric` to compare the variance of the observed residuals against the variance of the simulated residuals via their ratio. The test returns the ratio of the observed to the mean simulated variance, together with a p-value based on the distribution of the simulated variances. A significant ratio > 1 indicates overdispersion, a significant ratio < 1 underdispersion.
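As a hedged illustration of this ratio idea (a minimal sketch under assumed toy data, not DHARMa's internal code; the objects `observed`, `simulated` and `fitted` are stand-ins for what a DHARMa object stores):

```r
# Sketch of the refit = F dispersion test: compare the variance of the
# observed residuals with the variances obtained from simulated data.
# All object names and the toy data are illustrative assumptions.
set.seed(1)
n <- 100; nSim <- 250
fitted <- rep(5, n)                                       # fitted means
observed <- rpois(n, 5 * rgamma(n, shape = 2, rate = 2))  # overdispersed data
simulated <- matrix(rpois(n * nSim, 5), nrow = n)         # simulations under the model

spread <- function(x) var(x - fitted)                 # variance of residuals
obsSpread <- spread(observed)
simSpread <- apply(simulated, 2, spread)

dispersionRatio <- obsSpread / mean(simSpread)        # > 1 suggests overdispersion
# two-sided simulation-based p-value
pValue <- min(mean(simSpread >= obsSpread), mean(simSpread <= obsSpread)) * 2
```

With the overdispersed toy data above, the ratio comes out well above 1.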

If refit = T, the function compares the approximate deviance (via squared Pearson residuals) with the same quantity from the models refitted with simulated data. Applying this test is much slower than the previous alternative. Given the computational cost, I would suggest that most users will be satisfied with the standard dispersion test.
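The refit = T idea can be sketched as follows (illustrative only, using a plain glm and made-up overdispersed data rather than DHARMa internals):

```r
# Sketch of the refit = T approach: compare the observed sum of squared
# Pearson residuals with the same quantity from models refitted to data
# simulated from the fitted model. Names and data are assumptions.
set.seed(7)
d <- data.frame(x = rnorm(150))
d$y <- rpois(150, exp(0.3 + 0.4 * d$x) * rgamma(150, shape = 2, rate = 2))
fit <- glm(y ~ x, family = poisson, data = d)
obsStat <- sum(residuals(fit, type = "pearson")^2)

simStat <- replicate(50, {
  dSim <- d
  dSim$y <- simulate(fit)[[1]]            # new response simulated from the model
  refit <- glm(y ~ x, family = poisson, data = dSim)
  sum(residuals(refit, type = "pearson")^2)
})
ratio <- obsStat / mean(simStat)          # > 1 suggests overdispersion
pValue <- mean(simStat >= obsStat)        # one-sided simulation p-value
```

The repeated refits are what makes this variant slow relative to refit = F.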

Moreover, for either refit = T or refit = F, the results of the DHARMa dispersion test will differ depending on whether simulations are done conditional (= conditional on fitted random effects) or unconditional (= REs are re-simulated). You can switch between conditional and unconditional simulations in `simulateResiduals`, if this is supported by the regression package that you use. The default in DHARMa is to use unconditional simulations (for other reasons), but conditional simulations are often more sensitive to dispersion problems in the presence of substantial RE variance, and I recommend checking dispersion with conditional simulations if your regression package supports them.

type == "PearsonChisq"

This is the test described in https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion, identical to performance::check_overdispersion. It works only if the fitted model provides df.residual and Pearson residuals.

The test statistic is biased toward lower values under quite general conditions, and will therefore tend to signal significant underdispersion. It is recommended to use this test only for overdispersion, i.e. with alternative = "greater".
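For reference, the classical Pearson chi2 statistic behind this test can be computed by hand (a sketch with made-up overdispersed data; only `df.residual` and Pearson residuals are required, matching the constraint stated above):

```r
# Classical Pearson chi-square overdispersion test, computed manually.
# The toy data below is an assumption for illustration.
set.seed(42)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, exp(0.5 + 0.5 * d$x) * rgamma(200, shape = 2, rate = 2))
fit <- glm(y ~ x, family = poisson, data = d)

chisq <- sum(residuals(fit, type = "pearson")^2)
df <- df.residual(fit)
dispersion <- chisq / df                              # > 1 indicates overdispersion
pValue <- pchisq(chisq, df = df, lower.tail = FALSE)  # alternative = "greater"
```

For a GLM the residual df is well defined; for GLMMs it is not, which is the source of the bias discussed above.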

For particular model classes / situations, there may be tests that are more powerful than, and thus preferable to, the DHARMa test. The advantage of the DHARMa test is that it directly targets the spread of the data (unlike other tests such as dispersion/df, which essentially measure fit and may thus be triggered by problems other than dispersion as well), and that it makes practically no assumptions about the fitted model, other than the availability of simulations.

Author(s)

Florian Hartig

See Also

`testResiduals`, `testUniformity`, `testOutliers`, `testDispersion`, `testZeroInflation`, `testGeneric`, `testTemporalAutocorrelation`, `testSpatialAutocorrelation`, `testQuantiles`, `testCategorical`

Examples

library(lme4)

set.seed(123)

testData = createData(sampleSize = 100, overdispersion = 0.5, randomEffectVariance = 1)
fittedModel <- glmer(observedResponse ~ Environment1 + (1|group),
                     family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)

# default DHARMa dispersion test - simulation-based
testDispersion(simulationOutput)
testDispersion(simulationOutput, alternative = "less", plot = FALSE) # only underdispersion
testDispersion(simulationOutput, alternative = "greater", plot = FALSE) # only overdispersion

# for mixed models, the test is usually more powerful if residuals are calculated
# conditional on fitted REs
simulationOutput <- simulateResiduals(fittedModel = fittedModel, re.form = NULL)
testDispersion(simulationOutput)

# DHARMa also implements the popular Pearson-chisq test that is also on the glmmWiki by Ben Bolker
# The issue with this test is that it requires the df of the model, which are not well defined
# for GLMMs. It is biased towards underdispersion, with bias getting larger with the number
# of RE groups. In doubt, only test for overdispersion
testDispersion(simulationOutput, type = "PearsonChisq", alternative = "greater")

# if refit = T, a different test on simulated Pearson residuals will be calculated (see help)
simulationOutput2 <- simulateResiduals(fittedModel = fittedModel, refit = TRUE, seed = 12, n = 20)
testDispersion(simulationOutput2)

# often useful to test dispersion per group (in particular for binomial data, see vignette)
simulationOutputAggregated = recalculateResiduals(simulationOutput2, group = testData$group)
testDispersion(simulationOutputAggregated)

[Package *DHARMa* version 0.4.3 Index]