R: Half-Normal Plots with Simulation Envelopes

hnp {hnp}

R Documentation

Half-Normal Plots with Simulation Envelopes

Description

Produces a (half-)normal plot from a fitted model object for a range of different models. Extendable to non-implemented model classes.

Usage

hnp(object, sim = 99, conf = 0.95, resid.type, maxit,
    halfnormal = T, scale = F, plot.sim = T, verb.sim = F,
    warn = F, how.many.out = F, print.on = F, paint.out = F,
    col.paint.out, newclass = F, diagfun, simfun, fitfun, ...)

Arguments

`object`	fitted model object or numeric vector.
`sim`	number of simulations used to compute envelope. Default is 99.
`conf`	confidence level of the simulated envelope. Default is 0.95.
`resid.type`	type of residuals to be used; must be one of "deviance", "pearson", "response", "working", "simple", "student", or "standard". Not all model type and residual type combinations are allowed. Defaults are "student" for `aov` and `lm` objects, "deviance" for `glm`, `glm.nb`, `lmer`, `glmer` and `aodml` objects, "simple" for `gamlss` objects, "response" for `glmmadmb` and `vglm` objects, "pearson" for `zeroinfl` and `hurdle` objects.
`maxit`	maximum number of iterations of the estimation algorithm. Defaults are 25 for `glm`, `glm.nb`, `gamlss` and `vglm` objects, 300 for `glmmadmb`, `lmer` and `glmer` objects, 3000 for `aodml` objects, 10000 for `zeroinfl` and `hurdle` objects.
`halfnormal`	logical. If `TRUE`, a half-normal plot is produced. If `FALSE`, a normal plot is produced. Default is `TRUE`.
`scale`	logical. If `TRUE` and if `object` is a numeric vector, simulates from a normal distribution with mean and variance estimated from `object`. If `FALSE`, uses a standard normal distribution to simulate from. Default is `FALSE`.
`plot.sim`	logical. Should the (half-)normal plot be plotted? Default is `TRUE`.
`verb.sim`	logical. If `TRUE`, prints each step of the simulation procedure. Default is `FALSE`.
`warn`	logical. If `TRUE`, shows warning messages in the simulation process. Default is `FALSE`.
`how.many.out`	logical. If `TRUE`, the number of points out of the envelope is printed. Default is `FALSE`.
`print.on`	logical. If `TRUE`, the number of points out of the envelope is printed on the plot. Default is `FALSE`.
`paint.out`	logical. If `TRUE`, points out of the simulation envelope are plotted in a different color. Default is `FALSE`.
`col.paint.out`	If `paint.out=TRUE`, sets the color of points out of the envelope. Default is `"red"`.
`newclass`	logical. If `TRUE`, use `diagfun`, `simfun`, and `fitfun` to extract diagnostics (typically residuals), generate simulated data using fitted model parameters, and fit the desired model. Default is `FALSE`.
`diagfun`	user-defined function used to obtain the diagnostic measures from the fitted model object (only used when `newclass=TRUE`). Default is `resid`.
`simfun`	user-defined function used to simulate a random sample from the model estimated parameters (only used when `newclass=TRUE`).
`fitfun`	user-defined function used to re-fit the model to simulated data (only used when `newclass=TRUE`).
`...`	extra graphical arguments passed to `plot.hnp`.

Details

A relatively easy way to assess the goodness-of-fit of a fitted model is to use (half-)normal plots of a model diagnostic, e.g., different types of residuals, Cook's distance or leverage. These plots are obtained by plotting the ordered values of the diagnostic (in absolute value for a half-normal plot) against the expected order statistics of a half-normal distribution,

\Phi^{-1}(\frac{i+n-1/8}{2n+1/2})

(for a half-normal plot) or the normal distribution,

\Phi^{-1}(\frac{i-3/8}{n+1/4})

(for a normal plot).
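
The plotting positions can be computed directly with `qnorm`. The sketch below is illustrative only: the helper names `half_normal_scores` and `normal_scores` are made up here and are not part of hnp, and the normal version uses Blom's positions (i - 3/8)/(n + 1/4).

```r
# Expected order statistics used as the x-axis of the plots.
# `n` is the number of diagnostic values; `i` runs over their ranks.
half_normal_scores <- function(n) {
  i <- seq_len(n)
  qnorm((i + n - 1/8) / (2 * n + 1/2))
}

normal_scores <- function(n) {
  i <- seq_len(n)
  qnorm((i - 3/8) / (n + 1/4))
}

hn <- half_normal_scores(10)  # all positive and increasing
nq <- normal_scores(10)       # increasing and symmetric about zero
```

Both arguments of `qnorm` stay strictly between 0 and 1 for every rank i = 1, ..., n, so the scores are always finite.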

Atkinson (1985) proposed adding a simulated envelope, constructed so that, under the correct model, the plot for the observed data is likely to fall within it. The envelope is not meant as a region of acceptance, but as guidance on the kind of shape to expect.

Obtaining the simulated envelope is simple and consists of (1) fitting a model; (2) extracting model diagnostics and calculating sorted absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again sorted absolute values; (5) computing the desired percentiles (e.g., 2.5 and 97.5) at each value of the expected order statistic to form the envelope.
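
The five steps above can be sketched in base R for a Poisson GLM. This is a minimal illustration, not the code hnp runs internally: the data are made up, the helper `diag_sorted` is hypothetical, and taking the envelope limits at the (1 - conf)/2 and (1 + conf)/2 quantiles is an assumption consistent with the 2.5 and 97.5 percentiles mentioned for a 0.95 level.

```r
set.seed(1)

# (1) Fit a model (a Poisson GLM on made-up data).
dat <- data.frame(tr = gl(4, 5))
dat$y <- rpois(20, lambda = rep(c(2, 4, 6, 8), each = 5))
fit <- glm(y ~ tr, family = poisson, data = dat)

# (2) Extract a diagnostic and sort its absolute values.
diag_sorted <- function(m) sort(abs(resid(m, type = "deviance")))
observed <- diag_sorted(fit)
n <- length(observed)

# (3) and (4) Simulate responses from the fitted model, refit the same
# model, and collect the sorted diagnostics; each column is one simulation.
sim <- 99
sims <- replicate(sim, {
  dat_sim <- dat
  dat_sim$y <- rpois(n, lambda = fitted(fit))
  diag_sorted(glm(y ~ tr, family = poisson, data = dat_sim))
})

# (5) Pointwise percentiles across simulations form the envelope.
conf <- 0.95
probs <- c((1 - conf) / 2, 0.5, (1 + conf) / 2)
envelope <- t(apply(sims, 1, quantile, probs = probs))
colnames(envelope) <- c("lower", "median", "upper")

# Half-normal scores for the x-axis.
x <- qnorm((seq_len(n) + n - 1/8) / (2 * n + 1/2))
```

Plotting `observed` and the columns of `envelope` against `x` (for instance with `points()` and `matplot()`) then gives a half-normal plot with its envelope.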

This function handles different model classes and more will be implemented as time goes by. So far, the following models are included:

Continuous data:

Normal:	functions `lm`, `aov` and `glm` with `family=gaussian`

Gamma:	function `glm` with `family=Gamma`

Inverse gaussian:	function `glm` with `family=inverse.gaussian`

Proportion data:

Binomial:	function `glm` with `family=binomial`

Quasi-binomial:	function `glm` with `family=quasibinomial`

Beta-binomial:	package `VGAM` - function `vglm`, with `family=betabinomial`;
	package `aods3` - function `aodml`, with `family="bb"`;
	package `gamlss` - function `gamlss`, with `family=BB`;
	package `glmmADMB` - function `glmmadmb`, with `family="betabinomial"`

Zero-inflated binomial:	package `VGAM` - function `vglm`, with `family=zibinomial`;
	package `gamlss` - function `gamlss`, with `family=ZIBI`;
	package `glmmADMB` - function `glmmadmb`, with `family="binomial"`
	and `zeroInfl=TRUE`

Zero-inflated beta-binomial:	package `gamlss` - function `gamlss`, with `family=ZIBB`;
	package `glmmADMB` - function `glmmadmb`, with `family="betabinomial"`
	and `zeroInfl=TRUE`

Multinomial:	package `nnet` - function `multinom`

Count data:

Poisson:	function `glm` with `family=poisson`

Quasi-Poisson:	function `glm` with `family=quasipoisson`

Negative binomial:	package `MASS` - function `glm.nb`;
	package `aods3` - function `aodml`, with `family="nb"`
	and `phi.scale="inverse"`

Zero-inflated Poisson:	package `pscl` - function `zeroinfl`, with `dist="poisson"`

Zero-inflated negative binomial:	package `pscl` - function `zeroinfl`, with `dist="negbin"`

Hurdle Poisson:	package `pscl` - function `hurdle`, with `dist="poisson"`

Hurdle negative binomial:	package `pscl` - function `hurdle`, with `dist="negbin"`

Mixed models:

Linear mixed models:	package `lme4`, function `lmer`

Generalized linear mixed models:	package `lme4`, function `glmer` with `family=poisson` or `binomial`

Users can also supply a numeric vector as object, in which case hnp generates the (half-)normal plot with a simulated envelope using the standard normal distribution (scale=F) or N(\mu, \sigma^2), with mean and variance estimated from the vector (scale=T).
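
The effect of the reference distribution on a numeric vector can be illustrated in base R. The sketch below is an assumption about the general idea, not hnp's internal code: the helpers `limits` and `outside` are hypothetical, and the envelope limits are fixed at the 2.5 and 97.5 percentiles.

```r
set.seed(2)
v <- rnorm(20, mean = 4, sd = 4)  # a vector that is clearly not N(0, 1)
n <- length(v)
sim <- 99

# Sorted absolute values of `sim` reference samples, one per column.
ref_std    <- replicate(sim, sort(abs(rnorm(n))))                  # standard normal
ref_scaled <- replicate(sim, sort(abs(rnorm(n, mean(v), sd(v)))))  # estimated mean and sd

# Pointwise 2.5% and 97.5% limits for each reference distribution.
limits <- function(ref) t(apply(ref, 1, quantile, probs = c(0.025, 0.975)))
env_std    <- limits(ref_std)
env_scaled <- limits(ref_scaled)

# Count the observed points falling outside each envelope.
obs <- sort(abs(v))
outside <- function(env) sum(obs < env[, 1] | obs > env[, 2])
c(standard = outside(env_std), scaled = outside(env_scaled))
```

With a vector whose mean and spread are far from 0 and 1, the standard normal envelope is a poor reference, which is the situation the scaled version addresses.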

A new model class is implemented by setting newclass=TRUE and providing three functions to hnp: diagfun, to obtain model diagnostics; simfun, to simulate random variables; and fitfun, to refit the model to the simulated variables. The way these functions must be written is shown in the Examples section.

Value

hnp returns an object of class "hnp", which is a list containing the following components:

`x`	quantiles of the (half-)normal distribution
`lower`	lower envelope band
`median`	median envelope band
`upper`	upper envelope band
`residuals`	diagnostic measures in absolute value and in order
`out.index`	vector indicating which points are out of the envelope
`col.paint.out`	color of points which are outside of the envelope (used if `paint.out=TRUE`)
`how.many.out`	logical. Equals `TRUE` if `how.many.out=TRUE` in the `hnp` call
`total`	length of the diagnostic measure vector
`out`	number of points out of the envelope
`print.on`	logical. Equals `TRUE` if `print.on=TRUE` in the `hnp` call
`paint.out`	logical. Equals `TRUE` if `paint.out=TRUE` in the `hnp` call
`all.sim`	matrix with all diagnostics obtained in the simulations. Each column represents one simulation
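
The envelope components can be used directly for further summaries. The sketch below works on a hand-built stand-in list so that it runs without the package: the numbers are made up, only the component names follow the list above, and treating a point as "out" when it lies strictly below `lower` or above `upper` is an assumption.

```r
# Stand-in for an object returned by hnp(); values are invented.
h <- list(
  x         = c(0.2, 0.6, 1.0, 1.5, 2.1),
  lower     = c(0.0, 0.2, 0.5, 0.8, 1.2),
  median    = c(0.2, 0.6, 1.0, 1.4, 2.0),
  upper     = c(0.6, 1.1, 1.6, 2.2, 3.0),
  residuals = c(0.1, 0.7, 1.8, 1.9, 3.4)
)

# Positions of the points lying outside the band, their number, and the
# percentage of all points they represent.
outside <- which(h$residuals < h$lower | h$residuals > h$upper)
n_out   <- length(outside)
total   <- length(h$residuals)
pct_out <- 100 * n_out / total
```

Here the third and fifth points exceed the upper band, so `n_out` is 2 out of a `total` of 5.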

Note

See documentation on example data sets for simple analyses and goodness-of-fit checking using hnp.

Author(s)

Rafael A. Moral <rafael_moral@yahoo.com.br>, John Hinde and Clarice G. B. Demétrio

References

Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.

Atkinson, A. C. (1985) Plots, transformations and regression, Clarendon Press, Oxford.

Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.

Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.

Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.

Examples

## Simple Poisson regression
set.seed(100)
counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8))
treatment <- gl(4, 5)
fit <- glm(counts ~ treatment, family=poisson)
anova(fit, test="Chisq")

## half-normal plot
hnp(fit)

## or save it in an object and then use the plot method
my.hnp <- hnp(fit, print.on=TRUE, plot.sim=FALSE)
plot(my.hnp)

## changing graphical parameters
plot(my.hnp, lty=2, pch=4, cex=1.2)
plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1))
plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores",
     ylab="Deviance residuals", legpos="bottomright")

## Using a numeric vector
my.vec <- rnorm(20, 4, 4)
hnp(my.vec) # using N(0,1)
hnp(my.vec, scale=TRUE) # using N(mu, sigma^2)

## Implementing new classes
## Users provide three functions - diagfun, simfun and fitfun,
## in the following way:
##
## diagfun <- function(obj) {
##   userfunction(obj, other_arguments)
##     # e.g., resid(obj, type="pearson")
##   }
##
## simfun <- function(n, obj) {
##   userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj))
##   }
##
## fitfun <- function(y.) {
##  userfunction(y. ~ linear_predictor, other_arguments, data=data)
##    # e.g., glm(y. ~ block + factor1 * factor2, family=poisson,
##    #           data=mydata)
##  }
##
## when response is binomial (y. successes out of m trials):
## fitfun <- function(y.) {
##  userfunction(cbind(y., m-y.) ~ linear_predictor,
##               other_arguments, data=data)
##    #e.g., glm(cbind(y., m-y.) ~ treatment - 1,
##    #          family=binomial, data=data)
##  }

## Not run: 
## Example no. 1: Using Cook's distance as a diagnostic measure
y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10))
tr <- gl(3, 10)
fit1 <- glm(y ~ tr, family=poisson)

# diagfun
d.fun <- function(obj) cooks.distance(obj)

# simfun
s.fun <- function(n, obj) {
  lam <- fitted(obj)
  rpois(n, lambda=lam)
}

# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data)

# hnp call
hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)

## Example no. 2: Implementing gamma model using package gamlss
# load package
library(gamlss)

# model fitting
y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5)
tr <- gl(3, 10)
fit2 <- gamlss(y ~ tr, family=GA)

# diagfun
d.fun <- function(obj) resid(obj) # this is the default if no
                                  # diagfun is provided

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  sig <- obj$sigma.fv
  rGA(n, mu=mu, sigma=sig)
}

# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data)

# hnp call
hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr))

## Example no. 3: Implementing binomial model in gamlss
# model fitting
y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10))
m <- 50
tr <- gl(3, 10)
fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI)

# diagfun
d.fun <- function(obj) resid(obj)

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  bd <- obj$bd
  rBI(n, bd=bd, mu=mu)
}

# fitfun
my.data <- data.frame(y, tr, m)
f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr,
                               family=BI, data=my.data)

# hnp call
hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)

## End(Not run)

[Package hnp version 1.2-6 Index]