hnp {hnp}R Documentation

Half-Normal Plots with Simulation Envelopes

Description

Produces a (half-)normal plot from a fitted model object for a range of different models. Extendable to non-implemented model classes.

Usage

hnp(object, sim = 99, conf = 0.95, resid.type, maxit,
    halfnormal = T, scale = F, plot.sim = T, verb.sim = F,
    warn = F, how.many.out = F, print.on = F, paint.out = F,
    col.paint.out, newclass = F, diagfun, simfun, fitfun, ...)

Arguments

object

fitted model object or numeric vector.

sim

number of simulations used to compute envelope. Default is 99.

conf

confidence level of the simulated envelope. Default is 0.95.

resid.type

type of residuals to be used; must be one of "deviance", "pearson", "response", "working", "simple", "student", or "standard". Not all model type and residual type combinations are allowed. Defaults are "student" for aov and lm objects, "deviance" for glm, glm.nb, lmer, glmer and aodml objects, "simple" for gamlss objects, "response" for glmmadmb and vglm objects, "pearson" for zeroinfl and hurdle objects.

maxit

maximum number of iterations of the estimation algorithm. Defaults are 25 for glm, glm.nb, gamlss and vglm objects, 300 for glmmadmb, lmer and glmer objects, 3000 for aodml objects, 10000 for zeroinfl and hurdle objects.

halfnormal

logical. If TRUE, a half-normal plot is produced. If FALSE, a normal plot is produced. Default is TRUE.

scale

logical. If TRUE and if object is a numeric vector, simulates from a normal distribution with mean and variance estimated from object. If FALSE, uses a standard normal distribution to simulate from. Default is FALSE.

plot.sim

logical. Should the (half-)normal plot be plotted? Default is TRUE.

verb.sim

logical. If TRUE, prints each step of the simulation procedure. Default is FALSE.

warn

logical. If TRUE, shows warning messages in the simulation process. Default is FALSE.

how.many.out

logical. If TRUE, the number of points out of the envelope is printed. Default is FALSE.

print.on

logical. If TRUE, the number of points out of the envelope is printed on the plot. Default is FALSE.

paint.out

logical. If TRUE, points out of the simulation envelope are plotted in a different color. Default is FALSE.

col.paint.out

If paint.out=TRUE, sets the color of points out of the envelope. Default is "red".

newclass

logical. If TRUE, use diagfun, simfun, and fitfun to extract diagnostics (typically residuals), generate simulated data using fitted model parameters, and fit the desired model. Default is FALSE.

diagfun

user-defined function used to obtain the diagnostic measures from the fitted model object (only used when newclass=TRUE). Default is resid.

simfun

user-defined function used to simulate a random sample from the model estimated parameters (only used when newclass=TRUE).

fitfun

user-defined function used to re-fit the model to simulated data (only used when newclass=TRUE).

...

extra graphical arguments passed to plot.hnp.

Details

A relatively easy way to assess goodness-of-fit of a fitted model is to use (half-)normal plots of a model diagnostic, e.g., different types of residuals, Cook's distance, leverage. These plots are obtained by plotting the ordered absolute values of a model diagnostic versus the expected order statistic of a half-normal distribution,

\Phi^{-1}(\frac{i+n-1/8}{2*n+1/2})

(for a half-normal plot) or the normal distribution,

\Phi^{-1}(\frac{i+3/8}{n+1/4})

(for a normal plot).

Atkinson (1985) proposed the addition of a simulated envelope, which is such that under the correct model the plot for the observed data is likely to fall within the envelope. The objective is not to provide a region of acceptance, but some sort of guidance to what kind of shape to expect.

Obtaining the simulated envelope is simple and consists of (1) fitting a model; (2) extracting model diagnostics and calculating sorted absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again sorted absolute values; (5) computing the desired percentiles (e.g., 2.5 and 97.5) at each value of the expected order statistic to form the envelope.

This function handles different model classes and more will be implemented as time goes by. So far, the following models are included:

Continuous data:
Normal: functions lm, aov and glm with family=gaussian
Gamma: function glm with family=Gamma
Inverse gaussian: function glm with family=inverse.gaussian
Proportion data:
Binomial: function glm with family=binomial
Quasi-binomial: function glm with family=quasibinomial
Beta-binomial: package VGAM - function vglm, with family=betabinomial;
package aods3 - function aodml, with family="bb";
package gamlss - function gamlss, with family=BB;
package glmmADMB - function glmmadmb, with family="betabinomial"
Zero-inflated binomial: package VGAM - function vglm, with family=zibinomial;
package gamlss - function gamlss, with family=ZIBI;
package glmmADMB - function glmmadmb, with family="binomial"
and zeroInfl=TRUE
Zero-inflated beta-binomial: package gamlss - function gamlss, with family=ZIBB;
package glmmADMB - function glmmadmb, with family="betabinomial"
and zeroInfl=TRUE
Multinomial: package nnet - function multinom
Count data:
Poisson: function glm with family=poisson
Quasi-Poisson: function glm with family=quasipoisson
Negative binomial: package MASS - function glm.nb;
package aods3 - function aodml, with family="nb"
and phi.scale="inverse"
Zero-inflated Poisson: package pscl - function zeroinfl, with dist="poisson"
Zero-inflated negative binomial: package pscl - function zeroinfl, with dist="negbin"
Hurdle Poisson: package pscl - function hurdle, with dist="poisson"
Hurdle negative binomial: package pscl - function hurdle, with dist="negbin"
Mixed models:
Linear mixed models: package lme4, function lmer
Generalized linear mixed models: package lme4, function glmer with family=poisson or binomial

Users can also use a numeric vector as object and hnp will generate the (half-)normal plot with a simulated envelope using the standard normal distribution (scale=F) or N(\mu, \sigma^2) (scale=T).

Implementing a new model class is done by providing three functions to hnp: diagfun - to obtain model diagnostics, simfun - to simulate random variables and fitfun - to refit the model to simulated variables. The way these functions must be written is shown in the Examples section.

Value

hnp returns an object of class "hnp", which is a list containing the following components:

x

quantiles of the (half-)normal distribution

lower

lower envelope band

median

median envelope band

upper

upper envelope band

residuals

diagnostic measures in absolute value and in order

out.index

vector indicating which points are out of the envelope

col.paint.out

color of points which are outside of the envelope (used if paint.out=TRUE)

how.many.out

logical. Equals TRUE if how.many.out=TRUE in the hnp call

total

length of the diagnostic measure vector

out

number of points out of the envelope

print.on

logical. Equals TRUE if print.on=TRUE in the hnp call

paint.out

logical. Equals TRUE if paint.out=TRUE in the hnp call

all.sim

matrix with all diagnostics obtained in the simulations. Each column represents one simulation

Note

See documentation on example data sets for simple analyses and goodness-of-fit checking using hnp.

Author(s)

Rafael A. Moral <rafael_moral@yahoo.com.br>, John Hinde and Clarice G. B. Demétrio

References

Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.

Atkinson, A. C. (1985) Plots, transformations and regression, Clarendon Press, Oxford.

Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.

Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.

Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.

See Also

plot.hnp, cbb, chryso, corn, fungi, oil, progeny, wolbachia

Examples

## Simple Poisson regression
set.seed(100)
counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8))
treatment <- gl(4, 5)
fit <- glm(counts ~ treatment, family=poisson)
anova(fit, test="Chisq")

## half-normal plot
hnp(fit)

## or save it in an object and then use the plot method
my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE)
plot(my.hnp)

## changing graphical parameters
plot(my.hnp, lty=2, pch=4, cex=1.2)
plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1))
plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores",
     ylab="Deviance residuals", legpos="bottomright")

## Using a numeric vector
my.vec <- rnorm(20, 4, 4)
hnp(my.vec) # using N(0,1)
hnp(my.vec, scale=TRUE) # using N(mu, sigma^2)

## Implementing new classes
## Users provide three functions - diagfun, simfun and fitfun,
## in the following way:
##
## diagfun <- function(obj) {
##   userfunction(obj, other_argumens)
##     # e.g., resid(obj, type="pearson")
##   }
##
## simfun <- function(n, obj) {
##   userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj))
##   }
##
## fitfun <- function(y.) {
##  userfunction(y. ~ linear_predictor, other_arguments, data=data)
##    # e.g., glm(y. ~ block + factor1 * factor2, family=poisson,
##    #           data=mydata)
##  }
##
## when response is binary:
## fitfun <- function(y.) {
##  userfunction(cbind(y., m-y.) ~ linear_predictor,
##               other_arguments, data=data)
##    #e.g., glm(cbind(y., m-y.) ~ treatment - 1,
##    #          family=binomial, data=data)
##  }

## Not run: 
## Example no. 1: Using Cook's distance as a diagnostic measure
y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10))
tr <- gl(3, 10)
fit1 <- glm(y ~ tr, family=poisson)

# diagfun
d.fun <- function(obj) cooks.distance(obj)

# simfun
s.fun <- function(n, obj) {
  lam <- fitted(obj)
  rpois(n, lambda=lam)
}

# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data)

# hnp call
hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)

## Example no. 2: Implementing gamma model using package gamlss
# load package
require(gamlss)

# model fitting
y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5)
tr <- gl(3, 10)
fit2 <- gamlss(y ~ tr, family=GA)

# diagfun
d.fun <- function(obj) resid(obj) # this is the default if no
                                  # diagfun is provided

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  sig <- obj$sigma.fv
  rGA(n, mu=mu, sigma=sig)
}

# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data)

# hnp call
hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
    fitfun=f.fun, data=data.frame(y, tr))

## Example no. 3: Implementing binomial model in gamlss
# model fitting
y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10))
m <- 50
tr <- gl(3, 10)
fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI)

# diagfun
d.fun <- function(obj) resid(obj)

# simfun
s.fun <- function(n, obj) {
  mu <- obj$mu.fv
  bd <- obj$bd
  rBI(n, bd=bd, mu=mu)
}

# fitfun
my.data <- data.frame(y, tr, m)
f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr,
                               family=BI, data=my.data)

# hnp call
hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)

## End(Not run)

[Package hnp version 1.2-6 Index]