hnp {hnp} | R Documentation |
Half-Normal Plots with Simulation Envelopes
Description
Produces a (half-)normal plot from a fitted model object for a range of different models. Extendable to non-implemented model classes.
Usage
hnp(object, sim = 99, conf = 0.95, resid.type, maxit,
halfnormal = T, scale = F, plot.sim = T, verb.sim = F,
warn = F, how.many.out = F, print.on = F, paint.out = F,
col.paint.out, newclass = F, diagfun, simfun, fitfun, ...)
Arguments
object |
fitted model object or numeric vector. |
sim |
number of simulations used to compute envelope. Default is 99. |
conf |
confidence level of the simulated envelope. Default is 0.95. |
resid.type |
type of residuals to be used; must be one of "deviance", "pearson", "response", "working", "simple", "student", or "standard". Not all model type and residual type combinations are allowed. Defaults are "student" for |
maxit |
maximum number of iterations of the estimation algorithm. Defaults are 25 for |
halfnormal |
logical. If |
scale |
logical. If |
plot.sim |
logical. Should the (half-)normal plot be plotted? Default is |
verb.sim |
logical. If |
warn |
logical. If |
how.many.out |
logical. If |
print.on |
logical. If |
paint.out |
logical. If |
col.paint.out |
If |
newclass |
logical. If |
diagfun |
user-defined function used to obtain the diagnostic measures from the fitted model object (only used when |
simfun |
user-defined function used to simulate a random sample from the model estimated parameters (only used when |
fitfun |
user-defined function used to re-fit the model to simulated data (only used when |
... |
extra graphical arguments passed to |
Details
A relatively easy way to assess goodness-of-fit of a fitted model is to use (half-)normal plots of a model diagnostic, e.g., different types of residuals, Cook's distance, leverage. These plots are obtained by plotting the ordered absolute values of a model diagnostic versus the expected order statistic of a half-normal distribution,
\Phi^{-1}(\frac{i+n-1/8}{2*n+1/2})
(for a half-normal plot) or the normal distribution,
\Phi^{-1}(\frac{i+3/8}{n+1/4})
(for a normal plot).
Atkinson (1985) proposed the addition of a simulated envelope, which is such that under the correct model the plot for the observed data is likely to fall within the envelope. The objective is not to provide a region of acceptance, but some sort of guidance to what kind of shape to expect.
Obtaining the simulated envelope is simple and consists of (1) fitting a model; (2) extracting model diagnostics and calculating sorted absolute values; (3) simulating 99 (or more) response variables using the same model matrix, error distribution and fitted parameters; (4) fitting the same model to each simulated response variable and obtaining the same model diagnostics, again sorted absolute values; (5) computing the desired percentiles (e.g., 2.5 and 97.5) at each value of the expected order statistic to form the envelope.
This function handles different model classes and more will be implemented as time goes by. So far, the following models are included:
Continuous data: | |
Normal: | functions lm , aov and glm with family=gaussian |
Gamma: | function glm with family=Gamma |
Inverse gaussian: | function glm with family=inverse.gaussian |
Proportion data: | |
Binomial: | function glm with family=binomial |
Quasi-binomial: | function glm with family=quasibinomial |
Beta-binomial: | package VGAM - function vglm , with family=betabinomial ; |
package aods3 - function aodml , with family="bb" ; |
|
package gamlss - function gamlss , with family=BB ; |
|
package glmmADMB - function glmmadmb , with family="betabinomial" |
|
Zero-inflated binomial: | package VGAM - function vglm , with family=zibinomial ; |
package gamlss - function gamlss , with family=ZIBI ; |
|
package glmmADMB - function glmmadmb , with family="binomial" |
|
and zeroInfl=TRUE |
|
Zero-inflated beta-binomial: | package gamlss - function gamlss , with family=ZIBB ; |
package glmmADMB - function glmmadmb , with family="betabinomial" |
|
and zeroInfl=TRUE |
|
Multinomial: | package nnet - function multinom |
Count data: | |
Poisson: | function glm with family=poisson |
Quasi-Poisson: | function glm with family=quasipoisson |
Negative binomial: | package MASS - function glm.nb ; |
package aods3 - function aodml , with family="nb" |
|
and phi.scale="inverse" |
|
Zero-inflated Poisson: | package pscl - function zeroinfl , with dist="poisson" |
Zero-inflated negative binomial: | package pscl - function zeroinfl , with dist="negbin" |
Hurdle Poisson: | package pscl - function hurdle , with dist="poisson" |
Hurdle negative binomial: | package pscl - function hurdle , with dist="negbin" |
Mixed models: | |
Linear mixed models: | package lme4 , function lmer |
Generalized linear mixed models: | package lme4 , function glmer with family=poisson or binomial |
Users can also use a numeric vector as object
and hnp
will generate the (half-)normal plot with a simulated envelope using the standard normal distribution (scale=F
) or N(\mu, \sigma^2)
(scale=T
).
Implementing a new model class is done by providing three functions to hnp
: diagfun
- to obtain model diagnostics, simfun
- to simulate random variables and fitfun
- to refit the model to simulated variables. The way these functions must be written is shown in the Examples section.
Value
hnp
returns an object of class "hnp"
, which is a list containing the following components:
x |
quantiles of the (half-)normal distribution |
lower |
lower envelope band |
median |
median envelope band |
upper |
upper envelope band |
residuals |
diagnostic measures in absolute value and in order |
out.index |
vector indicating which points are out of the envelope |
col.paint.out |
color of points which are outside of the envelope (used if |
how.many.out |
logical. Equals |
total |
length of the diagnostic measure vector |
out |
number of points out of the envelope |
print.on |
logical. Equals |
paint.out |
logical. Equals |
all.sim |
matrix with all diagnostics obtained in the simulations. Each column represents one simulation |
Note
See documentation on example data sets for simple analyses and goodness-of-fit checking using hnp
.
Author(s)
Rafael A. Moral <rafael_moral@yahoo.com.br>, John Hinde and Clarice G. B. Demétrio
References
Moral, R. A., Hinde, J. and Demétrio, C. G. B. (2017) Half-normal plots and overdispersed models in R: the hnp package. Journal of Statistical Software 81(10):1-23.
Atkinson, A. C. (1985) Plots, transformations and regression, Clarendon Press, Oxford.
Demétrio, C. G. B. and Hinde, J. (1997) Half-normal plots and overdispersion. GLIM Newsletter 27:19-26.
Hinde, J. and Demétrio, C. G. B. (1998) Overdispersion: models and estimation. Computational Statistics and Data Analysis 27:151-170.
Demétrio, C. G. B., Hinde, J. and Moral, R. A. (2014) Models for overdispersed data in entomology. In Godoy, W. A. C. and Ferreira, C. P. (Eds.) Ecological modelling applied to entomology. Springer.
See Also
plot.hnp
, cbb
, chryso
, corn
, fungi
, oil
, progeny
, wolbachia
Examples
## Simple Poisson regression
set.seed(100)
counts <- c(rpois(5, 2), rpois(5, 4), rpois(5, 6), rpois(5, 8))
treatment <- gl(4, 5)
fit <- glm(counts ~ treatment, family=poisson)
anova(fit, test="Chisq")
## half-normal plot
hnp(fit)
## or save it in an object and then use the plot method
my.hnp <- hnp(fit, print.on=TRUE, plot=FALSE)
plot(my.hnp)
## changing graphical parameters
plot(my.hnp, lty=2, pch=4, cex=1.2)
plot(my.hnp, lty=c(2,3,2), pch=4, cex=1.2, col=c(2,2,2,1))
plot(my.hnp, main="Half-normal plot", xlab="Half-normal scores",
ylab="Deviance residuals", legpos="bottomright")
## Using a numeric vector
my.vec <- rnorm(20, 4, 4)
hnp(my.vec) # using N(0,1)
hnp(my.vec, scale=TRUE) # using N(mu, sigma^2)
## Implementing new classes
## Users provide three functions - diagfun, simfun and fitfun,
## in the following way:
##
## diagfun <- function(obj) {
## userfunction(obj, other_argumens)
## # e.g., resid(obj, type="pearson")
## }
##
## simfun <- function(n, obj) {
## userfunction(n, other_arguments) # e.g., rpois(n, fitted(obj))
## }
##
## fitfun <- function(y.) {
## userfunction(y. ~ linear_predictor, other_arguments, data=data)
## # e.g., glm(y. ~ block + factor1 * factor2, family=poisson,
## # data=mydata)
## }
##
## when response is binary:
## fitfun <- function(y.) {
## userfunction(cbind(y., m-y.) ~ linear_predictor,
## other_arguments, data=data)
## #e.g., glm(cbind(y., m-y.) ~ treatment - 1,
## # family=binomial, data=data)
## }
## Not run:
## Example no. 1: Using Cook's distance as a diagnostic measure
y <- rpois(30, lambda=rep(c(.5, 1.5, 5), each=10))
tr <- gl(3, 10)
fit1 <- glm(y ~ tr, family=poisson)
# diagfun
d.fun <- function(obj) cooks.distance(obj)
# simfun
s.fun <- function(n, obj) {
lam <- fitted(obj)
rpois(n, lambda=lam)
}
# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) glm(y. ~ tr, family=poisson, data=my.data)
# hnp call
hnp(fit1, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)
## Example no. 2: Implementing gamma model using package gamlss
# load package
require(gamlss)
# model fitting
y <- rGA(30, mu=rep(c(.5, 1.5, 5), each=10), sigma=.5)
tr <- gl(3, 10)
fit2 <- gamlss(y ~ tr, family=GA)
# diagfun
d.fun <- function(obj) resid(obj) # this is the default if no
# diagfun is provided
# simfun
s.fun <- function(n, obj) {
mu <- obj$mu.fv
sig <- obj$sigma.fv
rGA(n, mu=mu, sigma=sig)
}
# fitfun
my.data <- data.frame(y, tr)
f.fun <- function(y.) gamlss(y. ~ tr, family=GA, data=my.data)
# hnp call
hnp(fit2, newclass=TRUE, diagfun=d.fun, simfun=s.fun,
fitfun=f.fun, data=data.frame(y, tr))
## Example no. 3: Implementing binomial model in gamlss
# model fitting
y <- rBI(30, bd=50, mu=rep(c(.2, .5, .9), each=10))
m <- 50
tr <- gl(3, 10)
fit3 <- gamlss(cbind(y, m-y) ~ tr, family=BI)
# diagfun
d.fun <- function(obj) resid(obj)
# simfun
s.fun <- function(n, obj) {
mu <- obj$mu.fv
bd <- obj$bd
rBI(n, bd=bd, mu=mu)
}
# fitfun
my.data <- data.frame(y, tr, m)
f.fun <- function(y.) gamlss(cbind(y., m-y.) ~ tr,
family=BI, data=my.data)
# hnp call
hnp(fit3, newclass=TRUE, diagfun=d.fun, simfun=s.fun, fitfun=f.fun)
## End(Not run)