qqPlotCensored {EnvStats}  R Documentation 
Produces a quantile-quantile (Q-Q) plot, also called a probability plot, for Type I censored data.
qqPlotCensored(x, censored, censoring.side = "left",
    prob.method = "michael-schucany", plot.pos.con = NULL,
    distribution = "norm", param.list = list(mean = 0, sd = 1),
    estimate.params = plot.type == "Tukey Mean-Difference Q-Q",
    est.arg.list = NULL, plot.type = "Q-Q", plot.it = TRUE,
    equal.axes = qq.line.type == "0-1" || estimate.params,
    add.line = FALSE, qq.line.type = "least squares",
    duplicate.points.method = "standard", points.col = 1, line.col = 1,
    line.lwd = par("cex"), line.lty = 1, digits = .Options$digits,
    include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2),
    cen.cex = par("cex"), cen.col = 4, ..., main = NULL, xlab = NULL,
    ylab = NULL, xlim = NULL, ylim = NULL)
x 
numeric vector of observations that is assumed to represent a sample from the hypothesized
distribution specified by 
censored 
numeric or logical vector indicating which values of 
censoring.side 
character string indicating on which side the censoring occurs. The possible values are

prob.method 
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are: The default value is The 
plot.pos.con 
numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is 
distribution 
a character string denoting the distribution abbreviation. The default value is

param.list 
a list with values for the parameters of the distribution. The default value is

estimate.params 
a logical scalar indicating whether to compute quantiles based on estimating the distribution
parameters ( You can set 
est.arg.list 
a list whose components are optional arguments associated with the function used to estimate
the parameters of the assumed distribution (see the section Estimating Distribution Parameters
in the help file EnvStats Functions for Censored Data).
For example, the function 
plot.type 
a character string denoting the kind of plot. Possible values are 
plot.it 
a logical scalar indicating whether to create a plot on the current graphics device.
The default value is 
equal.axes 
a logical scalar indicating whether to use the same range on the 
add.line 
a logical scalar indicating whether to add a line to the plot. If 
qq.line.type 
character string determining what kind of line to add to the QQ plot. Possible values are

duplicate.points.method 
a character string denoting how to plot points with duplicate 
points.col 
a numeric scalar or character string determining the color of the points in the plot.
The default value is 
line.col 
a numeric scalar or character string determining the color of the line in the plot.
The default value is 
line.lwd 
a numeric scalar determining the width of the line in the plot. The default value is

line.lty 
a numeric scalar determining the line type of the line in the plot. The default value is

digits 
a scalar indicating how many significant digits to print for the distribution parameters.
The default value is 
include.cen 
logical scalar indicating whether to include censored values in the plot. The default value is

cen.pch 
numeric scalar or character string indicating the plotting character to use to plot censored values.
The default value is 
cen.cex 
numeric scalar that determines the size of the plotting character used to plot censored values.
The default value is the current value of the cex graphics parameter. See the entry for 
cen.col 
numeric scalar or character string that determines the color of the plotting character used to
plot censored values. The default value is 
main , xlab , ylab , xlim , ylim , ... 
additional graphical parameters (see 
The function qqPlotCensored does exactly the same thing as qqPlot (when the argument y is not supplied to qqPlot), except qqPlotCensored calls the function ppointsCensored to compute the plotting positions (estimated cumulative probabilities).
The vector x is assumed to be a sample from the probability distribution specified by the argument distribution (and param.list if estimate.params=FALSE).
When plot.type="Q-Q", the quantiles of x are plotted on the y-axis against the quantiles of the assumed distribution on the x-axis.
When plot.type="Tukey Mean-Difference Q-Q", the difference of the quantiles is plotted on the y-axis against the mean of the quantiles on the x-axis.
When prob.method="kaplan-meier" and censoring.side="left" and the assumed distribution has a maximum support of infinity (Inf; e.g., the normal or lognormal distribution), the point involving the largest value of x is not plotted because it corresponds to an estimated cumulative probability of 1, which in turn corresponds to an infinite plotting position.
When prob.method="modified kaplan-meier" and censoring.side="left", the estimated cumulative probability associated with the maximum value is modified from 1 to be (N - .375)/(N + .25), where N denotes the sample size (i.e., the Blom plotting position), so that the point associated with the maximum value can be displayed.
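The effect of the modification can be checked with a few lines of base R. This is only an illustrative sketch: the sample size N = 25 is a hypothetical value, and the code evaluates the normal quantile function directly rather than calling qqPlotCensored.

```r
# Hypothetical sample size, chosen only for illustration
N <- 25

# Kaplan-Meier estimate of the cumulative probability at the maximum value
p.km <- 1

# Modified value: the Blom plotting position for rank N
p.mod <- (N - 0.375) / (N + 0.25)

# Theoretical normal quantiles at the two probabilities
q.km  <- qnorm(p.km)    # infinite, so the point cannot be drawn
q.mod <- qnorm(p.mod)   # finite, so the point can be drawn

print(c(p.mod = p.mod, q.km = q.km, q.mod = q.mod))
```

The same reasoning applies to any assumed distribution whose support extends to infinity, since its quantile function is infinite at probability 1.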
qqPlotCensored returns a list with the following components:
x 
numeric vector of 
y 
numeric vector of 
Order.Statistics 
numeric vector of the “ordered” observations.
When 
Cumulative.Probabilities 
numeric vector of the plotting positions associated with the order statistics. 
Censored 
logical vector indicating which of the ordered observations are censored. 
Censoring.Side 
character string indicating whether the data are left- or right-censored.
This is the same value as the argument 
Prob.Method 
character string indicating what method was used to compute the plotting positions.
This is the same value as the argument 
Optional Component (only present when prob.method="michael-schucany" or prob.method="hirsch-stedinger"):
Plot.Pos.Con 
numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument 
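The returned list can be inspected without drawing anything by setting plot.it=FALSE. The sketch below assumes the EnvStats package is installed and reuses the simulated data from the first example on this page; the object name res is arbitrary.

```r
library(EnvStats)  # assumes EnvStats is installed

# Simulated left-censored data (same recipe as the first example on this page)
set.seed(333)
x <- rnorm(20, mean = 20, sd = 5)
censored <- x < 18
x[censored] <- 18

# Compute the Q-Q plot coordinates without opening a graphics device
res <- qqPlotCensored(x, censored, plot.it = FALSE)

# Component names and a compact view of their contents
names(res)
str(res)
```

Because the default prob.method is "michael-schucany", the optional component Plot.Pos.Con should be present in this case.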
A quantile-quantile (Q-Q) plot, also called a probability plot, is a plot of the observed
order statistics from a random sample (the empirical quantiles) against their (estimated)
mean or median values based on an assumed distribution, or against the empirical quantiles
of another set of data (Wilk and Gnanadesikan, 1968). Q-Q plots are used to assess whether
data come from a particular distribution, or whether two datasets have the same parent
distribution. If the distributions have the same shape (but not necessarily the same
location or scale parameters), then the plot will fall roughly on a straight line. If the
distributions are exactly the same, then the plot will fall roughly on the straight line y=x.
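The straight-line property can be illustrated for complete (uncensored) data using base R only. In this sketch the sample size, seed, mean, and standard deviation are arbitrary choices, and ppoints stands in for the censored-data plotting positions that qqPlotCensored would use.

```r
set.seed(42)                              # arbitrary seed for reproducibility
n <- 200                                  # arbitrary sample size

# Empirical quantiles: ordered sample from a normal with mean 20 and sd 5
y <- sort(rnorm(n, mean = 20, sd = 5))

# Theoretical quantiles from the standard normal (same shape,
# different location and scale)
x <- qnorm(ppoints(n))

# A least-squares line through the Q-Q points: the intercept estimates
# the location and the slope estimates the scale
fit <- lm(y ~ x)
coef(fit)     # should be roughly 20 and 5
cor(x, y)     # should be close to 1
```

If the theoretical quantiles were instead computed with mean 20 and sd 5, the fitted line would be expected to lie close to y=x.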
A Tukey mean-difference Q-Q plot, also called an m-d plot, is a modification of a
Q-Q plot. Rather than plotting observed quantiles vs. theoretical quantiles or observed
y-quantiles vs. observed x-quantiles, a Tukey mean-difference Q-Q plot plots
the difference between the quantiles on the y-axis vs. the average of the quantiles on
the x-axis (Cleveland, 1993, pp. 22-23). If the two sets of quantiles come from the same
parent distribution, then the points in this plot should fall roughly along the horizontal line
y=0. If one set of quantiles comes from the same distribution with a shift in median, then
the points in this plot should fall along a horizontal line above or below the line y=0.
A Tukey mean-difference Q-Q plot enhances our perception of how the points in the Q-Q plot deviate
from a straight line, because it is easier to judge deviations from a horizontal line than from a
line with a non-zero slope.
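The transformation from Q-Q coordinates to mean-difference coordinates is simple enough to sketch in base R. The data here are simulated and uncensored, and taking the difference as observed minus theoretical is an assumption about the sign convention, not a statement about the package internals.

```r
set.seed(1)                                # arbitrary seed
obs <- sort(rnorm(30, mean = 10, sd = 2))  # hypothetical observed quantiles

# Theoretical quantiles from a normal with parameters estimated from the data
p    <- ppoints(length(obs))
theo <- qnorm(p, mean = mean(obs), sd = sd(obs))

# Mean-difference coordinates
md.x <- (obs + theo) / 2   # average of the two quantiles (x-axis)
md.y <- obs - theo         # difference of the two quantiles (y-axis)

# Points scattered around the horizontal line y=0 suggest a good fit
plot(md.x, md.y, xlab = "Mean of quantiles", ylab = "Difference of quantiles")
abline(h = 0, lty = 2)
```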
In a Q-Q plot, the extreme points have more variability than points toward the center. A U-shaped
Q-Q plot indicates that the underlying distribution for the observations on the y-axis is
skewed to the right relative to the underlying distribution for the observations on the x-axis.
An upside-down-U-shaped Q-Q plot indicates the y-axis distribution is skewed left relative to
the x-axis distribution. An S-shaped Q-Q plot indicates the y-axis distribution has
shorter tails than the x-axis distribution. Conversely, a plot that is bent down on the
left and bent up on the right indicates that the y-axis distribution has longer tails than
the x-axis distribution.
Censored observations complicate the procedures used to graphically explore data. Techniques from
survival analysis and life testing have been developed to generalize the procedures for
constructing plotting positions, empirical cdf plots, and Q-Q plots to data sets with censored
observations (see ppointsCensored).
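As a rough illustration of how a survival-analysis technique carries over, the sketch below computes Kaplan-Meier style cumulative probabilities for left-censored data in base R. It follows the textbook product-limit form applied from the largest value downward. The function name km.left and the toy data are hypothetical, and the code makes no claim to reproduce ppointsCensored, which offers several methods and may handle ties differently.

```r
# Hypothetical helper: product-limit estimate of P(X <= t) at each distinct
# uncensored value t, for left-censored data.
km.left <- function(x, censored) {
  # Distinct uncensored values, largest first
  t.k <- sort(unique(x[!censored]), decreasing = TRUE)
  # Observations (censored or not) recorded at or below each t.k
  n.k <- vapply(t.k, function(v) sum(x <= v), numeric(1))
  # Uncensored observations exactly equal to each t.k
  d.k <- vapply(t.k, function(v) sum(x == v & !censored), numeric(1))
  # Running product gives the probability of falling strictly below t.k
  below <- cumprod((n.k - d.k) / n.k)
  # P(X <= t.k) is 1 at the largest value, then the product over larger values
  data.frame(value = t.k, cum.prob = c(1, head(below, -1)))
}

# Toy data: the values flagged TRUE are "<1" and "<2"
x        <- c(1, 2, 2, 3, 5)
censored <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
km.left(x, censored)
```

The largest uncensored value receives a cumulative probability of 1, which is why a distribution with unbounded support cannot display that point unless the probability is adjusted.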
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp. 11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp. 7-62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64-S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997-2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715-727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457-481.
Lee, E.T., and J. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley and Sons, New York.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, pp. 461-496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945-966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
ppointsCensored, EnvStats Functions for Censored Data, qqPlot, ecdfPlotCensored, qqPlotGestalt.
# Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then generate a Q-Q plot assuming
# a normal distribution for the complete data set and the censored data set.
# Note that the Q-Q plot for the censored data set starts at the first ordered
# uncensored observation, and that for values of x > 18 the two Q-Q plots are
# exactly the same. This is because there is only one censoring level and
# no uncensored observations fall below the censored observations.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(333)
x <- rnorm(20, mean=20, sd=5)
censored <- x < 18
sum(censored)
#[1] 7
new.x <- x
new.x[censored] <- 18
dev.new()
qqPlot(x, ylim = range(pretty(x)),
  main = "Q-Q Plot for\nComplete Data Set")
dev.new()
qqPlotCensored(new.x, censored, ylim = range(pretty(x)),
  main = "Q-Q Plot for\nCensored Data Set")
# Clean up
#
rm(x, censored, new.x)
#
# Example 15-1 of USEPA (2009, page 15-10) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells. The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df. Here we will create a Q-Q
# plot based on the Kaplan-Meier method. First we'll assume
# a normal distribution, then a lognormal distribution, then a
# gamma distribution.
EPA.09.Ex.15.1.manganese.df
#   Sample   Well Manganese.Orig.ppb Manganese.ppb Censored
#1       1 Well.1                 <5           5.0     TRUE
#2       2 Well.1               12.1          12.1    FALSE
#3       3 Well.1               16.9          16.9    FALSE
#4       4 Well.1               21.6          21.6    FALSE
#5       5 Well.1                 <2           2.0     TRUE
#...
#21      1 Well.5               17.9          17.9    FALSE
#22      2 Well.5               22.7          22.7    FALSE
#23      3 Well.5                3.3           3.3    FALSE
#24      4 Well.5                8.4           8.4    FALSE
#25      5 Well.5                 <2           2.0     TRUE
# Assume normal distribution
#
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored,
    prob.method = "kaplan-meier", points.col = "blue", add.line = TRUE,
    main = paste("Normal Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
# Include max value in the plot
#
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored,
    prob.method = "modified kaplan-meier", points.col = "blue",
    add.line = TRUE,
    main = paste("Normal Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions",
      "(Max Included)", sep = "\n")))
# Assume lognormal distribution
#
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored, dist = "lnorm",
    prob.method = "kaplan-meier", points.col = "blue", add.line = TRUE,
    main = paste("Lognormal Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
# Include max value in the plot
#
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored, dist = "lnorm",
    prob.method = "modified kaplan-meier", points.col = "blue",
    add.line = TRUE,
    main = paste("Lognormal Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions",
      "(Max Included)", sep = "\n")))
# The lognormal distribution appears to be a better fit.
# Now create a Q-Q plot assuming a gamma distribution. Here we'll
# need to set estimate.params=TRUE.
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored, dist = "gamma",
    estimate.params = TRUE, prob.method = "kaplan-meier",
    points.col = "blue", add.line = TRUE,
    main = paste("Gamma Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions", sep = "\n")))
# Include max value in the plot
#
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
  qqPlotCensored(Manganese.ppb, Censored, dist = "gamma",
    estimate.params = TRUE, prob.method = "modified kaplan-meier",
    points.col = "blue", add.line = TRUE,
    main = paste("Gamma Q-Q Plot of Manganese Data",
      "Based on Kaplan-Meier Plotting Positions",
      "(Max Included)", sep = "\n")))
#==========
# Clean up
#
graphics.off()