ecdfPlotCensored {EnvStats}  R Documentation 
Produce an empirical cumulative distribution function plot for Type I left-censored or right-censored data.
ecdfPlotCensored(x, censored, censoring.side = "left", discrete = FALSE,
prob.method = "michael-schucany", plot.pos.con = 0.375, plot.it = TRUE,
add = FALSE, ecdf.col = 1, ecdf.lwd = 3 * par("cex"), ecdf.lty = 1,
include.cen = FALSE, cen.pch = ifelse(censoring.side == "left", 6, 2),
cen.cex = par("cex"), cen.col = 4, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
x 
numeric vector of observations. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.
censored 
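numeric or logical vector indicating which values of x are censored. This must be the same length as x. If censored is logical, TRUE marks a censored value and FALSE an uncensored one; if it is numeric, it must contain only 1's (censored) and 0's (not censored).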
censoring.side 
character string indicating on which side the censoring occurs. The possible values are "left" (the default) and "right".
discrete 
logical scalar indicating whether the assumed parent distribution of x is discrete (discrete=TRUE) or continuous (discrete=FALSE; the default).
prob.method 
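character string indicating what method to use to compute the plotting positions (empirical probabilities).
Possible values include "kaplan-meier" (product-limit method of Kaplan and Meier, 1958), "nelson" (hazard plotting method of Nelson, 1972), "michael-schucany" (Michael and Schucany, 1986), and "hirsch-stedinger" (Hirsch and Stedinger, 1987); see ppointsCensored for the complete list and an explanation of each method. The default value is prob.method="michael-schucany".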
plot.pos.con 
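numeric scalar between 0 and 1 containing the value of the plotting position constant.
The default value is plot.pos.con=0.375. This argument is used only when prob.method is "michael-schucany" or "hirsch-stedinger" (see ppointsCensored).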
plot.it 
logical scalar indicating whether to produce a plot or add to the current plot (see add) on the current graphics device. The default value is plot.it=TRUE.
add 
logical scalar indicating whether to add the empirical cdf to the current plot (add=TRUE) or to generate a new plot (add=FALSE; the default).
ecdf.col 
a numeric scalar or character string determining the color of the empirical cdf line or points.
The default value is ecdf.col=1 (black). See the entry for col in the help file for par.
ecdf.lwd 
a numeric scalar determining the width of the empirical cdf line. The default value is
ecdf.lwd=3*par("cex"). See the entry for lwd in the help file for par.
ecdf.lty 
a numeric scalar determining the line type of the empirical cdf line. The default value is
ecdf.lty=1 (solid). See the entry for lty in the help file for par.
include.cen 
logical scalar indicating whether to include censored values in the plot. The default value is
include.cen=FALSE.
cen.pch 
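numeric scalar or character string indicating the plotting character to use to plot censored values.
The default value is cen.pch=6 (a downward-pointing triangle) when censoring.side="left" and cen.pch=2 (an upward-pointing triangle) when censoring.side="right".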
cen.cex 
numeric scalar that determines the size of the plotting character used to plot censored values.
The default value is the current value of the cex graphics parameter. See the entry for cex in the help file for par.
cen.col 
numeric scalar or character string that determines the color of the plotting character used to
plot censored values. The default value is cen.col=4 (blue). See the entry for col in the help file for par.
type, main, xlab, ylab, xlim, ylim, ...
additional graphical parameters (see par).
The function ecdfPlotCensored does exactly the same thing as ecdfPlot, except that it calls the function ppointsCensored to compute the plotting positions (estimated cumulative probabilities) for the uncensored observations.
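As a point of reference for what a plotting position is, the base-R sketch below computes Blom-type positions, (i - a)/(n - 2a + 1) with a = plot.pos.con, for a complete (uncensored) sample. It assumes, rather than quotes from EnvStats, that the default "michael-schucany" method reduces to this formula when nothing is censored. The helper blom_positions is hypothetical, and its component names simply mirror the Value section; base R's ppoints() uses the same formula when a is supplied.

```r
# Hypothetical helper (not part of EnvStats): Blom-type plotting positions
# (i - a) / (n - 2a + 1) for a complete sample. Assumption: with no censoring,
# this is what prob.method = "michael-schucany" reduces to, with a = plot.pos.con.
blom_positions <- function(x, a = 0.375) {
  n <- length(x)
  i <- seq_len(n)
  list(Order.Statistics = sort(x),
       Cumulative.Probabilities = (i - a) / (n - 2 * a + 1))
}

pp <- blom_positions(c(4.2, 1.7, 3.3, 2.9))
pp$Cumulative.Probabilities
# [1] 0.1470588 0.3823529 0.6176471 0.8529412
```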
If plot.it=TRUE, the estimated cumulative probabilities for the uncensored observations are plotted against the uncensored observations. By default, ecdfPlotCensored plots a step function when discrete=TRUE and a straight line between points when discrete=FALSE. The user may override these defaults by supplying the graphics parameter type (type="s" for a step function, type="l" for linear interpolation, type="p" for points only, etc.).
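A minimal base-graphics sketch of the plotting rule just described: the hypothetical helper draw_ecdf() (not an EnvStats function) uses type="s" for discrete data and type="l" otherwise, and lets the caller override type. The data values and the use of ppoints() for the plotting positions are illustrative assumptions.

```r
# Hypothetical helper (not part of EnvStats): draw an ecdf curve from
# precomputed plotting positions. Step function for discrete data
# (type = "s"), linear interpolation otherwise (type = "l"); any explicit
# type supplied by the caller wins. Returns the type used, invisibly.
draw_ecdf <- function(x, p, discrete = FALSE,
                      type = if (discrete) "s" else "l", ...) {
  plot(x, p, type = type, ylim = c(0, 1),
       xlab = "Order Statistics", ylab = "Cumulative Probability", ...)
  invisible(type)
}

x <- sort(c(4.2, 1.7, 3.3, 2.9))
p <- ppoints(length(x), a = 0.375)

pdf(NULL)                                  # null device: nothing is written to disk
draw_ecdf(x, p, discrete = TRUE, lwd = 3)  # step function
draw_ecdf(x, p, type = "p", pch = 16)      # user override: points only
invisible(dev.off())
```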
If include.cen=TRUE, censored observations are included on the plot as points. The arguments cen.pch, cen.cex, and cen.col control the appearance of these points.
When x is a random sample, the empirical cdf will change from sample to sample, and this variability can be dramatic for small sample sizes. Use caution when interpreting the empirical cdf if a large percentage of the observations are censored.
ecdfPlotCensored returns a list with the following components:
Order.Statistics 
numeric vector of the “ordered” observations. 
Cumulative.Probabilities 
numeric vector of the associated plotting positions. 
Censored 
logical vector indicating which of the ordered observations are censored. 
Censoring.Side 
character string indicating whether the data are left- or right-censored.
This is the same value as the argument censoring.side.
Prob.Method 
character string indicating what method was used to compute the plotting positions.
This is the same value as the argument prob.method.
Optional component (only present when prob.method="michael-schucany" or prob.method="hirsch-stedinger"):
Plot.Pos.Con 
numeric scalar containing the value of the plotting position constant that was used.
This is the same as the argument plot.pos.con.
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data.
Censored observations complicate the procedures used to graphically explore data. Techniques from survival analysis and life testing have been developed to generalize the procedures for constructing plotting positions, empirical cdf plots, and Q-Q plots to data sets with censored observations (see ppointsCensored).
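To make the survival-analysis idea concrete, here is a textbook product-limit (Kaplan-Meier) estimate of F(t) = Pr(X <= t) at each distinct uncensored value, written in base R. Right-censored data use the usual risk set {x_i >= t_k}; left-censored data use the "reverse" form (compare Gillespie et al., 2010), with risk set {x_i <= t_k} and F(t_j) equal to the product, over uncensored values t_k > t_j, of (1 - d_k/n_k). The function km_cdf is hypothetical, assumes there are no missing values, and is not guaranteed to match ppointsCensored(prob.method = "kaplan-meier") exactly, since that function may apply its own conventions.

```r
# Hypothetical sketch (not EnvStats code): textbook product-limit estimate of
# F(t) = Pr(X <= t) at each distinct uncensored value. Assumes no NA values.
km_cdf <- function(x, censored, censoring.side = c("left", "right")) {
  censoring.side <- match.arg(censoring.side)
  censored <- as.logical(censored)
  t <- sort(unique(x[!censored]))
  # d_k: number of uncensored observations tied at each distinct value t_k
  d <- vapply(t, function(tk) sum(x == tk & !censored), numeric(1))
  if (censoring.side == "right") {
    # Right censoring: risk set at t_k is every observation >= t_k
    n.risk <- vapply(t, function(tk) sum(x >= tk), numeric(1))
    cdf <- 1 - cumprod(1 - d / n.risk)
  } else {
    # Left censoring ("reverse" KM): risk set at t_k is every observation <= t_k,
    # and F(t_j) is the product of (1 - d_k / n_k) over t_k > t_j
    n.risk <- vapply(t, function(tk) sum(x <= tk), numeric(1))
    tail.prod <- rev(cumprod(rev(1 - d / n.risk)))  # product over t_k >= t_j
    cdf <- c(tail.prod[-1], 1)                      # shift to t_k > t_j
  }
  data.frame(x = t, cdf = cdf)
}

km_cdf(c(1, 2, 3, 4), c(FALSE, TRUE, FALSE, FALSE), censoring.side = "left")
```

Here the estimated cdf is 0.5, 0.75, and 1 at x = 1, 3, and 4: the value reported as <2 must lie below 2, so its probability mass is assigned to the only smaller uncensored value, 1. With no censoring, both branches reduce to i/n.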
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompareCensored) to graphically assess whether a sample of observations comes from a particular distribution. More often, however, quantile-quantile (Q-Q) plots are used instead (see qqPlot and qqPlotCensored).
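The comparison described above can be sketched in base R for a complete sample by overlaying the empirical cdf on a normal cdf whose mean and standard deviation are estimated from the data. The normal model, the simulated data, and the max.gap summary are assumptions chosen for illustration; for censored data the text points to cdfCompareCensored instead.

```r
# Hypothetical illustration: empirical cdf (Blom-type positions, complete data)
# overlaid on a fitted normal cdf. Estimating the parameters by the sample mean
# and sd is an assumption made only for this sketch.
set.seed(333)
x <- sort(rnorm(20, mean = 20, sd = 5))
p <- ppoints(length(x), a = 0.375)
m <- mean(x)
s <- sd(x)

pdf(NULL)  # null device so the sketch runs without a screen
plot(x, p, type = "l", lwd = 3, ylim = c(0, 1),
     xlab = "x", ylab = "Cumulative Probability")
# curve() evaluates the expression over its own grid named x; m and s come
# from the calling environment
curve(pnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
invisible(dev.off())

# Largest vertical gap between the two curves at the observed values
max.gap <- max(abs(p - pnorm(x, mean = m, sd = s)))
max.gap
```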
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11–16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7–62.
Gillespie, B.W., Q. Chen, H. Reichert, A. Franzblau, E. Hedgeman, J. Lepkowski, P. Adriaens, A. Demond, W. Luksemburg, and D.H. Garabrant. (2010). Estimating Population Distributions When Some Data Are Below a Limit of Detection by Using a Reverse Kaplan-Meier Estimator. Epidemiology 21(4), S64–S70.
Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R, Second Edition. John Wiley & Sons, Hoboken, New Jersey.
Helsel, D.R., and T.A. Cohn. (1988). Estimation of Descriptive Statistics for Multiply Censored Water Quality Data. Water Resources Research 24(12), 1997–2004.
Hirsch, R.M., and J.R. Stedinger. (1987). Plotting Positions for Historical Floods and Their Precision. Water Resources Research 23(4), 715–727.
Kaplan, E.L., and P. Meier. (1958). Nonparametric Estimation From Incomplete Observations. Journal of the American Statistical Association 53, 457–481.
Lee, E.T., and J.W. Wang. (2003). Statistical Methods for Survival Data Analysis, Third Edition. John Wiley & Sons, Hoboken, New Jersey, 513pp.
Michael, J.R., and W.R. Schucany. (1986). Analysis of Data from Censored Samples. In D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, 560pp, Chapter 11, 461–496.
Nelson, W. (1972). Theory and Applications of Hazard Plotting for Censored Failure Data. Technometrics 14, 945–966.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. Chapter 15.
USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.
ppoints, ppointsCensored, ecdfPlot, qqPlot, qqPlotCensored, cdfPlot, cdfCompareCensored.
# Generate 20 observations from a normal distribution with mean=20 and sd=5,
# censor all observations less than 18, then generate an empirical cdf plot
# for the complete data set and the censored data set. Note that the empirical
# cdf plot for the censored data set starts at the first ordered uncensored
# observation, and that for values of x > 18 the two empirical cdf plots are
# exactly the same. This is because there is only one censoring level and
# no uncensored observations fall below the censored observations.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(333)
x <- rnorm(20, mean=20, sd=5)
censored <- x < 18
sum(censored)
#[1] 7
new.x <- x
new.x[censored] <- 18
dev.new()
ecdfPlot(x, xlim = range(pretty(x)),
main = "Empirical CDF Plot for\nComplete Data Set")
dev.new()
ecdfPlotCensored(new.x, censored, xlim = range(pretty(x)),
main="Empirical CDF Plot for\nCensored Data Set")
# Clean up
#
rm(x, censored, new.x)
#
# Example 15-1 of USEPA (2009, page 15-10) gives an example of
# computing plotting positions based on censored manganese
# concentrations (ppb) in groundwater collected at 5 monitoring
# wells. The data for this example are stored in
# EPA.09.Ex.15.1.manganese.df. Here we will create an empirical
# CDF plot based on the Kaplan-Meier method.
EPA.09.Ex.15.1.manganese.df
# Sample Well Manganese.Orig.ppb Manganese.ppb Censored
#1 1 Well.1 <5 5.0 TRUE
#2 2 Well.1 12.1 12.1 FALSE
#3 3 Well.1 16.9 16.9 FALSE
#4 4 Well.1 21.6 21.6 FALSE
#5 5 Well.1 <2 2.0 TRUE
#...
#21 1 Well.5 17.9 17.9 FALSE
#22 2 Well.5 22.7 22.7 FALSE
#23 3 Well.5 3.3 3.3 FALSE
#24 4 Well.5 8.4 8.4 FALSE
#25 5 Well.5 <2 2.0 TRUE
dev.new()
with(EPA.09.Ex.15.1.manganese.df,
ecdfPlotCensored(Manganese.ppb, Censored,
prob.method = "kaplan-meier", ecdf.col = "blue",
main = "Empirical CDF of Manganese Data\nBased on Kaplan-Meier"))
#==========
# Clean up
#
graphics.off()