cdfCompare {EnvStats}  R Documentation 
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.
cdfCompare(x, y = NULL, discrete = FALSE,
prob.method = ifelse(discrete, "emp.probs", "plot.pos"), plot.pos.con = NULL,
distribution = "norm", param.list = NULL,
estimate.params = is.null(param.list), est.arg.list = NULL,
x.col = "blue", y.or.fitted.col = "black",
x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"),
x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
x 
numeric vector of observations. Missing ( 
y 
a numeric vector (not necessarily of the same length as 
discrete 
logical scalar indicating whether the assumed parent distribution of 
prob.method 
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are

plot.pos.con 
numeric scalar between 0 and 1 containing the value of the plotting position constant.
When 
distribution 
when 
param.list 
when 
estimate.params 
when 
est.arg.list 
when 
x.col 
a numeric scalar or character string determining the color of the empirical cdf
(based on 
y.or.fitted.col 
a numeric scalar or character string determining the color of the empirical cdf
(based on 
x.lwd 
a numeric scalar determining the width of the empirical cdf (based on 
y.or.fitted.lwd 
a numeric scalar determining the width of the empirical cdf (based on 
x.lty 
a numeric scalar determining the line type of the empirical cdf
(based on 
y.or.fitted.lty 
a numeric scalar determining the line type of the empirical cdf
(based on 
digits 
when 
type , main , xlab , ylab , xlim , ylim , ... 
additional graphical parameters (see 
When both x and y are supplied, the function cdfCompare creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlot.
When y is not supplied, the function cdfCompare creates the empirical cdf plot of x (by calling ecdfPlot) and the theoretical cdf plot (by calling cdfPlot and using the argument distribution) on the same plot.
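The one-sample case described above can be approximated with base R alone, which may help clarify what the plot contains. This is a minimal sketch, not the EnvStats implementation: it assumes a normal parent distribution, estimates the parameters with mean() and sd(), and uses stats::ecdf, whose i/n probabilities can differ slightly from the plotting positions that cdfCompare uses by default for continuous data.

```r
# Sketch only: base-R approximation of a one-sample ecdf vs. fitted cdf plot.
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Estimate the parameters of the assumed normal distribution from the sample.
mu.hat <- mean(x)
sd.hat <- sd(x)

# Empirical cdf as a step function (stats::ecdf uses i/n probabilities).
Fn <- ecdf(x)

# Draw the ecdf, then overlay the fitted normal cdf as a dashed curve.
plot(Fn, verticals = TRUE, do.points = FALSE, col = "blue",
     main = "Empirical cdf with fitted normal cdf",
     xlab = "x", ylab = "Cumulative probability")
curve(pnorm(x, mean = mu.hat, sd = sd.hat), add = TRUE, lty = 2)
```

If the sample is consistent with the assumed distribution, the step function should track the dashed curve closely over the whole range of the data.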
When y is supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list 
a list with components 
y.ecdf.list 
a list with components 
When y is not supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list 
a list with components 
fitted.cdf.list 
a list with components 
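Because the result is returned invisibly, it is only available if the call is assigned to a variable. The sketch below shows one way to inspect it; the variable name res is hypothetical, and the code is guarded so that it does nothing when the EnvStats package is not installed.

```r
# Sketch only: capture and inspect the invisibly returned list.
res <- NULL
if (require("EnvStats", quietly = TRUE)) {
  set.seed(250)
  x <- rnorm(20, mean = 10, sd = 2)

  # Assigning the call keeps the list that would otherwise be discarded.
  res <- cdfCompare(x)

  # List the top-level components and show their structure.
  print(names(res))
  str(res, max.level = 2)
}
```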
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
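The quantities mentioned above can also be read off numerically. The following base-R sketch is illustrative only and uses stats::ecdf rather than EnvStats; the variable names are hypothetical.

```r
# Sketch only: read quartiles, extremes, and local density from an ecdf.
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
Fn <- ecdf(x)

# Quartiles are the x values where the ecdf first reaches 0.25, 0.5, and 0.75.
# type = 1 requests the inverse of the empirical cdf.
q <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 1, names = FALSE)

# The ecdf leaves 0 at the sample minimum and reaches 1 at the sample maximum.
x.min <- min(knots(Fn))
x.max <- max(knots(Fn))

# Each step has height 1/n, so small gaps between consecutive order
# statistics correspond to a steep ecdf, that is, high local density.
gaps <- diff(sort(x))
steepest <- which.min(gaps)
```

Here steepest indexes the pair of adjacent order statistics that are closest together, which is where the ecdf rises most sharply.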
Chambers et al. (1983, pp. 11-16) plot the observed order statistics on the y-axis vs. the ecdf on the x-axis and call this a quantile plot.
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompare) to graphically assess whether a sample of observations comes from a particular distribution. The Kolmogorov-Smirnov goodness-of-fit test (see gofTest) is the statistical companion of this kind of comparison; it is based on the maximum vertical distance between the empirical cdf plot and the theoretical cdf plot. More often, however, quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess departures from an assumed distribution (see qqPlot).
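To make the "maximum vertical distance" concrete, the sketch below computes it by hand and cross-checks the value against stats::ks.test from base R. It assumes a fully specified normal distribution (mean 10, sd 2) chosen for illustration; when parameters are estimated from the same data, the standard Kolmogorov-Smirnov p-value is no longer exact.

```r
# Sketch only: Kolmogorov-Smirnov statistic as the largest vertical gap
# between the ecdf and a fully specified theoretical cdf.
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
n <- length(x)
xs <- sort(x)

# Hypothesized cdf evaluated at the order statistics.
F0 <- pnorm(xs, mean = 10, sd = 2)

# The ecdf jumps from (i - 1)/n to i/n at the i-th order statistic,
# so the gap has to be checked on both sides of every jump.
D.plus  <- max(seq_len(n) / n - F0)
D.minus <- max(F0 - (seq_len(n) - 1) / n)
D <- max(D.plus, D.minus)

# Cross-check against the statistic reported by stats::ks.test.
ks <- ks.test(x, "pnorm", mean = 10, sd = 2)
```

The hand-computed D should agree with ks$statistic up to floating-point error.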
Steven P. Millard (EnvStats@ProbStatInfo.com)
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp. 11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp. 7-62.
# Generate 20 observations from a normal (Gaussian) distribution
# with mean=10 and sd=2 and compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
dev.new()
cdfCompare(x)
#
# Generate 30 observations from an exponential distribution with parameter
# rate=0.1 (see the R help file for Exponential) and compare the empirical
# cdf with the empirical cdf of the normal observations generated in the
# previous example:
set.seed(432)
y <- rexp(30, rate = 0.1)
dev.new()
cdfCompare(x, y)
#==========
# Generate 20 observations from a Poisson distribution with parameter lambda=10
# (see the R help file for Poisson) and compare the empirical cdf with a
# theoretical Poisson cdf based on estimating the distribution parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rpois(20, lambda = 10)
dev.new()
cdfCompare(x, dist = "pois")
#==========
# Clean up
#
rm(x, y)
graphics.off()