cdfCompare {EnvStats} | R Documentation |
Plot Two Cumulative Distribution Functions
Description
For one sample, plots the empirical cumulative distribution function (ecdf) along with a theoretical cumulative distribution function (cdf). For two samples, plots the two ecdf's. These plots are used to graphically assess goodness of fit.
Usage
cdfCompare(x, y = NULL, discrete = FALSE,
prob.method = ifelse(discrete, "emp.probs", "plot.pos"), plot.pos.con = NULL,
distribution = "norm", param.list = NULL,
estimate.params = is.null(param.list), est.arg.list = NULL,
x.col = "blue", y.or.fitted.col = "black",
x.lwd = 3 * par("cex"), y.or.fitted.lwd = 3 * par("cex"),
x.lty = 1, y.or.fitted.lty = 2, digits = .Options$digits, ...,
type = ifelse(discrete, "s", "l"), main = NULL, xlab = NULL, ylab = NULL,
xlim = NULL, ylim = NULL)
Arguments
x |
numeric vector of observations. Missing ( |
y |
a numeric vector (not necessarily of the same length as |
discrete |
logical scalar indicating whether the assumed parent distribution of |
prob.method |
character string indicating what method to use to compute the plotting positions
(empirical probabilities). Possible values are
|
plot.pos.con |
numeric scalar between 0 and 1 containing the value of the plotting position constant.
When |
distribution |
when |
param.list |
when |
estimate.params |
when |
est.arg.list |
when |
x.col |
a numeric scalar or character string determining the color of the empirical cdf
(based on |
y.or.fitted.col |
a numeric scalar or character string determining the color of the empirical cdf
(based on |
x.lwd |
a numeric scalar determining the width of the empirical cdf (based on |
y.or.fitted.lwd |
a numeric scalar determining the width of the empirical cdf (based on |
x.lty |
a numeric scalar determining the line type of the empirical cdf
(based on |
y.or.fitted.lty |
a numeric scalar determining the line type of the empirical cdf
(based on |
digits |
when |
type , main , xlab , ylab , xlim , ylim , ... |
additional graphical parameters (see |
Details
When both x and y are supplied, the function cdfCompare creates the empirical cdf plot of x and y on the same plot by calling the function ecdfPlot.
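The two-sample case can be approximated in base R as a rough sketch. This is not the EnvStats implementation: it uses stats::ecdf (the i/n convention), which may differ from the plotting positions cdfCompare computes, and the colors and line types are only chosen to mirror the defaults listed under Usage.

```r
# Base-R sketch of a two-sample ecdf overlay (an approximation, not cdfCompare itself)
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
set.seed(432)
y <- rexp(30, rate = 0.1)

# Empirical cdfs as step functions (i/n convention of stats::ecdf)
Fx <- ecdf(x)
Fy <- ecdf(y)

# Draw the first ecdf, then overlay the second on the same axes
plot(Fx, verticals = TRUE, do.points = FALSE, col = "blue",
     xlim = range(x, y), main = "Two empirical cdfs",
     xlab = "Observations", ylab = "Cumulative probability")
lines(Fy, verticals = TRUE, do.points = FALSE, col = "black", lty = 2)
```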
When y is not supplied, the function cdfCompare creates the empirical cdf plot of x (by calling ecdfPlot) and the theoretical cdf plot (by calling cdfPlot and using the argument distribution) on the same plot.
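For intuition, the one-sample case can be sketched in base R. This is an approximation under stated assumptions, not the EnvStats code: the ecdf uses the i/n convention of stats::ecdf, and the normal parameters are estimated with mean() and sd(), which may not match the estimators cdfCompare uses internally.

```r
# Base-R sketch: empirical cdf of x with a fitted normal cdf overlaid
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)

# Empirical cdf as a step function
Fn <- ecdf(x)

# Parameter estimates for the fitted normal distribution (assumed estimators)
mu.hat <- mean(x)
sd.hat <- sd(x)

plot(Fn, verticals = TRUE, do.points = FALSE, col = "blue",
     main = "Empirical vs. fitted normal cdf",
     xlab = "Observations", ylab = "Cumulative probability")

# Overlay the theoretical cdf as a dashed curve
curve(pnorm(x, mean = mu.hat, sd = sd.hat), add = TRUE, lty = 2)
```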
Value
When y is supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list |
a list with components |
y.ecdf.list |
a list with components |
When y is not supplied, cdfCompare invisibly returns a list with components:
x.ecdf.list |
a list with components |
fitted.cdf.list |
a list with components |
Note
An empirical cumulative distribution function (ecdf) plot is a graphical tool that can be used in conjunction with other graphical tools such as histograms, strip charts, and boxplots to assess the characteristics of a set of data. It is easy to determine quartiles and the minimum and maximum values from such a plot. Also, ecdf plots allow you to assess local density: a higher density of observations occurs where the slope is steep.
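As an illustration of reading quartiles and extremes off an ecdf, the following base-R sketch inverts the step function numerically. It assumes the i/n convention of stats::ecdf, for which quantile(type = 1) is the matching inverse; other plotting-position conventions would give slightly different values.

```r
# Base-R sketch: quartiles, minimum, and maximum recovered from an ecdf
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
Fn <- ecdf(x)

# Quartiles as the inverse of the empirical cdf (type = 1)
probs <- c(0.25, 0.5, 0.75)
q <- quantile(x, probs = probs, type = 1)

# The ecdf first reaches each probability level at the corresponding quartile
Fn(q)

# The first and last jump points of the ecdf are the sample minimum and maximum
range(knots(Fn))
```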
Chambers et al. (1983, pp.11-16) plot the observed order statistics on the y-axis vs. the ecdf on the x-axis and call this a quantile plot.
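A quantile plot of this kind can be sketched in base R by swapping the axes. The plotting positions (i - 0.5)/n used here are an assumption for illustration; other conventions can be obtained by changing the constant a in ppoints.

```r
# Base-R sketch of a quantile plot: order statistics vs. cumulative fraction
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
n <- length(x)

# Plotting positions (i - 0.5) / n, an assumed convention
p <- ppoints(n, a = 0.5)

plot(p, sort(x), type = "b",
     xlab = "Fraction of data", ylab = "Ordered observations",
     main = "Quantile plot")
```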
Empirical cumulative distribution function (ecdf) plots are often plotted with theoretical cdf plots (see cdfPlot and cdfCompare) to graphically assess whether a sample of observations comes from a particular distribution. The Kolmogorov-Smirnov goodness-of-fit test (see gofTest) is the statistical companion of this kind of comparison; it is based on the maximum vertical distance between the empirical cdf plot and the theoretical cdf plot. More often, however, quantile-quantile (Q-Q) plots are used instead of ecdf plots to graphically assess departures from an assumed distribution (see qqPlot).
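The maximum vertical distance can be computed directly, as in the sketch below. It uses stats::ks.test from base R rather than gofTest, and it assumes the hypothesized parameters (mean = 10, sd = 2) are fixed in advance; when parameters are estimated from the same data, the standard Kolmogorov-Smirnov p-value is no longer exact.

```r
# Base-R sketch: Kolmogorov-Smirnov statistic as the largest vertical gap
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
n <- length(x)
x.sorted <- sort(x)

# Hypothesized cdf evaluated at the ordered observations
F0 <- pnorm(x.sorted, mean = 10, sd = 2)

# The ecdf jumps from (i - 1)/n to i/n at the i-th order statistic,
# so the gap is checked on both sides of every jump
D <- max(pmax((1:n) / n - F0, F0 - (0:(n - 1)) / n))

# Compare with the statistic reported by stats::ks.test
ks <- ks.test(x, "pnorm", mean = 10, sd = 2)
all.equal(D, unname(ks$statistic))
```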
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Chambers, J.M., W.S. Cleveland, B. Kleiner, and P.A. Tukey. (1983). Graphical Methods for Data Analysis. Duxbury Press, Boston, MA, pp.11-16.
Cleveland, W.S. (1993). Visualizing Data. Hobart Press, Summit, New Jersey, 360pp.
D'Agostino, R.B. (1986a). Graphical Analysis. In: D'Agostino, R.B., and M.A. Stephens, eds. Goodness-of-Fit Techniques. Marcel Dekker, New York, Chapter 2, pp.7-62.
See Also
Examples
# Generate 20 observations from a normal (Gaussian) distribution
# with mean=10 and sd=2 and compare the empirical cdf with a
# theoretical normal cdf that is based on estimating the parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rnorm(20, mean = 10, sd = 2)
dev.new()
cdfCompare(x)
#----------
# Generate 30 observations from an exponential distribution with parameter
# rate=0.1 (see the R help file for Exponential) and compare the empirical
# cdf with the empirical cdf of the normal observations generated in the
# previous example:
set.seed(432)
y <- rexp(30, rate = 0.1)
dev.new()
cdfCompare(x, y)
#==========
# Generate 20 observations from a Poisson distribution with parameter lambda=10
# (see the R help file for Poisson) and compare the empirical cdf with a
# theoretical Poisson cdf based on estimating the distribution parameters.
# (Note: the call to set.seed simply allows you to reproduce this example.)
set.seed(250)
x <- rpois(20, lambda = 10)
dev.new()
cdfCompare(x, distribution = "pois")
#==========
# Clean up
#---------
rm(x, y)
graphics.off()