ebinom {EnvStats}  R Documentation 
Estimate p (the probability of “success”) for a binomial distribution,
and optionally construct a confidence interval for p.
ebinom(x, size = NULL, method = "mle/mme/mvue", ci = FALSE,
ci.type = "twosided", ci.method = "score", correct = TRUE,
var.denom = "n", conf.level = 0.95, warn = TRUE)
x 
numeric or logical vector of observations. When 
size 
positive integer indicating the number of trials;
method 
character string specifying the method of estimation. The only possible value is
"mle/mme/mvue".
ci 
logical scalar indicating whether to compute a confidence interval for p. The default value
is FALSE.
ci.type 
character string indicating what kind of confidence interval to compute. The possible values are
"two-sided" (the default), "lower", and "upper".
ci.method 
character string indicating which method to use to construct the confidence interval. Possible values
are "score" (the default), "exact", "adjusted Wald", and "Wald".
correct 
logical scalar indicating whether to use the continuity correction when ci.method="score" or
ci.method="Wald". The default value is TRUE.
var.denom 
character string indicating what value to use in the denominator of the variance estimator when
ci.method="Wald". The default value is "n".
conf.level 
a scalar between 0 and 1 indicating the confidence level of the confidence interval. The default
value is 0.95.
warn 
a logical scalar indicating whether to issue a warning in the case when 
If x contains any missing (NA), undefined (NaN) or infinite (Inf, -Inf) values,
they will be removed prior to performing the estimation.
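The effect of this filtering can be pictured with base R alone. The snippet below is only a sketch of the behaviour described above, not the code that ebinom itself runs; the vector x.raw is made up for illustration.

```r
# Hypothetical 0/1 data containing missing, undefined, and infinite values
x.raw <- c(1, 0, NA, 0, NaN, 1, Inf, 0, -Inf, 0)

# is.finite() is FALSE for NA, NaN, Inf, and -Inf, so indexing with it
# keeps only the usable observations
x.clean <- x.raw[is.finite(x.raw)]

x.clean
length(x.clean)
```

Only the six finite observations remain, so any estimate based on x.clean uses a sample size of 6 rather than 10.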
If \underline{x} is a vector of n observations from a binomial distribution with
parameters size=1 and prob=p, then the sum of all the values in \underline{x}
is an observation from a binomial distribution with parameters size=n and prob=p.
If x is an observation from a binomial distribution with parameters size=n and prob=p,
the maximum likelihood estimator (mle), method of moments estimator (mme),
and minimum variance unbiased estimator (mvue) of p is simply x/n.
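As a small illustration of the estimator, the following base R sketch sums a vector of 0/1 outcomes and divides by the number of trials. The data vector x.vec is invented for the example and is not taken from the package.

```r
# Hypothetical vector of n = 20 Bernoulli (size = 1) outcomes
x.vec <- c(0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
           0, 0, 1, 0, 1, 0, 0, 1, 0, 1)

n <- length(x.vec)   # number of trials
x <- sum(x.vec)      # a single binomial(size = n, prob = p) observation

# The mle, mme, and mvue of p all reduce to x/n
p.hat <- x / n
p.hat
```

With 7 successes in 20 trials the estimate is 0.35, which is also the sample mean of x.vec.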
Confidence Intervals.
ci.method="score"
The confidence interval for p based on the score method was developed by Wilson (1927) and
is discussed by Newcombe (1998a), Agresti and Coull (1998), and Agresti and Caffo (2000).
When ci=TRUE and ci.method="score", the function ebinom calls the R function prop.test
to compute the confidence interval. This method
has been shown to provide the best performance (in terms of actual coverage matching assumed
coverage) of all the methods provided here, although unlike the exact method, the actual
coverage can fall below the assumed coverage.
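Because the score interval comes from prop.test, it can be reproduced directly in base R. The sketch below uses x = 7 successes in n = 20 trials and also writes out the uncorrected Wilson formula by hand. The hand-written formula is the standard textbook form and is shown only as a cross-check, not as the package's own code.

```r
x <- 7; n <- 20

# Score intervals as returned by prop.test, with and without
# the continuity correction
ci.cc   <- prop.test(x, n, conf.level = 0.95, correct = TRUE)$conf.int
ci.nocc <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int

# Wilson score interval without continuity correction, written out
z      <- qnorm(0.975)
p.hat  <- x / n
centre <- (p.hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
ci.wilson <- c(centre - half, centre + half)

as.numeric(ci.cc)
as.numeric(ci.nocc)
ci.wilson
```

The hand-computed limits agree with the correct = FALSE result from prop.test.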
ci.method="exact"
The confidence interval for p based on the exact (Clopper-Pearson) method is discussed by
Newcombe (1998a), Agresti and Coull (1998), and Zar (2010, pp. 543-547). This is the method
used in the R function binom.test. This method ensures the actual coverage is greater than or
equal to the assumed coverage.
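The exact interval can be obtained from binom.test, or equivalently from quantiles of the beta distribution. The sketch below assumes 0 < x < n, so that both beta quantiles are well defined; it illustrates the Clopper-Pearson construction and is not code from the package.

```r
x <- 7; n <- 20
conf.level <- 0.95
alpha <- 1 - conf.level

# Exact (Clopper-Pearson) interval from binom.test
ci.bt <- binom.test(x, n, conf.level = conf.level)$conf.int

# The same limits from beta quantiles (valid for 0 < x < n)
lcl <- qbeta(alpha / 2, x, n - x + 1)
ucl <- qbeta(1 - alpha / 2, x + 1, n - x)

as.numeric(ci.bt)
c(lcl, ucl)
```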
ci.method="Wald"
The confidence interval for p based on the Wald method
(with or without a correction for continuity) is the usual “normal approximation”
method and is discussed by Newcombe (1998a), Agresti and Coull (1998), Agresti and Caffo (2000),
and Zar (2010, pp. 543-547). This method is never recommended but is included
for historical purposes.
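For reference, the normal-approximation interval is short enough to write out. The sketch below assumes the variance estimator with n in the denominator and a continuity correction of 1/(2n) added to the half-width. These are the usual textbook choices and are stated here as assumptions rather than as the package's internal code.

```r
x <- 7; n <- 20
p.hat <- x / n
z  <- qnorm(0.975)                       # two-sided 95% interval
se <- sqrt(p.hat * (1 - p.hat) / n)      # variance denominator of n (assumed)

# Wald interval without continuity correction
wald <- p.hat + c(-1, 1) * z * se

# Wald interval with a continuity correction of 1/(2n) (assumed form)
wald.cc <- p.hat + c(-1, 1) * (z * se + 1 / (2 * n))

wald
wald.cc
```

The correction simply widens the interval by 1/(2n) = 0.025 on each side.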
ci.method="adjusted Wald"
The confidence interval for p based on the adjusted Wald method is discussed by
Agresti and Coull (1998), Agresti and Caffo (2000), and Zar (2010, pp. 543-547).
This is a simple modification of the Wald method and performs surprisingly well.
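One common form of the adjustment, following Agresti and Coull (1998), adds z^2/2 successes and z^2/2 failures before applying the Wald formula. The sketch below assumes that form; the variable names are illustrative only.

```r
x <- 7; n <- 20
z <- qnorm(0.975)                 # two-sided 95% interval

# Adjusted counts: add z^2/2 successes and z^2/2 failures (assumed form)
n.adj <- n + z^2
p.adj <- (x + z^2 / 2) / n.adj

# Ordinary Wald formula applied to the adjusted quantities
se.adj   <- sqrt(p.adj * (1 - p.adj) / n.adj)
adj.wald <- p.adj + c(-1, 1) * z * se.adj

adj.wald
```

For a 95% interval z^2/2 is close to 2, which is why the method is often summarised as "add two successes and two failures".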
a list of class "estimate" containing the estimated parameters and other information.
See estimate.object for details.
The binomial distribution is used to model processes with binary (Yes-No, Success-Failure,
Heads-Tails, etc.) outcomes. It is assumed that the outcome of any one trial is independent
of any other trial, and that the probability of “success”, p, is the same on
each trial. A binomial discrete random variable X is the number of “successes” in
n independent trials. A special case of the binomial distribution occurs when n=1,
in which case X is also called a Bernoulli random variable.
In the context of environmental statistics, the binomial distribution is sometimes used to model
the proportion of times a chemical concentration exceeds a set standard in a given period of
time (e.g., Gilbert, 1987, p.143). The binomial distribution is also used to compute an upper
bound on the overall Type I error rate for deciding whether a facility or location is in
compliance with some set standard. Assume the null hypothesis is that the facility is in compliance.
If a test of hypothesis is conducted periodically over time to test compliance and/or several tests
are performed during each time period, and the facility or location is always in compliance, and
each single test has a Type I error rate of \alpha, and the result of each test is
independent of the result of any other test (usually not a reasonable assumption), then the number
of times the facility is declared out of compliance when in fact it is in compliance is a
binomial random variable with probability of “success” p=\alpha being the
probability of being declared out of compliance (see USEPA, 2009).
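Under the independence assumption described above, the overall Type I error rate is the probability of at least one false declaration among the tests. The sketch below works this out for made-up values of the per-test rate alpha and the number of tests k.

```r
alpha <- 0.05   # per-test Type I error rate (hypothetical value)
k     <- 10     # number of independent tests (hypothetical value)

# The number of false "out of compliance" declarations is
# binomial(size = k, prob = alpha); the overall Type I error rate
# is the probability of at least one such declaration
overall <- 1 - dbinom(0, size = k, prob = alpha)

overall
1 - (1 - alpha)^k   # same quantity in closed form
```

With these values the overall rate is about 0.40, far above the per-test rate of 0.05.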
Steven P. Millard (EnvStats@ProbStatInfo.com)
Agresti, A., and B.A. Coull. (1998). Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions. The American Statistician, 52(2), 119–126.
Agresti, A., and B. Caffo. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Berthouex, P.M., and L.C. Brown. (1994). Statistics for Environmental Engineers. Lewis Publishers, Boca Raton, FL, Chapters 2 and 15.
Cochran, W.G. (1977). Sampling Techniques. John Wiley and Sons, New York, Chapter 3.
Fisher, R.A., and F. Yates. (1963). Statistical Tables for Biological, Agricultural, and Medical Research. 6th edition. Hafner, New York, 146pp.
Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. Second Edition. John Wiley and Sons, New York, Chapters 1-2.
Forbes, C., M. Evans, N. Hastings, and B. Peacock. (2011). Statistical Distributions. Fourth Edition. John Wiley and Sons, Hoboken, NJ.
Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 11.
Johnson, N. L., S. Kotz, and A.W. Kemp. (1992). Univariate Discrete Distributions. Second Edition. John Wiley and Sons, New York, Chapter 3.
Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.
Newcombe, R.G. (1998a). Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods. Statistics in Medicine, 17, 857–872.
Ott, W.R. (1995). Environmental Statistics and Data Analysis. Lewis Publishers, Boca Raton, FL, Chapter 4.
USEPA. (1989b). Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities, Interim Final Guidance. EPA/530-SW-89-026. Office of Solid Waste, U.S. Environmental Protection Agency, Washington, D.C.
USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.6-38.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ, Chapter 24.
Binomial, prop.test, binom.test, ciBinomHalfWidth, ciBinomN, plotCiBinomDesign.
# Generate 20 observations from a binomial distribution with
# parameters size=1 and prob=0.2, then estimate the 'prob' parameter.
# (Note: the call to set.seed simply allows you to reproduce this
# example. Also, the only parameter estimated is 'prob'; 'size' is
# specified in the call to ebinom. The parameter 'size' is printed
# in order to show all of the parameters associated with the
# distribution.)
set.seed(251)
dat <- rbinom(20, size = 1, prob = 0.2)
ebinom(dat)
#Results of Distribution Parameter Estimation
#
#
#Assumed Distribution: Binomial
#
#Estimated Parameter(s):          size = 20.0
#                                 prob =  0.1
#
#Estimation Method: mle/mme/mvue for 'prob'
#
#Data: dat
#
#Sample Size: 20
#
# Generate one observation from a binomial distribution with
# parameters size=20 and prob=0.2, then estimate the "prob"
# parameter and compute a confidence interval:
set.seed(763)
dat <- rbinom(1, size = 20, prob = 0.2)
ebinom(dat, size = 20, ci = TRUE)
#Results of Distribution Parameter Estimation
#
#
#Assumed Distribution: Binomial
#
#Estimated Parameter(s):          size = 20.00
#                                 prob =  0.35
#
#Estimation Method: mle/mme/mvue for 'prob'
#
#Data: dat
#
#Sample Size: 20
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.1630867
#                                 UCL = 0.5905104
#
# Using the data from the last example, compare confidence
# intervals based on the various methods
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "score", correct = TRUE)$interval$limits
#      LCL       UCL
#0.1630867 0.5905104
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "score", correct = FALSE)$interval$limits
#      LCL       UCL
#0.1811918 0.5671457
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "exact")$interval$limits
#      LCL       UCL
#0.1539092 0.5921885
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "adjusted Wald")$interval$limits
#      LCL       UCL
#0.1799264 0.5684112
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "Wald", correct = TRUE)$interval$limits
#      LCL       UCL
#0.1159627 0.5840373
ebinom(dat, size = 20, ci = TRUE,
  ci.method = "Wald", correct = FALSE)$interval$limits
#      LCL       UCL
#0.1409627 0.5590373
#
# Use the cadmium data on page 8-6 of USEPA (1989b) to compute
# two-sided 95% confidence intervals for the probability of
# detection at background and compliance wells. The data are
# stored in EPA.89b.cadmium.df.
EPA.89b.cadmium.df
#   Cadmium.orig Cadmium Censored  Well.type
#1           0.1   0.100    FALSE Background
#2          0.12   0.120    FALSE Background
#3           BDL   0.000     TRUE Background
#...
#86          BDL   0.000     TRUE Compliance
#87          BDL   0.000     TRUE Compliance
#88          BDL   0.000     TRUE Compliance
attach(EPA.89b.cadmium.df)
# Probability of detection at Background well:
#
ebinom(!Censored[Well.type=="Background"], ci=TRUE)
#Results of Distribution Parameter Estimation
#
#
#Assumed Distribution: Binomial
#
#Estimated Parameter(s):          size = 24.0000000
#                                 prob =  0.3333333
#
#Estimation Method: mle/mme/mvue for 'prob'
#
#Data: !Censored[Well.type == "Background"]
#
#Sample Size: 24
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.1642654
#                                 UCL = 0.5530745
# Probability of detection at Compliance well:
#
ebinom(!Censored[Well.type=="Compliance"], ci=TRUE)
#Results of Distribution Parameter Estimation
#
#
#Assumed Distribution: Binomial
#
#Estimated Parameter(s):          size = 64.000
#                                 prob =  0.375
#
#Estimation Method: mle/mme/mvue for 'prob'
#
#Data: !Censored[Well.type == "Compliance"]
#
#Sample Size: 64
#
#Confidence Interval for:         prob
#
#Confidence Interval Method:      Score normal approximation
#                                 (With continuity correction)
#
#Confidence Interval Type:        two-sided
#
#Confidence Level:                95%
#
#Confidence Interval:             LCL = 0.2597567
#                                 UCL = 0.5053034
#
# Clean up
rm(dat)
detach("EPA.89b.cadmium.df")