prop.cint {corpora} | R Documentation |
Confidence interval for proportion based on frequency counts (corpora)
Description
This function computes a confidence interval for a population proportion from the corresponding frequency count in a sample. It either uses the Clopper-Pearson method (inverted exact binomial test) or the Wilson score method (inversion of a z-score test, with or without continuity correction).
Usage
prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE, p.adjust=FALSE,
conf.level = 0.95, alternative = c("two.sided", "less", "greater"))
Arguments
k |
frequency of a type in the corpus (or an integer vector of frequencies) |
n |
number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples) |
method |
a character string specifying whether to compute a Clopper-Pearson confidence interval (binomial, default) or a Wilson score confidence interval (z.score) |
correct |
if TRUE, apply Yates' continuity correction for the z-score test (default) |
p.adjust |
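if TRUE, apply a Bonferroni correction to ensure a family-wise confidence level over all tests carried out for the input vector k; alternatively, the number of tests can be specified explicitly |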
conf.level |
the desired confidence level (defaults to 95%) |
alternative |
a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval |
Details
The confidence intervals computed by this function correspond to those returned by binom.test and prop.test, respectively. However, prop.cint accepts vector arguments, allowing many confidence intervals to be computed with a single function call in a computationally efficient manner.
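For comparison, the following minimal sketch shows what computing several intervals looks like without vectorisation, using only base R. Since binom.test handles one sample at a time, an explicit loop is needed. The helper name loop_cint is hypothetical and not part of the corpora package.

```r
# Hypothetical helper (not part of corpora): one binom.test() call per sample.
loop_cint <- function(k, n, conf.level = 0.95) {
  ci <- mapply(
    function(ki, ni) binom.test(ki, ni, conf.level = conf.level)$conf.int,
    k, n
  )
  # mapply() returns a 2 x length(k) matrix: row 1 = lower, row 2 = upper
  data.frame(lower = ci[1, ], upper = ci[2, ])
}

k <- c(19, 5, 42)
n <- c(100, 100, 200)
loop_cint(k, n)
```

With the corpora package installed, prop.cint(k, n, method="binomial") should yield the corresponding bounds in a single call.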
The Clopper-Pearson confidence interval (binomial) is obtained by inverting the exact binomial test at significance level \alpha = 1 - conf.level. In the two-sided case, the p-value of the test is computed using the “central” method (Fay 2010: 53), i.e. as twice the tail probability of the matching tail. This corresponds to the algorithm originally proposed by Clopper & Pearson (1934).
The limits of the confidence interval are computed in an efficient and numerically robust manner via (the inverse of) the incomplete Beta function.
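As an illustration, the two-sided Clopper-Pearson limits can be written as quantiles of Beta distributions, which R exposes as qbeta. The sketch below is a base-R approximation of the idea, not the package's actual implementation; the function name cp_interval is an assumption made for this example.

```r
# Hypothetical sketch of vectorised two-sided Clopper-Pearson limits.
cp_interval <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # Lower limit: alpha/2 quantile of Beta(k, n - k + 1); defined as 0 for k = 0
  lower <- ifelse(k == 0, 0, qbeta(alpha / 2, k, n - k + 1))
  # Upper limit: 1 - alpha/2 quantile of Beta(k + 1, n - k); defined as 1 for k = n
  upper <- ifelse(k == n, 1, qbeta(1 - alpha / 2, k + 1, n - k))
  data.frame(lower = lower, upper = upper)
}

cp_interval(19, 100)
binom.test(19, 100)$conf.int  # should show the same limits
```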
The Wilson score confidence interval (z.score) is computed by solving the equation of the z-score test

\frac{k - np}{\sqrt{n p (1-p)}} = A

for p, where A is the z-value corresponding to the chosen confidence level (e.g. \pm 1.96 for a two-sided test with 95% confidence). This leads to the quadratic equation

p^2 (n + A^2) + p (-2k - A^2) + \frac{k^2}{n} = 0

whose two solutions correspond to the lower and upper boundary of the confidence interval.
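The quadratic equation can be solved directly with the standard formula. The following base-R sketch does this for the two-sided case without continuity correction; the function name wilson_interval is hypothetical, and the code is meant as an illustration of the derivation rather than the package's own code.

```r
# Hypothetical sketch: Wilson score interval from the quadratic equation
#   p^2 (n + A^2) + p (-2 k - A^2) + k^2 / n = 0
wilson_interval <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)  # two-sided z-value, e.g. 1.96 for 95%
  qa <- n + A^2
  qb <- -2 * k - A^2
  qc <- k^2 / n
  disc <- sqrt(qb^2 - 4 * qa * qc)      # discriminant is non-negative for 0 <= k <= n
  data.frame(lower = (-qb - disc) / (2 * qa),
             upper = (-qb + disc) / (2 * qa))
}

wilson_interval(19, 100)
prop.test(19, 100, correct = FALSE)$conf.int  # should show the same limits
```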
When Yates' continuity correction is applied, the value k in the numerator of the z-score equation has to be replaced by k^*, with k^* = k - 1/2 for the lower boundary of the confidence interval (where k > np) and k^* = k + 1/2 for the upper boundary of the confidence interval (where k < np). In each case, the corresponding solution of the quadratic equation has to be chosen (i.e., the solution with k > np for the lower boundary and vice versa).
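A base-R sketch of the continuity-corrected case follows: the quadratic equation is solved once with k^* = k - 1/2 (keeping the smaller root) and once with k^* = k + 1/2 (keeping the larger root). The names solve_quadratic and wilson_cc_interval are hypothetical, and clamping k^* to the range [0, n] is an assumption made here to keep the limits inside [0, 1].

```r
# Roots of p^2 (n + A^2) + p (-2 k - A^2) + k^2 / n = 0 (hypothetical helper)
solve_quadratic <- function(k, n, A) {
  qa <- n + A^2
  qb <- -2 * k - A^2
  qc <- k^2 / n
  disc <- sqrt(qb^2 - 4 * qa * qc)
  list(small = (-qb - disc) / (2 * qa), large = (-qb + disc) / (2 * qa))
}

# Hypothetical sketch: Wilson score interval with Yates' continuity correction
wilson_cc_interval <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)
  k_lower <- pmax(k - 0.5, 0)  # k* for the lower boundary (clamping is an assumption)
  k_upper <- pmin(k + 0.5, n)  # k* for the upper boundary (clamping is an assumption)
  data.frame(lower = solve_quadratic(k_lower, n, A)$small,  # root with k > np
             upper = solve_quadratic(k_upper, n, A)$large)  # root with k < np
}

wilson_cc_interval(19, 100)
prop.test(19, 100)$conf.int  # should show the same limits for this example
```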
If a Bonferroni correction is applied, the significance level \alpha of the underlying test is divided by the number m of tests carried out (specified explicitly by the user or given implicitly by length(k)): \alpha' = \alpha / m.
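The effect of the adjustment can be sketched in base R by computing Clopper-Pearson limits once at level \alpha and once at level \alpha' = \alpha / m. The variable names below are chosen for this example only, and m is taken implicitly as length(k).

```r
k <- c(19, 5, 42)
n <- 100
conf.level <- 0.95

alpha <- 1 - conf.level
m <- length(k)          # number of tests, given implicitly by length(k)
alpha_adj <- alpha / m  # Bonferroni-adjusted significance level

# Two-sided Clopper-Pearson limits via Beta quantiles (all k are strictly
# between 0 and n here, so no special cases are needed)
cint <- data.frame(
  lower_raw = qbeta(alpha / 2, k, n - k + 1),
  upper_raw = qbeta(1 - alpha / 2, k + 1, n - k),
  lower_adj = qbeta(alpha_adj / 2, k, n - k + 1),
  upper_adj = qbeta(1 - alpha_adj / 2, k + 1, n - k)
)
cint
```

The adjusted intervals are wider than the unadjusted ones, since each is computed at the higher confidence level 1 - \alpha / m.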
Value
A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k, n and conf.level).
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
References
Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404-413.
Fay, Michael P. (2010). Two-sided exact tests and matching confidence intervals for discrete data. The R Journal, 2(1), 53-58.
https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
See Also
z.score.pval, prop.test, binom.pval, binom.test
Examples
# Clopper-Pearson confidence interval
binom.test(19, 100)
prop.cint(19, 100, method="binomial")
# Wilson score confidence interval
prop.test(19, 100)
prop.cint(19, 100, method="z.score")