prop.cint {corpora} | R Documentation |

This function computes a confidence interval for a population proportion from the corresponding frequency count in a sample. It uses either the Clopper-Pearson method (inverted exact binomial test) or the Wilson score method (inverted z-score test, with or without continuity correction).

```
prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE,
conf.level = 0.95, alternative = c("two.sided", "less", "greater"))
```

`k` |
frequency of a type in the corpus (or an integer vector of frequencies) |

`n` |
number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples) |

`method` |
a character string specifying whether to compute
a Clopper-Pearson confidence interval (`"binomial"`, the default) or a Wilson score confidence interval (`"z.score"`) |

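`correct` |
if `TRUE` (the default), Yates' continuity correction is applied to the Wilson score interval (`method = "z.score"`) |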

`conf.level` |
the desired confidence level (defaults to 95%) |

`alternative` |
a character string specifying the alternative
hypothesis, yielding a two-sided (`"two.sided"`, the default) or one-sided (`"less"`, `"greater"`) confidence interval |
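Together, `conf.level` and `alternative` determine the critical value that appears as `\alpha` in the z-score equation below. A minimal sketch, assuming the usual convention that a two-sided interval splits `1 - conf.level` evenly between both tails while a one-sided interval puts all of it into a single tail. The helper `z_crit` is hypothetical and not part of *corpora*.

```r
# Hypothetical helper (not part of corpora): critical z-value for a given
# confidence level. Assumes a two-sided interval splits 1 - conf.level
# evenly between the two tails, a one-sided interval uses a single tail.
z_crit <- function(conf.level = 0.95,
                   alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "two.sided") {
    qnorm((1 + conf.level) / 2)  # e.g. 0.975 quantile for 95% confidence
  } else {
    qnorm(conf.level)            # e.g. 0.95 quantile for 95% confidence
  }
}

z_crit(0.95, "two.sided")  # approx. 1.96
z_crit(0.95, "less")       # approx. 1.645
```

Because `qnorm` is vectorised, a vector of confidence levels yields a vector of critical values.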

The confidence intervals computed by this function correspond to those
returned by `binom.test` (for `method = "binomial"`) and `prop.test` (for `method = "z.score"`),
respectively. However, `prop.cint` accepts vector arguments,
allowing many confidence intervals to be computed with a single
function call. In addition, it uses a fast approximation of the
two-sided binomial test that can safely be applied to large samples.
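For reference, the Clopper-Pearson interval has a closed form in terms of beta quantiles, which is also how `binom.test` obtains its interval. The sketch below (hypothetical helper `cp_cint`) shows this textbook formulation in vectorised form; it is not necessarily the fast approximation that `prop.cint` uses internally.

```r
# Hypothetical helper (not corpora's implementation): two-sided
# Clopper-Pearson interval from beta quantiles, vectorised over k and n.
cp_cint <- function(k, n, conf.level = 0.95) {
  # Recycle all inputs to a common length so that ifelse() lines up.
  len <- max(length(k), length(n), length(conf.level))
  k <- rep_len(k, len)
  n <- rep_len(n, len)
  alpha <- (1 - rep_len(conf.level, len)) / 2  # probability in each tail
  # By convention the interval is closed at 0 when k = 0 and at 1 when k = n.
  lower <- ifelse(k == 0, 0, qbeta(alpha, k, n - k + 1))
  upper <- ifelse(k == n, 1, qbeta(1 - alpha, k + 1, n - k))
  data.frame(lower = lower, upper = upper)
}

cp_cint(k = c(0, 3, 50), n = c(20, 20, 100))
```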

The confidence interval for a z-score test is computed by solving the z-score equation

```
\frac{k - np}{\sqrt{n p (1-p)}} = \alpha
```

for `p`, where `\alpha` is the `z`-value corresponding
to the chosen confidence level (e.g. `\pm 1.96` for a
two-sided test with 95% confidence). Squaring both sides, multiplying out
and dividing by `n` leads to the quadratic equation

```
p^2 (n + \alpha^2) + p (-2k - \alpha^2) + \frac{k^2}{n} = 0
```

whose two solutions correspond to the lower and upper boundaries of the confidence interval.
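Both roots follow directly from the quadratic formula. A minimal sketch (hypothetical helper `wilson_roots`, not part of *corpora*), using `z` for the critical value written `\alpha` above; without continuity correction, its output should agree with `prop.test(k, n, correct = FALSE)`.

```r
# Hypothetical helper: both roots of
#   p^2 (n + alpha^2) + p (-2k - alpha^2) + k^2/n = 0
# where z plays the role of alpha (the critical z-value).
wilson_roots <- function(k, n, z = qnorm(0.975)) {
  a2 <- n + z^2                     # coefficient of p^2
  a1 <- -2 * k - z^2                # coefficient of p
  a0 <- k^2 / n                     # constant term
  disc <- sqrt(a1^2 - 4 * a2 * a0)  # non-negative for 0 <= k <= n
  data.frame(lower = (-a1 - disc) / (2 * a2),
             upper = (-a1 + disc) / (2 * a2))
}

wilson_roots(k = c(3, 30, 75), n = c(20, 100, 80))
```

Plugging the smaller root back into the z-score equation gives `+z`, the larger one gives `-z`, which is why they serve as the lower and upper boundary.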

When Yates' continuity correction is applied, the value `k` in the
numerator of the `z`-score equation (and hence in the quadratic equation) has to be replaced by
`k^*`, with `k^* = k - 1/2` for the
*lower* boundary of the confidence interval (where `k > np`)
and `k^* = k + 1/2` for the *upper* boundary of
the confidence interval (where `k < np`). In each case, the
corresponding solution of the quadratic equation has to be chosen
(i.e., the solution with `k > np` for the lower boundary and vice
versa).
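The recipe above can be sketched as follows (hypothetical helper `wilson_cc`, not *corpora*'s code). It assumes `0 < k < n`; the boundary cases `k = 0` and `k = n` are not handled. The check against `prop.test(k, n, correct = TRUE)` deliberately uses values of `k` well away from 0, `n` and `n/2`, where `prop.test`'s own handling of the correction may differ.

```r
# Hypothetical helper: Wilson score interval with Yates' continuity
# correction. Assumes 0 < k < n (boundary cases are not handled).
wilson_cc <- function(k, n, z = qnorm(0.975)) {
  # Root of the quadratic with k replaced by k_star;
  # sign = -1 picks the smaller root, sign = +1 the larger one.
  root <- function(k_star, sign) {
    a2 <- n + z^2
    a1 <- -2 * k_star - z^2
    a0 <- k_star^2 / n
    (-a1 + sign * sqrt(a1^2 - 4 * a2 * a0)) / (2 * a2)
  }
  data.frame(lower = root(k - 0.5, -1),  # k* = k - 1/2, solution with k* > np
             upper = root(k + 0.5, +1))  # k* = k + 1/2, solution with k* < np
}

wilson_cc(k = c(3, 30, 75), n = c(20, 100, 80))
```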

A data frame with two columns, labelled `lower` for the lower
boundary and `upper` for the upper boundary of the confidence
interval. The number of rows is determined by the length of the
longest input vector (`k`, `n` and `conf.level`).
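To illustrate the documented shape of the result, here is a deliberately slow reference version (hypothetical helper `cint_reference`, Clopper-Pearson only, taking `binom.test`'s interval as the reference) that calls `binom.test` once per row. It is the kind of per-interval loop that a single vectorised `prop.cint` call avoids; `mapply` recycles `k`, `n` and `conf.level` to the length of the longest one.

```r
# Hypothetical reference implementation (not corpora's code): one
# binom.test() call per interval, with arguments recycled by mapply().
cint_reference <- function(k, n, conf.level = 0.95) {
  ci <- mapply(function(x, m, cl) as.numeric(binom.test(x, m, conf.level = cl)$conf.int),
               k, n, conf.level)  # 2 x len matrix: lower in row 1, upper in row 2
  data.frame(lower = ci[1, ], upper = ci[2, ])
}

# One frequency, three confidence levels: three rows.
cint_reference(k = 30, n = 100, conf.level = c(0.90, 0.95, 0.99))
```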

Stephanie Evert (https://purl.org/stephanie.evert)

https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

`z.score.pval`, `prop.test`, `binom.pval`, `binom.test`

[Package *corpora* version 0.5-1 Index]