R: Confidence interval estimation for the binomial parameter pi...

ci.p {asbio}

R Documentation

Confidence interval estimation for the binomial parameter pi using five popular methods.

Description

Confidence interval formulae for \mu are not appropriate for variables describing binary outcomes. The function ci.p calculates confidence intervals for the binomial parameter \pi (probability of success) using raw or summarized data. By default, Agresti-Coull point estimators are used to estimate \pi and \sigma_{\hat{\pi}}. If raw data are used (the default), successes should be coded as ones and failures as zeros in the data vector. Finite population corrections can also be specified.

Usage


ci.p(data, conf = 0.95, summarized = FALSE, phat = NULL, 
fpc = FALSE, n = NULL, N = NULL, method="agresti.coull", plot = TRUE)

Arguments

`data`	A vector of binary data. Required if `summarized = FALSE`.
`conf`	Level of confidence 1 - P(type I error).
`summarized`	Logical; indicate whether raw data or summary stats are to be used.
`phat`	Estimate of `\pi`. Required if `summarized = TRUE`.
`fpc`	Logical. Indicates whether finite population corrections should be used. If `fpc = TRUE` then `N` must be specified. Finite population corrections are not possible for `method = "exact"` or `method = "score"`.
`n`	Sample size. Required if `summarized = TRUE`.
`N`	Population size. Required if `fpc = TRUE`.
`method`	Type of method to be used in confidence interval calculations, `method ="agresti.coull"` is the default. Other procedures include `method="asymptotic"` which provides the conventional normal (Wald) approximation, `method = "score"`, `method = "LR"`, and `method="exact"` (see Details below). Partial names can be used. The `"exact"` method cannot be implemented if `summarized=TRUE`.
`plot`	Logical. Should a likelihood ratio plot be created along with the estimate when `method = "LR"`?

Details

For the binomial distribution, the parameter of interest is the probability of success, \pi. ML estimators for the parameter, \pi, and for the standard error of its estimator, \sigma_{\hat{\pi}}, are:

\hat{\pi}=\frac{x}{n},

\hat{\sigma}_{\hat{\pi}}=\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}

where x is the number of successes and n is the number of observations.

Because the sampling distribution of an ML estimator is, under regularity conditions, asymptotically normal, an "asymptotic" 100(1 - \alpha)% confidence interval for \pi is found using:

\hat{\pi}\pm z_{1-(\alpha/2)}\hat{\sigma}_{\hat{\pi}}.

This method has also been called the Wald confidence interval.
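
As a minimal sketch of the arithmetic above, the Wald interval can be computed by hand in base R. The values of x and n are taken from the Examples section, and the object names (pi_hat, se_hat, wald_ci) are illustrative. This is not the internal code of ci.p.

```r
# Hand-computed Wald (asymptotic) interval; object names are illustrative
x <- 26300      # number of successes
n <- 56200      # number of observations
conf <- 0.95    # confidence level, 1 - alpha

pi_hat <- x / n                              # ML estimate of pi
se_hat <- sqrt(pi_hat * (1 - pi_hat) / n)    # estimated standard error
z <- qnorm(1 - (1 - conf) / 2)               # z_{1 - alpha/2}

wald_ci <- pi_hat + c(-1, 1) * z * se_hat    # lower and upper bounds
wald_ci
```

The interval is symmetric about pi_hat by construction, which is one reason it behaves poorly when \pi is near 0 or 1.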

These estimators can create extremely inaccurate confidence intervals, particularly for small sample sizes or when \pi is near 0 or 1 (Agresti 2012). A better method is to invert the binomial test statistic, using the null value \pi_0 in both the numerator and the standard error of the statistic. The interval consists of the values of \pi_0 for which the test fails to reject the null hypothesis at level \alpha. The bounds can be obtained as the roots of a quadratic equation in \pi_0 (see Agresti 2012). This has been called a "score" confidence interval (Agresti 2012). A simple approximation to this method can be obtained by adding z_{1-(\alpha/2)}^2/2 (\approx 2 for \alpha = 0.05) to both the number of successes and the number of failures (Agresti and Coull 1998). The resulting Agresti-Coull estimators for \pi and \sigma_{\hat{\pi}} are:

\hat{\pi}=\frac{x+z^2/2}{n+z^2},

\hat{\sigma}_{\hat{\pi}}=\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n+z^2}},

where z is the standard normal inverse cdf at probability 1 - \alpha/2.

As above, the 100(1 - \alpha)% confidence interval for the binomial parameter \pi is found using:

\hat{\pi}\pm z_{1-(\alpha/2)}\hat{\sigma}_{\hat{\pi}}.
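
The following base R sketch computes the Agresti-Coull interval from the estimators above and, for comparison, the score interval in its closed form (Wilson 1927). The data (x = 8, n = 20) are hypothetical and the object names are illustrative. This is not necessarily how ci.p computes either interval.

```r
# Hypothetical small sample; object names are illustrative
x <- 8; n <- 20; conf <- 0.95
z <- qnorm(1 - (1 - conf) / 2)

# Agresti-Coull: add z^2/2 successes and z^2/2 failures
pi_ac <- (x + z^2 / 2) / (n + z^2)
se_ac <- sqrt(pi_ac * (1 - pi_ac) / (n + z^2))
ac_ci <- pi_ac + c(-1, 1) * z * se_ac

# Score (Wilson) interval: closed-form roots of the quadratic in pi_0
pi_hat <- x / n
centre <- (pi_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(pi_hat * (1 - pi_hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
score_ci <- centre + c(-1, 1) * half

rbind(agresti.coull = ac_ci, score = score_ci)
```

The two intervals share the same midpoint, (x + z^2/2)/(n + z^2), and the Agresti-Coull interval is never narrower than the score interval.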

The likelihood ratio method (method = "LR") evaluates the binomial log-likelihood over a grid of support values \pi_0 and finds the points at which the difference between the maximized log-likelihood and the log-likelihood is closest to \chi_1^{2}(1 - \alpha)/2, where \chi_1^{2}(1 - \alpha) is the 1 - \alpha quantile of the chi-squared distribution with one degree of freedom. As support the function uses seq(0.00001, 0.99999, by = 0.00001).
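
A grid search of this kind can be sketched in base R as follows. The data are hypothetical, the object names are illustrative, and the sketch assumes 0 < x < n so that the grid has points on both sides of the ML estimate. It follows the description above but is not the internal code of ci.p.

```r
# Hypothetical data; assumes 0 < x < n
x <- 8; n <- 20; conf <- 0.95
support <- seq(0.00001, 0.99999, by = 0.00001)    # grid of pi_0 values

mle     <- x / n
max_ll  <- dbinom(x, n, mle, log = TRUE)          # maximized log-likelihood
ll_drop <- max_ll - dbinom(x, n, support, log = TRUE)
cutoff  <- qchisq(conf, df = 1) / 2               # chi^2_1(1 - alpha) / 2

# Closest grid point to the cutoff on each side of the ML estimate
below <- support < mle
above <- support > mle
lr_ci <- c(support[below][which.min(abs(ll_drop[below] - cutoff))],
           support[above][which.min(abs(ll_drop[above] - cutoff))])
lr_ci
```

Searching each side of the ML estimate separately ensures that one lower and one upper bound are returned, rather than two grid points on the same side.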

The "exact" method of Clopper and Pearson (1934) guarantees coverage of at least the nominal level, but actual coverage may be well above this level, particularly when n is small and \pi is near 0 or 1.
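
One common way to compute Clopper-Pearson bounds uses quantiles of the beta distribution. The base R sketch below uses hypothetical data and illustrative object names, and is not necessarily the computation used inside ci.p.

```r
# Hypothetical data; object names are illustrative
x <- 8; n <- 20; conf <- 0.95
alpha <- 1 - conf

# Beta-quantile form of the Clopper-Pearson bounds;
# the bounds are set to 0 or 1 when x is 0 or n
cp_lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
cp_upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
cp_ci <- c(cp_lower, cp_upper)
cp_ci
```

These bounds can be checked against the conf.int component returned by binom.test(x, n, conf.level = conf) in the stats package, which reports a Clopper-Pearson interval.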

Agresti (2012) recommends the Agresti-Coull method over the normal approximation, the score method over the Agresti-Coull method, and the likelihood ratio method over all others. The Clopper-Pearson method has been repeatedly criticized as being too conservative (Agresti and Coull 1998).
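
The coverage behavior described in this section can be illustrated by computing exact coverage, that is, the binomial probability of all outcomes x whose interval contains the true \pi. The sketch below uses base R only. The helper names (coverage, wald, clopper_pearson) and the settings n = 20, \pi = 0.1 are hypothetical choices for illustration.

```r
# Exact coverage of an interval procedure at a given n and true pi
coverage <- function(ci_fun, n, pi, conf = 0.95) {
  x <- 0:n
  bounds <- t(sapply(x, ci_fun, n = n, conf = conf))   # one row per outcome x
  covered <- bounds[, 1] <= pi & pi <= bounds[, 2]
  sum(dbinom(x, n, pi)[covered])
}

# Wald (asymptotic) interval
wald <- function(x, n, conf) {
  p <- x / n
  z <- qnorm(1 - (1 - conf) / 2)
  p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
}

# Clopper-Pearson interval via beta quantiles
clopper_pearson <- function(x, n, conf) {
  a <- 1 - conf
  c(if (x == 0) 0 else qbeta(a / 2, x, n - x + 1),
    if (x == n) 1 else qbeta(1 - a / 2, x + 1, n - x))
}

coverage(wald, n = 20, pi = 0.1)             # falls below the nominal 0.95
coverage(clopper_pearson, n = 20, pi = 0.1)  # at least the nominal 0.95
```

With these settings the Wald interval collapses to a single point when x = 0, so that outcome alone removes more than 10% of the probability mass from its coverage.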

Value

Returns a list of class = "ci".

`pi.hat`	Estimate for `\pi`.
`S.p.hat`	Estimate for `\sigma_{\hat{\pi}}`.
`margin`	Confidence margin.
`ci`	Confidence interval.

Note

This function contains only a few of the many methods that have been proposed for confidence interval estimation for \pi.

Author(s)

Ken Aho. Thanks to Simon Thelwall for finding an error with summarized data under fpc.

References

Agresti, A. (2012) Categorical Data Analysis, 3rd edition. New York. Wiley.

Agresti, A., and Coull, B. A. (1998) Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician 52: 119-126.

Clopper, C. J., and Pearson, E. S. (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404-413.

Ott, R. L., and Longnecker, M. T. (2004) A First Course in Statistical Methods. Thomson.

Wilson, E. B. (1927) Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.

Examples

# In 2001, it was estimated that 56,200 Americans would be diagnosed with 
# non-Hodgkin's lymphoma and that 26,300 would die from it (Cernan et al. 2002).  
# Here we find the 95% confidence interval for pi, the probability of death
# among those diagnosed (deaths are coded as ones in the data vector).

ci.p(c(rep(0, 56200-26300),rep(1,26300))) # Agresti-Coull
ci.p(c(rep(0, 56200-26300),rep(1,26300)), method = "LR") # Likelihood Ratio

# summarized = TRUE
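n <- 56200     # sample size (number diagnosed)
x <- 26300     # number of successes (ones in the raw data)
phat <- x / n  # sample proportion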

ci.p(summarized = TRUE, phat = phat, n = n) # Agresti-Coull

# Use 2001 US population size as N
N <- 285 * 10^6
ci.p(c(rep(0, 56200-26300),rep(1,26300)), fpc = TRUE, N = N) # Agresti-Coull
ci.p(summarized = TRUE, phat = phat, n = n, N = N, fpc = TRUE) # Agresti-Coull

[Package asbio version 1.9-7 Index]