R: Median interval, four confidence intervals, null expectation...

AlphInts {CooccurrenceAffinity}

R Documentation

Median interval, four confidence intervals, null expectation of cooccurrence count, and p-value

Description

This function calculates (i) MedianIntrvl, the interval of alpha values for which the co-occurrence count is a median, (ii) four Confidence Intervals, two using EHypQuInt(), one using EHypMidP(), and one using AcceptAffCI(), (iii) the Expected Co-occurrence count under the Null distribution, and (iv) the p-value for the observed co-occurrence count.

Usage

AlphInts(x, marg, scal = log(2 * marg[3]^2), lev = 0.95, pvalType = "Blaker")

Arguments

`x`	integer co-occurrence count that should properly fall within the closed interval [max(0,mA+mB-N), min(mA,mB)]
`marg`	a 3-entry integer vector (mA,mB,N) consisting of the first row and column totals and the table total for a 2x2 contingency table
`scal`	a parameter (default log(2*N^2), as shown in Usage, and capped at 10 within the function) that should be 2 or greater
`lev`	a confidence level, generally somewhere from 0.8 to 0.95 (default 0.95)
`pvalType`	a character string specifying which kind of p-value to calculate: "Blaker" or "midP". If pvalType="Blaker" (the default value), the p-value is calculated according to the "Acceptability" function of Blaker (2000). If pvalType="midP", the p-value is calculated using the same idea as the midP confidence interval.
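The admissible range for x follows directly from the margins. As a minimal base-R sketch, the helper below computes that range and checks a candidate count against it. The name x_range is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of CooccurrenceAffinity): feasible range of the
# co-occurrence count x given marg = c(mA, mB, N).
x_range <- function(marg) {
  mA <- marg[1]; mB <- marg[2]; N <- marg[3]
  c(lower = max(0, mA + mB - N), upper = min(mA, mB))
}

rng <- x_range(c(80, 80, 100))
rng                                        # lower = 60, upper = 80

x <- 81
in_range <- x >= rng[["lower"]] && x <= rng[["upper"]]
in_range                                   # FALSE: 81 lies outside [60, 80]
```

A check of this kind can be run before calling AlphInts, since counts outside the range trigger the "impossible x" warnings shown in the Examples.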

Details

This function calculates five intervals, three of them using EHypQuInt, one using EHypMidP, and one using AcceptAffCI. The first ("MedianIntrvl") is the interval of alpha values compatible with x as a median of the Extended Hypergeometric distribution (Harkness 1965) with fixed margins and alpha. The second ("CI.CP") is an "exact" conservative test-based two-sided confidence interval for alpha based on the data (x,mA,mB,N), analogous to the Clopper-Pearson (1934) confidence interval for an unknown binomial proportion. The third is the Acceptability Confidence Interval ("CI.Blaker") of Blaker (2000, Theorem 1), which improves on the CP-type interval "CI.CP" in the sense of being contained within "CI.CP" while remaining provably conservative, i.e., with coverage probability always at least as large as the nominal level. The fourth ("CI.midQ") is the one given in formula (2) of the Introduction to this documentation, with endpoints obtained as the midpoints of the quantile intervals corresponding respectively to the (1+lev)/2 and (1-lev)/2 quantiles of the Extended Hypergeometric distribution. The fifth ("CI.midP"), which behaves very similarly to "CI.midQ", is defined by the midP approach analogous to the midP confidence interval for binomial proportions (Agresti 2013, p.605), and is calculated from EHypMidP.
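The tail-inversion idea behind the median interval and the Clopper-Pearson-type interval can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation. It assumes alpha is the log-odds-ratio parameter of the Extended Hypergeometric distribution, so that the probability of k is proportional to the hypergeometric probability times exp(alpha*k). The helper names ehyp_pmf and tail_interval are hypothetical, and the search bound of 10 is chosen only to mirror the cap mentioned under scal.

```r
# Extended hypergeometric pmf on the feasible support, computed in log space.
# Assumption: alpha is the log odds ratio; alpha = 0 gives the ordinary hypergeometric.
ehyp_pmf <- function(alpha, marg) {
  mA <- marg[1]; mB <- marg[2]; N <- marg[3]
  ks <- max(0, mA + mB - N):min(mA, mB)
  lw <- dhyper(ks, mA, N - mA, mB, log = TRUE) + alpha * ks
  w <- exp(lw - max(lw))                  # subtract the max to avoid overflow
  list(k = ks, p = w / sum(w))
}

# Hypothetical helper: the two alpha values at which the upper tail P(X >= x)
# and the lower tail P(X <= x) each equal q. Requires x strictly inside the
# feasible range, otherwise one tail is identically 1 and uniroot fails.
tail_interval <- function(x, marg, q, bound = 10) {
  upper_tail <- function(a) { d <- ehyp_pmf(a, marg); sum(d$p[d$k >= x]) - q }
  lower_tail <- function(a) { d <- ehyp_pmf(a, marg); sum(d$p[d$k <= x]) - q }
  c(uniroot(upper_tail, c(-bound, bound), tol = 1e-8)$root,
    uniroot(lower_tail, c(-bound, bound), tol = 1e-8)$root)
}

lev <- 0.9
ci_cp  <- tail_interval(30, c(50, 80, 120), (1 - lev) / 2)  # CP-type interval
med_in <- tail_interval(30, c(50, 80, 120), 0.5)            # median interval
ci_cp
med_in
```

Setting q = 0.5 gives the alpha values for which x is a median, and setting q = (1-lev)/2 gives the conservative test-based interval, so the median interval always lies inside the CP-type interval. Results from this sketch may differ slightly from the package output because of numerical tolerances and the package's own capping rules.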

The first of these intervals quantifies the underlying discreteness of the Extended Hypergeometric and its impact on the estimation of alpha. MedianIntrvl is an interval that will contain the MLE alpha-hat, and the mid-point of that interval is another reasonable estimator of alpha from the data. The recommended (slightly conservative) confidence interval is CI.Blaker, while the very similar intervals CI.midQ and CI.midP have coverage generally closer than CI.CP or CI.Blaker to the nominal level of coverage, at the cost of occasionally under-covering by as much as 0.04 or 0.05 for confidence levels 0.90 or 0.95. The comparison among intervals, and different possible goals that CIs of conservative or close-to-nominal coverage can serve, are similar to those compared by Brown et al. (2001) for interval estimation of an unknown binomial proportion.

Two other output list components are computed. The first is Null.Exp, the expected co-occurrence count under the null (hypergeometric, corresponding to alpha=0) distribution, and the second is pval, the two-sided p-value for the equal-tailed test of the null hypothesis alpha=0. This p-value is calculated according to Blaker's (2000) "Acceptability" function when pvalType="Blaker"; if the input parameter pvalType is anything else, the p-value is calculated using the same idea as the midP confidence interval.
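As a small base-R illustration of the null expectation, the sketch below computes the mean of the ordinary hypergeometric distribution for the margins used in the Examples, both in closed form and by direct summation. It assumes Null.Exp is this usual hypergeometric mean mA*mB/N. The variable names are illustrative only.

```r
marg <- c(50, 80, 120)            # (mA, mB, N), as in the Examples
mA <- marg[1]; mB <- marg[2]; N <- marg[3]

null_exp <- mA * mB / N           # closed-form hypergeometric mean
null_exp                          # 33.33333

# Cross-check by summing k * P(X = k) over the feasible support at alpha = 0.
ks <- max(0, mA + mB - N):min(mA, mB)
null_exp_sum <- sum(ks * dhyper(ks, mA, N - mA, mB))

all.equal(null_exp, null_exp_sum) # TRUE
```

An observed count x below this value, such as x = 30 here, corresponds to a negative estimated alpha, and a count above it to a positive one.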

Value

A list of seven components: the median interval MedianIntrvl; the four two-sided Confidence Intervals described above, two (CI.CP and CI.Blaker) conservative and two (CI.midQ and CI.midP) with coverage probabilities generally closer to the nominal level; the null expectation Null.Exp of the co-occurrence count associated with alpha=0; and pval, the two-sided p-value for the hypothesis test of alpha=0, calculated by the method selected, which is the Blaker acceptability-function method if pvalType="Blaker" and otherwise the "midP" p-value associated with the midP confidence-interval type.

Of the four Confidence intervals produced, CI.Blaker is the recommended conservative interval and CI.midP the interval to use if coverage close to the nominal is desired.

Author(s)

Eric Slud

References

Agresti, A. (2013) Categorical Data Analysis, 3rd edition, Wiley.

Blaker, H. (2000), “Confidence curves and improved exact confidence intervals for discrete distributions,” Canadian Journal of Statistics, 28, 783–798.

Brown, L., T. Cai, and A. DasGupta (2001), “Interval Estimation for a Binomial Proportion,” Statistical Science, 16, 101–117.

Clopper, C., and E. Pearson (1934), “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial,” Biometrika, 26, 404–413.

Fog, A. (2015), BiasedUrn: Biased Urn Model Distributions. R package version 1.07.

Harkness, W. (1965), “Properties of the extended hypergeometric distribution,” Annals of Mathematical Statistics, 36, 938–945.

Examples

unlist(AlphInts(30,c(50,80,120), lev=0.9))

AlphInts(30,c(50,80,120), lev=0.9)$CI.CP
AlphInts(30,c(50,80,120), lev=0.9)$MedianIntrvl

EHypMidP(30,c(50,80,120), 0.9)
AlphInts(30,c(50,80,120), lev=0.9)$CI.midP
# NB the third argument of AlphInts is "scal" if not named,
# so must use "lev=0.9" to define the confidence level.

EHypQuInt(30,c(50,80,120), 0.5)
AlphInts(30,c(50,80,120), lev=0.9)$MedianIntrvl

# Alpha capped warning examples
AlphInts(60,c(80,80,100), lev=0.9)
ML.Alpha(60,c(80,80,100), lev=0.9)

AlphInts(80,c(80,80,100), lev=0.9)
ML.Alpha(80,c(80,80,100), lev=0.9)

# impossible x warning examples
AlphInts(81,c(80,80,100), lev=0.9)
ML.Alpha(81,c(80,80,100), lev=0.9)

# Degenerate distribution warning example
AlphInts(80,c(80,100,100), lev=0.9)
ML.Alpha(80,c(80,100,100), lev=0.9)

[Package CooccurrenceAffinity version 1.0 Index]