fisher.pval {corpora}

R Documentation

P-values of Fisher's exact test for frequency comparisons (corpora)

Description

This function computes the p-value of Fisher's exact test (Fisher 1934) for the comparison of corpus frequency counts (under the null hypothesis of equal population proportions). In the two-sided case, it returns a “central” p-value (Fay 2010), which is numerically more efficient to compute than the likelihood-based p-value of fisher.test and is always consistent with the corresponding confidence intervals.

Usage


fisher.pval(k1, n1, k2, n2, 
            alternative = c("two.sided", "less", "greater"),
            log.p = FALSE)

Arguments

`k1`	frequency of a type in the first corpus (or an integer vector of type frequencies)
`n1`	the sample size of the first corpus (or an integer vector specifying the sizes of different samples)
`k2`	frequency of the type in the second corpus (or an integer vector of type frequencies, in parallel to `k1`)
`n2`	the sample size of the second corpus (or an integer vector specifying the sizes of different samples, in parallel to `n1`)
`alternative`	a character string specifying the alternative hypothesis; must be one of `"two.sided"` (default), `"less"` or `"greater"`
`log.p`	if TRUE, the natural logarithm of the p-value is returned
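
The log scale matters when a p-value is too small for double-precision arithmetic. The following is a minimal sketch using base R's `phyper` rather than fisher.pval itself. The frequency counts are invented for illustration, and the one-sided hypergeometric tail is assumed here as a stand-in for the test.

```r
# Illustrative counts (assumed, not taken from this documentation)
k1 <- 5000; n1 <- 1e6   # frequency and sample size, first corpus
k2 <- 1000; n2 <- 1e6   # frequency and sample size, second corpus
K  <- k1 + k2           # total frequency across both corpora

# One-sided tail P(X >= k1) on the ordinary scale and on the log scale
p     <- phyper(k1 - 1, n1, n2, K, lower.tail = FALSE)
log_p <- phyper(k1 - 1, n1, n2, K, lower.tail = FALSE, log.p = TRUE)

print(p)                # too small to be represented as a double
print(log_p)            # natural logarithm, still finite
print(log_p / log(10))  # same value as a base-10 logarithm
```

Dividing by `log(10)` converts the natural logarithm into a base-10 exponent, which is often easier to read and compare.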

Details

For alternative="two.sided" (the default), the p-value of the “central” Fisher's exact test (Fay 2010) is computed, which differs from the more common likelihood-based method implemented by fisher.test (and referred to as the “two-sided Fisher's exact test” by Fay). This approach has two advantages: (i) it is numerically robust and efficient, even for very large samples and frequency counts; (ii) it is consistent with Clopper-Pearson type confidence intervals (see examples below).
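
As a rough illustration of the central approach, the sketch below doubles the smaller of the two one-sided hypergeometric tail probabilities and caps the result at 1, following the definition in Fay (2010). The helper name `central_fisher_sketch` is hypothetical, and the code is not the package's actual implementation.

```r
# Hypothetical helper for illustration only (not part of corpora)
central_fisher_sketch <- function(k1, n1, k2, n2) {
  K <- k1 + k2  # total frequency of the type in both samples
  # Under H0 and conditional on K, k1 is hypergeometric:
  # K draws from an urn with n1 "first corpus" and n2 "second corpus" tokens
  p_less    <- phyper(k1, n1, n2, K)                          # P(X <= k1)
  p_greater <- phyper(k1 - 1, n1, n2, K, lower.tail = FALSE)  # P(X >= k1)
  # Central p-value: twice the smaller one-sided p-value, capped at 1
  pmin(1, 2 * pmin(p_less, p_greater))
}

# Tea tasting data: the null distribution is symmetric, so the central
# p-value coincides with the likelihood-based one from fisher.test
print(central_fisher_sketch(3, 4, 1, 4))
print(fisher.test(matrix(c(3, 1, 1, 3), nrow = 2))$p.value)
```

Because only two tail probabilities are needed, this avoids summing over all tables that are at most as likely as the observed one, which is the step that makes the likelihood-based method costly for large counts.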

For one-sided tests, the p-values returned by this function are identical to those computed by fisher.test on two-by-two contingency tables.
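
The correspondence can be checked with base R alone. The sketch below assumes that the arguments map to the two-by-two table `cbind(c(k1, n1 - k1), c(k2, n2 - k2))`. The helper `one_sided_sketch` is hypothetical and only mirrors the one-sided tail probabilities; it is not the package's code.

```r
# Hypothetical helper for illustration only (not part of corpora)
one_sided_sketch <- function(k1, n1, k2, n2, alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  K <- k1 + k2  # total frequency of the type in both samples
  if (alternative == "greater") {
    phyper(k1 - 1, n1, n2, K, lower.tail = FALSE)  # P(X >= k1)
  } else {
    phyper(k1, n1, n2, K)                          # P(X <= k1)
  }
}

# Contingency table for 4/15 vs. 50/619: rows are successes and failures
tab <- cbind(c(4, 15 - 4), c(50, 619 - 50))

print(one_sided_sketch(4, 15, 50, 619, "greater"))
print(fisher.test(tab, alternative = "greater")$p.value)
```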

Value

The p-value of Fisher's exact test applied to the given data (or a vector of p-values).

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

References

Fay, Michael P. (2010). Confidence intervals that match Fisher's exact or Blaker's exact tests. Biostatistics, 11(2), 373-374.

Fisher, R. A. (1934). Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh, 2nd edition (1st edition 1925, 14th edition 1970).

Examples

## Fisher's Tea Drinker (see ?fisher.test)
TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
print(TeaTasting)
##  - the "corpora" consist of 4 cups of tea each (n1 = n2 = 4)
##     => columns of TeaTasting
##  - frequency counts are the number of cups selected by drinker (k1 = 3, k2 = 1)
##     => first row of TeaTasting
##  - null hypothesis of equal type probability = drinker makes random guesses
fisher.pval(3, 4, 1, 4, alternative="greater")
fisher.test(TeaTasting, alternative="greater")$p.value # should be the same

# the null distribution is symmetric here, so the central p-value
# equals the standard (likelihood-based) two-sided p-value
fisher.pval(3, 4, 1, 4)         # central Fisher's exact test
fisher.test(TeaTasting)$p.value # standard two-sided Fisher's test

# inconsistency between likelihood-based two-sided Fisher's test and confidence interval
# for 4/15 vs. 50/619 successes
fisher.test(cbind(c(4, 11), c(50, 569))) # rows: successes, failures (619 - 50 = 569)

# central Fisher's exact test is always consistent
fisher.pval(4, 15, 50, 619)

[Package corpora version 0.6 Index]