R: Fisher's Exact Test for Count Data

fisher_test_pv {DiscreteTests}

R Documentation

Fisher's Exact Test for Count Data

Description

fisher_test_pv() performs Fisher's exact test or a chi-square approximation to assess whether the rows and columns of a 2-by-2 contingency table with fixed marginals are independent. In contrast to stats::fisher.test(), it is vectorised, calculates only p-values and offers a continuous (chi-square or normal) approximation of them. It can also return the discrete p-value supports, i.e. all observable p-values under a null hypothesis. Multiple tables can be analysed simultaneously. For two-sided tests, several procedures for obtaining the p-values are implemented.

Note: Please use fisher_test_pv()! The older fisher.test.pv() is deprecated in order to migrate to snake case. It will be removed in a future version.

Usage

fisher_test_pv(
  x,
  alternative = "two.sided",
  ts_method = "minlike",
  exact = TRUE,
  correct = TRUE,
  simple_output = FALSE
)

fisher.test.pv(
  x,
  alternative = "two.sided",
  ts.method = "minlike",
  exact = TRUE,
  correct = TRUE,
  simple.output = FALSE
)

Arguments

`x`	integer vector with four elements, a 2-by-2 matrix or an integer matrix (or data frame) with four columns, where each row represents a 2-by-2 table to be tested.
`alternative`	character vector that indicates the alternative hypotheses; each value must be one of `"two.sided"` (the default), `"less"` or `"greater"`.
`ts_method`, `ts.method`	single character string that indicates the two-sided p-value computation method (if any value in `alternative` equals `"two.sided"`) and must be one of `"minlike"` (the default), `"blaker"`, `"absdist"` or `"central"` (see details). Ignored, if `exact = FALSE`.
`exact`	logical value that indicates whether p-values are to be calculated by exact computation (`exact = TRUE`; the default) or by a continuous approximation.
`correct`	logical value that indicates if a continuity correction is to be applied (`correct = TRUE`; the default) or not. Ignored, if `exact = TRUE`.
`simple_output`, `simple.output`	logical value that indicates whether only a vector of p-values is to be returned (`TRUE`) or an R6 class object that also includes the tests' parameters and support sets, i.e. all observable p-values under each null hypothesis (`FALSE`; the default); see Value.

Details

The parameters x and alternative are vectorised. They are replicated automatically, such that the number of x's rows is the same as the length of alternative. This allows multiple null hypotheses to be tested simultaneously. Since x is (if necessary) coerced to a matrix with four columns, it is replicated row-wise.
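
The replication described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes plain recycling (as with rep_len()) and assumes that each row of x is reshaped column-wise into a 2-by-2 table. The names n_tests, x_rep and alt_rep are hypothetical.

```r
# One 2-by-2 table stored as a single row: (S1, F1, S2, F2)
x <- matrix(c(4, 144, 0, 132), ncol = 4)
alternative <- c("less", "greater", "two.sided")

# Recycle the rows of x and the entries of alternative to a common length
# (assumption: plain recycling, as done by rep_len())
n_tests <- max(nrow(x), length(alternative))
x_rep   <- x[rep_len(seq_len(nrow(x)), n_tests), , drop = FALSE]
alt_rep <- rep_len(alternative, n_tests)

# One exact test per (row, alternative) pair, here via stats::fisher.test()
p_values <- vapply(
  seq_len(n_tests),
  function(i) {
    tab <- matrix(x_rep[i, ], nrow = 2)  # column-wise fill is an assumption
    stats::fisher.test(tab, alternative = alt_rep[i])$p.value
  },
  numeric(1)
)
names(p_values) <- alt_rep
p_values
```

Here a single table is tested against three alternatives, so its row is used three times. With several tables and a single alternative, the alternative would be recycled instead.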

If exact = TRUE, Fisher's exact test is performed (the specific hypothesis depends on the value of alternative). Otherwise, if exact = FALSE, a chi-square approximation is used for two-sided hypotheses, or a normal approximation, based on the square root of the chi-square statistic, for one-sided hypotheses. This is possible because chi-square tests on 2-by-2 tables always have exactly one degree of freedom.
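
The link between the chi-square and the normal approximation can be checked numerically. The sketch below uses base R only and, for simplicity, omits the continuity correction (i.e. it corresponds to correct = FALSE). The cell names n11, n12, n21 and n22 are hypothetical, and which tail belongs to "less" or "greater" is an assumption about the sign convention.

```r
# Cells of a 2-by-2 table: first row (n11, n12), second row (n21, n22)
n11 <- 14; n12 <- 134; n21 <- 3; n22 <- 129
n   <- n11 + n12 + n21 + n22

# Signed square root of the uncorrected chi-square statistic
z <- sqrt(n) * (n11 * n22 - n12 * n21) /
  sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))

# Two-sided: chi-square distribution with one degree of freedom
p_two_sided <- stats::pchisq(z^2, df = 1, lower.tail = FALSE)

# One-sided: standard normal tails (tail assignment is an assumption)
p_greater <- stats::pnorm(z, lower.tail = FALSE)
p_less    <- stats::pnorm(z)

# The two-sided p-value equals twice the smaller one-sided p-value
all.equal(p_two_sided, 2 * min(p_less, p_greater))
```

Because a chi-square variable with one degree of freedom is the square of a standard normal variable, both routes give the same two-sided p-value.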

For exact computation, various procedures for determining two-sided p-values are implemented.

"minlike": The standard approach in stats::fisher.test() and stats::binom.test(). The probabilities of all outcomes that are at most as likely as the observed one are summed up. In Hirji (2006), it is referred to as the Probability-based approach.
"blaker": The smaller of the observation's lower and upper tail probabilities is combined with the largest probability of the opposite tail that does not exceed it. More details can be found in Blaker (2000) or Hirji (2006), where it is referred to as the Combined Tails method.
"absdist": The probabilities of all outcomes whose absolute distance from the expected value is greater than or equal to the observed one are summed up. In Hirji (2006), it is referred to as the Distance from Center approach.
"central": Simply doubles the smaller of the observation's lower and upper tail probabilities. In Hirji (2006), it is referred to as the Twice the Smaller Tail method.

For non-exact (i.e. continuous approximation) approaches, ts_method is ignored, since all its methods would yield the same p-values. More specifically, they all converge to the doubling approach as in ts_method = "central".
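
The four exact two-sided procedures listed above can be sketched in base R from the hypergeometric null distribution. This is an illustration under stated assumptions, not the package's code: the table is filled column-wise, a small relative tolerance guards against floating-point ties (as stats::fisher.test() does), and the "central" p-value is capped at 1. All variable names are hypothetical.

```r
# Table (S1, F1, S2, F2) = (14, 134, 3, 129), filled column-wise (assumption)
tab <- matrix(c(14, 134, 3, 129), nrow = 2)

# Hypergeometric null distribution of the top-left cell given the margins
m <- sum(tab[, 1])   # total of the first column
n <- sum(tab[, 2])   # total of the second column
k <- sum(tab[1, ])   # total of the first row
support <- max(0, k - n):min(k, m)
probs   <- stats::dhyper(support, m, n, k)
obs     <- which(support == tab[1, 1])

# Tail probabilities at every support point
lower <- cumsum(probs)              # P(X <= t)
upper <- rev(cumsum(rev(probs)))    # P(X >= t)
tol   <- 1 + 1e-7                   # relative tolerance for ties

# "minlike": outcomes at most as likely as the observed one
p_minlike <- sum(probs[probs <= probs[obs] * tol])

# "blaker": outcomes whose smaller tail does not exceed the observed one
min_tail <- pmin(lower, upper)
p_blaker <- sum(probs[min_tail <= min_tail[obs] * tol])

# "absdist": outcomes at least as far from the expected value
expected  <- k * m / (m + n)
dist      <- abs(support - expected)
p_absdist <- sum(probs[dist >= dist[obs] / tol])

# "central": twice the smaller tail (capping at 1 is an assumption)
p_central <- min(1, 2 * min_tail[obs])

c(minlike = p_minlike, blaker = p_blaker,
  absdist = p_absdist, central = p_central)
```

The "minlike" value computed this way should agree with stats::fisher.test(tab)$p.value, and the "blaker" value can never exceed the "central" one.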

Value

If simple_output = TRUE, a vector of computed p-values is returned. Otherwise, the output is a DiscreteTestResults R6 class object, which also includes the p-value supports and testing parameters. These have to be accessed by public methods, e.g. $get_pvalues().
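
To make the notion of a p-value support concrete, the following base R sketch enumerates every table that shares the margins of an observed one and collects the distinct "minlike" p-values. It is a hedged illustration of the concept only; how the package stores or orders its supports is not shown here, and the names p_all, p_support and p_obs are hypothetical.

```r
# Observed table (S1, F1, S2, F2) = (4, 144, 0, 132), filled column-wise
tab <- matrix(c(4, 144, 0, 132), nrow = 2)
m <- sum(tab[, 1]); n <- sum(tab[, 2]); k <- sum(tab[1, ])

# All attainable values of the top-left cell and their null probabilities
support <- max(0, k - n):min(k, m)
probs   <- stats::dhyper(support, m, n, k)

# "minlike" p-value for every table with the observed margins
p_all <- vapply(
  probs,
  function(p_t) sum(probs[probs <= p_t * (1 + 1e-7)]),
  numeric(1)
)

# The support is the sorted set of distinct attainable p-values
p_support <- sort(unique(p_all))

# The observed p-value is necessarily one element of the support
p_obs <- p_all[support == tab[1, 1]]
p_obs %in% p_support
```

Since only a handful of tables are possible for these margins, the support contains just a few values, which is what procedures for discrete tests can exploit.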

References

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society Series A, 98, pp. 39–54. doi:10.2307/2342435

Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: John Wiley & Sons. pp. 91–97. doi:10.1002/0471249688

Blaker, H. (2000). Confidence curves and improved exact confidence intervals for discrete distributions. Canadian Journal of Statistics, 28(4), pp. 783–798. doi:10.2307/3315916

Hirji, K. F. (2006). Exact analysis of discrete data. New York: Chapman and Hall/CRC. pp. 55–83. doi:10.1201/9781420036190

Examples

# Constructing a data frame with one 2-by-2 table per row
S1 <- c(4, 2, 2, 14, 6, 9, 4, 0, 1)
S2 <- c(0, 0, 1, 3, 2, 1, 2, 2, 2)
N1 <- rep(148, 9)
N2 <- rep(132, 9)
F1 <- N1 - S1
F2 <- N2 - S2
df <- data.frame(S1, F1, S2, F2)

# Computation of Fisher's exact p-values (default: "minlike") and their supports
results_f   <- fisher_test_pv(df)
raw_pvalues <- results_f$get_pvalues()
pCDFlist    <- results_f$get_pvalue_supports()

# Computation of p-values of chi-square tests and their supports
results_c   <- fisher_test_pv(df, exact = FALSE)
raw_pvalues <- results_c$get_pvalues()
pCDFlist    <- results_c$get_pvalue_supports()

[Package DiscreteTests version 0.2.0 Index]