fisher.test {desctable}    R Documentation
Fisher's Exact Test for Count Data
Description
Performs Fisher's exact test of the null hypothesis that the rows and columns of a contingency table with fixed marginals are independent. The data can be supplied as a matrix, as two factors, or as a formula expression.
Usage
fisher.test(
  x,
  y,
  workspace,
  hybrid,
  control,
  or,
  alternative,
  conf.int,
  conf.level,
  simulate.p.value,
  B
)
## Default S3 method:
fisher.test(x, ...)
## S3 method for class 'formula'
fisher.test(
  x,
  y = NULL,
  workspace = 200000,
  hybrid = F,
  control = list(),
  or = 1,
  alternative = "two.sided",
  conf.int = T,
  conf.level = 0.95,
  simulate.p.value = F,
  B = 2000
)
Arguments
x: either a two-dimensional contingency table in matrix form, a factor object, or a formula of the form x ~ y.
y: a factor object; ignored if x is a matrix.
workspace: an integer specifying the size of the workspace used in the network algorithm, in units of 4 bytes. Only used for non-simulated p-values in tables larger than 2 by 2.
hybrid: a logical. Only used for tables larger than 2 by 2, where hybrid = TRUE allows asymptotic chi-squared probabilities to be used in place of exact ones (see Details).
control: a list with named components for low-level algorithm control. At present the only one used is "mult".
or: the hypothesized odds ratio. Only used in the 2 by 2 case.
alternative: indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less".
conf.int: logical indicating if a confidence interval for the odds ratio in a 2 by 2 table should be computed.
conf.level: confidence level for the returned confidence interval. Only used in the 2 by 2 case and if conf.int = TRUE.
simulate.p.value: a logical indicating whether to compute p-values by Monte Carlo simulation, in tables larger than 2 by 2.
B: an integer specifying the number of replicates used in the Monte Carlo test.
...: additional parameters passed on to the original fisher.test.
Details
If x is a matrix, it is taken as a two-dimensional contingency table, and hence its entries should be nonnegative integers. Otherwise, both x and y must be vectors of the same length. Incomplete cases are removed, the vectors are coerced into factor objects, and the contingency table is computed from these.
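The vector interface described above can be sketched by building the contingency table by hand and checking that it gives the same p-value as passing the two vectors directly. The vectors treatment and outcome are made-up illustration data, and the sketch calls stats::fisher.test directly, on the assumption that the function documented here forwards its arguments to it.

```r
# Hypothetical data: two vectors of equal length, each with one missing value
treatment <- c("A", "A", "B", "B", "A", "B", NA, "A", "B", "B")
outcome   <- c("yes", "no", "no", "no", "yes", "no", "yes", NA, "yes", "no")

# Keep complete cases only, coerce to factors, and cross-tabulate
ok  <- complete.cases(treatment, outcome)
tab <- table(factor(treatment[ok]), factor(outcome[ok]))

# Same test, once from the two vectors and once from the hand-built table
p_vectors <- stats::fisher.test(factor(treatment), factor(outcome))$p.value
p_table   <- stats::fisher.test(tab)$p.value

# The two routes should agree
isTRUE(all.equal(p_vectors, p_table))
```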
For 2 by 2 cases, p-values are obtained directly using the (central or non-central) hypergeometric distribution. Otherwise, computations are based on a C version of the FORTRAN subroutine FEXACT, which implements the network algorithm developed by Mehta and Patel (1986) and improved by Clarkson, Fan and Joe (1993). The FORTRAN code can be obtained from http://www.netlib.org/toms/643. Note that this fails (with an error message) when the entries of the table are too large. (It transposes the table if necessary so that it has no more rows than columns. One constraint is that the product of the row marginals be less than 2^31 - 1.)
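Because the test treats rows and columns symmetrically, the orientation of the table should not affect the result, whichever way the table is stored internally. A small check of this, using a made-up 3 by 2 table and calling stats::fisher.test directly (an assumption about what the wrapper forwards to):

```r
# Hypothetical 3 x 2 table (more rows than columns)
tab <- matrix(c(3, 1, 2, 4, 5, 0), nrow = 3,
              dimnames = list(group = c("g1", "g2", "g3"),
                              response = c("no", "yes")))

p_original   <- stats::fisher.test(tab)$p.value
p_transposed <- stats::fisher.test(t(tab))$p.value

# The two p-values should agree up to numerical tolerance
isTRUE(all.equal(p_original, p_transposed, tolerance = 1e-6))
```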
For 2 by 2 tables, the null of conditional independence is equivalent to the hypothesis that the odds ratio equals one. Exact inference can be based on observing that in general, given all marginal totals fixed, the first element of the contingency table has a non-central hypergeometric distribution with non-centrality parameter given by the odds ratio (Fisher, 1935). The alternative for a one-sided test is based on the odds ratio, so alternative = "greater" is a test of the odds ratio being greater than the value of the or argument.
Two-sided tests are based on the probabilities of the tables, and take as more extreme all tables with probabilities less than or equal to that of the observed table, the p-value being the sum of such probabilities.
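For the null value of the odds ratio (or = 1), the one-sided and two-sided p-values described above can be reproduced by hand from the central hypergeometric distribution. The following is a sketch for the Tea Tasting table from the Examples; the variable names and the small relative tolerance used to catch floating-point ties are choices made for this illustration, and stats::fisher.test is called only for comparison.

```r
TeaTasting <- matrix(c(3, 1, 1, 3), nrow = 2,
                     dimnames = list(Guess = c("Milk", "Tea"),
                                     Truth = c("Milk", "Tea")))

m <- sum(TeaTasting[, 1])  # first column total
n <- sum(TeaTasting[, 2])  # second column total
k <- sum(TeaTasting[1, ])  # first row total
x <- TeaTasting[1, 1]      # observed first cell

# Null probabilities (odds ratio 1) of every possible value of the first cell
support <- max(0, k - n):min(k, m)
probs   <- dhyper(support, m, n, k)

# One-sided "greater": first cell at least as large as observed
p_greater <- sum(probs[support >= x])

# Two-sided: all tables no more probable than the observed one
# (the relative tolerance is an assumption of this sketch)
p_two_sided <- sum(probs[probs <= dhyper(x, m, n, k) * (1 + 1e-7)])

# Compare with the function's own results
c(by_hand = p_greater,
  fisher  = stats::fisher.test(TeaTasting, alternative = "greater")$p.value)
c(by_hand = p_two_sided,
  fisher  = stats::fisher.test(TeaTasting)$p.value)
```

Here p_greater equals 17/70, which rounds to the 0.2429 quoted in the Examples.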
For larger than 2 by 2 tables and hybrid = TRUE, asymptotic chi-squared probabilities are only used if the ‘Cochran conditions’ are satisfied, that is if no cell has count zero, and more than 80% of the cells have counts at least 5: otherwise the exact calculation is used.
Simulation is done conditional on the row and column marginals, and works only if the marginals are strictly positive. (A C translation of the algorithm of Patefield (1981) is used.)
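As an illustration of the simulated p-value, the sketch below checks that the marginals of the Job Satisfaction table from the Examples are strictly positive, then compares the Monte Carlo p-value with the exact one. The seed and the value of B are arbitrary choices for this example, the standard error uses a simple binomial approximation, and stats::fisher.test is called directly on the assumption that the wrapper forwards to it.

```r
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))

# Simulation requires strictly positive row and column marginals
stopifnot(all(rowSums(Job) > 0), all(colSums(Job) > 0))

set.seed(1)  # arbitrary seed, only so that the run is reproducible
B <- 10000   # number of Monte Carlo replicates
p_sim   <- stats::fisher.test(Job, simulate.p.value = TRUE, B = B)$p.value
p_exact <- stats::fisher.test(Job)$p.value

# Rough Monte Carlo standard error of the simulated p-value
se_sim <- sqrt(p_sim * (1 - p_sim) / B)

c(simulated = p_sim, exact = p_exact, std_error = se_sim)
```

Increasing B reduces the Monte Carlo error at the cost of run time.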
Value
A list with class "htest" containing the following components:
p.value: the p-value of the test.
conf.int: a confidence interval for the odds ratio. Only present in the 2 by 2 case and if argument conf.int = TRUE.
estimate: an estimate of the odds ratio. Note that the _conditional_ Maximum Likelihood Estimate (MLE) rather than the unconditional MLE (the sample odds ratio) is used. Only present in the 2 by 2 case.
null.value: the odds ratio under the null (the argument or). Only present in the 2 by 2 case.
alternative: a character string describing the alternative hypothesis.
method: the character string "Fisher's Exact Test for Count Data".
data.name: a character string giving the names of the data.
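A short sketch of how the components listed above might be read from the result, using the Convictions table from the Examples. It also contrasts the conditional MLE with the sample odds ratio computed by hand. The variable names are illustrative, and stats::fisher.test is called directly on the assumption that the wrapper returns the same object.

```r
Convictions <- matrix(c(2, 10, 15, 3), nrow = 2,
                      dimnames = list(c("Dizygotic", "Monozygotic"),
                                      c("Convicted", "Not convicted")))

res <- stats::fisher.test(Convictions)

res$p.value                       # p-value of the two-sided test
res$conf.int                      # interval, with the level as an attribute
attr(res$conf.int, "conf.level")  # confidence level used
res$null.value                    # odds ratio under the null

# Conditional MLE versus the sample odds ratio (unconditional MLE)
conditional_mle <- unname(res$estimate)
sample_or <- (Convictions[1, 1] * Convictions[2, 2]) /
             (Convictions[1, 2] * Convictions[2, 1])

c(conditional = conditional_mle, sample = sample_or)
```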
References
Agresti, A. (1990) _Categorical data analysis_. New York: Wiley. Pages 59-66.
Agresti, A. (2002) _Categorical data analysis_. Second edition. New York: Wiley. Pages 91-101.
Fisher, R. A. (1935) The logic of inductive inference. _Journal of the Royal Statistical Society Series A_ *98*, 39-54.
Fisher, R. A. (1962) Confidence limits for a cross-product ratio. _Australian Journal of Statistics_ *4*, 41.
Fisher, R. A. (1970) _Statistical Methods for Research Workers._ Oliver & Boyd.
Mehta, C. R. and Patel, N. R. (1986) Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. _ACM Transactions on Mathematical Software_, *12*, 154-161.
Clarkson, D. B., Fan, Y. and Joe, H. (1993) A Remark on Algorithm 643: FEXACT: An Algorithm for Performing Fisher's Exact Test in r x c Contingency Tables. _ACM Transactions on Mathematical Software_, *19*, 484-488.
Patefield, W. M. (1981) Algorithm AS159. An efficient method of generating r x c tables with given row and column totals. _Applied Statistics_ *30*, 91-97.
See Also
fisher.exact in package exact2x2 for alternative interpretations of two-sided tests and confidence intervals for 2 by 2 tables.
Examples
## Not run:
## Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
## A British woman claimed to be able to distinguish whether milk or
## tea was added to the cup first. To test, she was given 8 cups of
## tea, in four of which milk was added first. The null hypothesis
## is that there is no association between the true order of pouring
## and the woman's guess, the alternative that there is a positive
## association (that the odds ratio is greater than 1).
TeaTasting <-
  matrix(c(3, 1, 1, 3),
         nrow = 2,
         dimnames = list(Guess = c("Milk", "Tea"),
                         Truth = c("Milk", "Tea")))
fisher.test(TeaTasting, alternative = "greater")
## => p = 0.2429, association could not be established
## Fisher (1962, 1970), Criminal convictions of like-sex twins
Convictions <-
  matrix(c(2, 10, 15, 3),
         nrow = 2,
         dimnames =
           list(c("Dizygotic", "Monozygotic"),
                c("Convicted", "Not convicted")))
Convictions
fisher.test(Convictions, alternative = "less")
fisher.test(Convictions, conf.int = FALSE)
fisher.test(Convictions, conf.level = 0.95)$conf.int
fisher.test(Convictions, conf.level = 0.99)$conf.int
## An r x c table: Agresti (2002, p. 57), Job Satisfaction
Job <- matrix(c(1,2,1,0, 3,3,6,1, 10,10,14,9, 6,7,12,11), 4, 4,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))
fisher.test(Job)
fisher.test(Job, simulate.p.value = TRUE, B = 1e5)
## End(Not run)