contingency2xt {kSamples} | R Documentation |
Kruskal-Wallis Test for the 2 x t Contingency Table
Description
This function uses the Kruskal-Wallis criterion to test
the hypothesis of no association between the counts
for two responses
"A" and "B" across t categories.
Usage
contingency2xt(Avec, Bvec,
method = c("asymptotic", "simulated", "exact"),
dist = FALSE, tab0 = TRUE, Nsim = 1e+06)
Arguments
Avec |
vector of length t giving the counts A_1, …, A_t
for response "A" according to t categories;
m = A_1 + … + A_t.
|
Bvec |
vector of length t giving the counts B_1, …, B_t
for response "B" according to t categories;
n = B_1 + … + B_t = N − m.
|
method |
= c("asymptotic", "simulated", "exact"), where
"asymptotic" uses only an asymptotic chi-square approximation
with t − 1 degrees of freedom to approximate the P-value.
This calculation is always done.
"simulated" uses Nsim simulated counts for Avec and
Bvec with the observed marginal totals, m, n, d = Avec + Bvec,
to estimate the P-value.
"exact" enumerates all counts for Avec and Bvec with
the observed marginal totals to get an exact P-value. It is used only
when Nsim is at least as large as the number choose(m+t-1, t-1)
of full enumerations.
Otherwise, method reverts to "simulated" using the given Nsim.
|
dist |
FALSE (default) or TRUE. If dist = TRUE, the distribution of the
simulated or fully enumerated Kruskal-Wallis test statistics is
returned as null.dist; if dist = FALSE, the value
of null.dist is NULL.
The choice dist = TRUE also limits Nsim <- min(Nsim, 1e8).
|
tab0 |
TRUE (default) or FALSE. If tab0 = TRUE, the null distribution
is returned as a two-column matrix when
method = "simulated". When tab0 = FALSE, the simulated null distribution
is returned as a vector of all simulated values of the test statistic.
|
Nsim |
= 1e6 (default, as given under Usage), number of simulated Avec splits to use.
It is only used when method = "simulated",
or when method = "exact" reverts to method = "simulated",
as previously explained.
|
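The rule for when method = "exact" is honored can be checked before calling the function. The following is a minimal sketch in base R; the helper name exact_is_feasible is hypothetical and not part of kSamples, and it only reproduces the comparison of choose(m+t-1, t-1) with Nsim described above.

```r
# Hypothetical helper (not part of kSamples): counts the splits of m into
# t non-negative parts and compares that count with Nsim.
exact_is_feasible <- function(Avec, Bvec, Nsim = 1e6) {
  stopifnot(length(Avec) == length(Bvec))
  m <- sum(Avec)                      # total count for response "A"
  tcat <- length(Avec)                # number of categories t
  n_enum <- choose(m + tcat - 1, tcat - 1)
  list(n_enum = n_enum, feasible = Nsim >= n_enum)
}

res <- exact_is_feasible(c(25, 15, 20), c(16, 6, 18), Nsim = 1e3)
res$n_enum    # number of full enumerations
res$feasible  # whether Nsim is large enough for method = "exact"
```

For these counts m = 60 and t = 3, so choose(62, 2) = 1891 enumerations would be needed, which exceeds Nsim = 1e3.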
Details
For this data scenario the Kruskal-Wallis criterion is
$$\mathrm{K.star} = \frac{N(N-1)}{mn}\left(\sum_{i=1}^{t}\frac{A_i^2}{d_i} - \frac{m^2}{N}\right)$$
with d_i = A_i + B_i and N = m + n, treating "A" responses
as 1 and "B" responses as 2, and using midranks as explained in Lehmann (2006), Chapter 5.3.
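The criterion is simple to evaluate directly. The sketch below is a base R illustration under the definitions m = sum(Avec), n = sum(Bvec), N = m + n, and d = Avec + Bvec; the function name kw_2xt and its output names are hypothetical and this is not the package's implementation. Algebraically the expression equals (N − 1)/N times Pearson's chi-square statistic for the 2 x t table, which the last lines use as a cross-check.

```r
# Hypothetical helper (not the kSamples implementation): evaluates the
# Kruskal-Wallis criterion for a 2 x t table and its asymptotic P-value.
kw_2xt <- function(Avec, Bvec) {
  d <- Avec + Bvec
  m <- sum(Avec); n <- sum(Bvec); N <- m + n
  stopifnot(all(d > 0), m > 0, n > 0)   # assumption: no empty category or row
  K <- N * (N - 1) / (m * n) * (sum(Avec^2 / d) - m^2 / N)
  # chi-square approximation with t - 1 degrees of freedom
  p_asym <- pchisq(K, df = length(Avec) - 1, lower.tail = FALSE)
  c(KW = K, asympt.P = p_asym)
}

A <- c(25, 15, 20); B <- c(16, 6, 18)
out <- kw_2xt(A, B)
out

# Cross-check against (N - 1)/N times Pearson's chi-square statistic
N <- sum(A) + sum(B)
x2 <- unname(chisq.test(rbind(A, B), correct = FALSE)$statistic)
all.equal(unname(out["KW"]), (N - 1) / N * x2)
```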
For small sample sizes exact null distribution
calculations are possible, based on Algorithm C (Chase's sequence) in Knuth (2011),
which allows the enumeration of all possible splits of m
into counts A_1, …, A_t such that m = A_1 + … + A_t,
followed by the calculation of the statistic K.star for each such split.
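For very small tables the enumeration idea can be illustrated in plain R. The sketch below does not use Chase's sequence; it builds the splits by brute force with expand.grid and skips splits with A_i > d_i, which cannot occur under the observed margins. It assumes that each split is weighted by its multivariate hypergeometric probability given the margins and that the P-value is the probability of a statistic at least as large as the observed one. The function name exact_null_2xt is hypothetical.

```r
# Hypothetical sketch (brute force, not Chase's sequence, not the package code).
exact_null_2xt <- function(Avec, Bvec) {
  d <- Avec + Bvec
  m <- sum(Avec); N <- sum(d); n <- N - m
  tcat <- length(d)
  stopifnot(tcat >= 2, all(d > 0), m > 0, n > 0)
  kstar <- function(A) N * (N - 1) / (m * n) * (sum(A^2 / d) - m^2 / N)
  # Candidate values for the first t - 1 counts; the last count is forced.
  grid <- as.matrix(expand.grid(lapply(d[-tcat], function(di) 0:di)))
  last <- m - rowSums(grid)
  ok <- last >= 0 & last <= d[tcat]
  splits <- cbind(grid[ok, , drop = FALSE], last[ok])
  # Assumed weighting: multivariate hypergeometric probability of each split.
  prob <- apply(splits, 1, function(A) prod(choose(d, A))) / choose(N, m)
  KW <- apply(splits, 1, kstar)
  # Small tolerance guards against floating-point ties with the observed value.
  list(KW = KW, prob = prob,
       p.exact = sum(prob[KW >= kstar(Avec) - 1e-8]))
}

res <- exact_null_2xt(c(3, 1, 2), c(1, 2, 2))
sum(res$prob)   # probabilities over all admissible splits add up to 1
res$p.exact
```

The grid grows quickly with the category totals, so this approach is only practical for toy examples.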
Simulation of A_1, …, A_t
uses the probability model (5.35) in Lehmann (2006)
to successively generate hypergeometric counts A_1, …, A_t.
Both these processes, enumeration and simulation, are done in C.
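Successive hypergeometric generation can be sketched in R with rhyper. This is an illustration of the general technique for fixed margins, not a transcription of the package's C code or of model (5.35); the function name simulate_split is hypothetical. Each count is drawn from the "A" responses still to be allocated, given the category total d_i and the total remaining in later categories.

```r
# Hypothetical helper: one random split of m "A" responses over t categories
# with fixed category totals d.
simulate_split <- function(d, m) {
  tcat <- length(d)
  stopifnot(m >= 0, m <= sum(d))
  A <- numeric(tcat)
  m_left <- m          # "A" responses still to be allocated
  N_left <- sum(d)     # responses in categories not yet processed
  for (i in seq_len(tcat - 1)) {
    # Draw m_left items from d[i] category-i items and N_left - d[i] others.
    A[i] <- rhyper(1, d[i], N_left - d[i], m_left)
    m_left <- m_left - A[i]
    N_left <- N_left - d[i]
  }
  A[tcat] <- m_left    # the last count is determined by the others
  A
}

set.seed(1)
d <- c(41, 21, 38)     # column totals of the example table
sims <- replicate(200, simulate_split(d, 60))
dim(sims)              # one column per simulated split
```

Evaluating the statistic on each column of sims and comparing with the observed value gives a simulated P-value in the same spirit as method = "simulated".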
Value
A list of class kSamples
with components
test.name |
"2 x t Contingency Table"
|
t |
number of classification categories
|
KW.cont |
vector of length 2 (or 3) giving the observed KW statistic, its asymptotic
P-value (and the simulated or exact P-value)
|
null.dist |
simulated or enumerated null distribution
of the test statistic. It is given as an M by 2 matrix,
where the first column (named KW) gives the M unique ordered
values of the Kruskal-Wallis
statistic and the second column (named prob) gives the corresponding (simulated or exact)
probabilities.
This format of null.dist is returned when method = "exact"
and dist = TRUE, or when method = "simulated",
dist = TRUE, and tab0 = TRUE are specified.
For method = "simulated", dist = TRUE, and
tab0 = FALSE the null distribution null.dist is returned as the vector of
all simulated test statistic values. This is used in contingency2xt.comb
in the simulation mode.
null.dist = NULL is returned
when dist = FALSE or when method = "asymptotic".
|
method |
the method used.
|
Nsim |
the number of simulations.
|
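The two layouts of null.dist described above are related by a simple tabulation. The sketch below shows one way to collapse a vector of simulated statistic values into an M by 2 matrix with columns KW and prob, and to read a P-value from that matrix. Both helper names are hypothetical, and the rounding step and the tolerance in the comparison are assumptions made for this illustration rather than documented package behavior.

```r
# Hypothetical helper: collapse a vector of statistic values into the
# two-column (KW, prob) layout with unique ordered values.
tabulate_null <- function(kw_values, digits = 8) {
  # Rounding merges values that differ only by floating-point noise.
  tab <- table(round(kw_values, digits))
  cbind(KW = as.numeric(names(tab)),
        prob = as.numeric(tab) / length(kw_values))
}

# Hypothetical helper: P(KW >= observed) from the tabulated distribution.
p_from_table <- function(null_tab, kw_obs, tol = 1e-8) {
  sum(null_tab[null_tab[, "KW"] >= kw_obs - tol, "prob"])
}

kw <- c(0.5, 2, 0.5, 3.25, 2, 0.5)   # made-up values for illustration only
tab <- tabulate_null(kw)
tab
p_from_table(tab, 2)
```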
Warning
method = "exact" should only be used with caution.
Computation time is proportional to the number of enumerations. In most cases
dist = TRUE should not be used, especially
when the returned distribution objects
would become too large for R's workspace.
References
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley.
Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525–540.
Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583–621.
Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.
Examples
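library(kSamples)
# Here choose(60+3-1, 3-1) = 1891 exceeds Nsim = 1e3, so by the rule under Arguments "exact" reverts to "simulated".
contingency2xt(c(25, 15, 20), c(16, 6, 18), method = "exact", dist = FALSE,
               tab0 = TRUE, Nsim = 1e3)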
[Package kSamples version 1.2-10]