contingency2xt {kSamples}    R Documentation

Kruskal-Wallis Test for the 2 x t Contingency Table


This function uses the Kruskal-Wallis criterion to test the hypothesis of no association between the counts for two responses "A" and "B" across t categories.


contingency2xt(Avec, Bvec, 
	method = c("asymptotic", "simulated", "exact"), 
	dist = FALSE, tab0 = TRUE, Nsim = 1e+06)



Avec: vector of length $t$ giving the counts $A_1, \ldots, A_t$ for response "A" according to $t$ categories, with $m = A_1 + \ldots + A_t$.


Bvec: vector of length $t$ giving the counts $B_1, \ldots, B_t$ for response "B" according to $t$ categories, with $n = B_1 + \ldots + B_t = N - m$.


method: = c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic chi-square approximation with $t-1$ degrees of freedom to approximate the P-value. This calculation is always done.

"simulated" uses Nsim simulated counts for Avec and Bvec with the observed marginal totals, m, n, d = Avec+Bvec, to estimate the P-value.

"exact" enumerates all counts for Avec and Bvec with the observed marginal totals to get an exact P-value. It is used only when Nsim is at least as large as the number choose(m+t-1,t-1) of full enumerations. Otherwise, method reverts to "simulated" using the given Nsim.


dist: FALSE (default) or TRUE. If dist = TRUE, the distribution of the simulated or fully enumerated Kruskal-Wallis test statistics is returned as null.dist; if dist = FALSE, the value of null.dist is NULL. The choice dist = TRUE also limits Nsim <- min(Nsim,1e8).


tab0: TRUE (default) or FALSE. If tab0 = TRUE, the null distribution is returned in two-column matrix form when method = "simulated". When tab0 = FALSE, the simulated null distribution is returned as a vector of all simulated values of the test statistic.


Nsim: = 1e+06 (default, as given in the usage), number of simulated Avec splits to use. It is only used when method = "simulated", or when method = "exact" reverts to method = "simulated", as previously explained.
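The rule for when method = "exact" is honored can be checked before calling the function. The following is a minimal sketch in base R, not the package's own code; the counts and the variable names (ncat, n_enum, method_used) are chosen here for illustration only.

```r
# Illustrative counts (assumed for this sketch, not taken from the package)
Avec <- c(25, 15, 20)
Bvec <- c(16, 6, 18)

m    <- sum(Avec)       # total count for response "A"
ncat <- length(Avec)    # number of categories t

# Number of full enumerations: splits of m into ncat nonnegative counts
n_enum <- choose(m + ncat - 1, ncat - 1)

# Mirror of the documented rule: "exact" is used only if Nsim >= n_enum
Nsim <- 1e4
method_used <- if (Nsim >= n_enum) "exact" else "simulated"

# Sanity check of the counting formula on a tiny case (m = 4, t = 3):
# brute-force count of all (a1, a2, a3) with a1 + a2 + a3 = 4
g <- expand.grid(0:4, 0:4, 0:4)
n_small <- sum(rowSums(g) == 4)
stopifnot(n_small == choose(4 + 3 - 1, 3 - 1))
```

Here n_enum is choose(62, 2) = 1891, so an Nsim of 1e4 would allow full enumeration, whereas a much larger m or t quickly pushes the count past any practical Nsim.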


For this data scenario the Kruskal-Wallis criterion is

$$\mathrm{KW} = \frac{N(N-1)}{mn}\left(\sum_{i=1}^{t}\frac{A_i^2}{d_i}-\frac{m^2}{N}\right)$$

with $d_i = A_i + B_i$, treating "A" responses as 1 and "B" responses as 2, and using midranks as explained in Lehmann (2006), Chapter 5.3.
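The criterion can be evaluated directly from the two count vectors. The sketch below uses only base R; kw_2xt is a hypothetical helper written for this illustration, not a function of kSamples, and it assumes every column total d_i is positive. It also checks the algebraic identity that the criterion equals (N-1)/N times the Pearson chi-square statistic of the same 2 x t table.

```r
# Hypothetical helper (not part of kSamples): KW criterion for a 2 x t table.
# Assumes all column totals d_i = A_i + B_i are positive.
kw_2xt <- function(Avec, Bvec) {
  d <- Avec + Bvec      # column totals d_i
  m <- sum(Avec)        # total count for response "A"
  n <- sum(Bvec)        # total count for response "B"
  N <- m + n
  N * (N - 1) / (m * n) * (sum(Avec^2 / d) - m^2 / N)
}

# Illustrative counts (assumed for this sketch)
Avec <- c(25, 15, 20)
Bvec <- c(16, 6, 18)
kw <- kw_2xt(Avec, Bvec)

# Pearson chi-square statistic of the same table, for comparison
x2 <- unname(chisq.test(rbind(Avec, Bvec), correct = FALSE)$statistic)
N  <- sum(Avec) + sum(Bvec)
same <- isTRUE(all.equal(kw, (N - 1) / N * x2))

# Asymptotic P-value from the chi-square approximation with t - 1 df
p_asym <- pchisq(kw, df = length(Avec) - 1, lower.tail = FALSE)
```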

For small sample sizes exact null distribution calculations are possible, based on Algorithm C (Chase's sequence) in Knuth (2011), which allows the enumeration of all possible splits of $m$ into counts $A_1, \ldots, A_t$ such that $m = A_1 + \ldots + A_t$, followed by the calculation of the statistic for each such split. Simulation of $A_1, \ldots, A_t$ uses the probability model (5.35) in Lehmann (2006) to successively generate hypergeometric counts $A_1, \ldots, A_t$. Both these processes, enumeration and simulation, are done in C.
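The simulation step can be illustrated in plain R. This is only a sketch under the assumption that "successively generate hypergeometric counts" means drawing each A_i from a hypergeometric distribution given the counts still unallocated; the package performs this step in C, and sim_Avec and kw are hypothetical names introduced here.

```r
# Hypothetical helper: draw one vector A_1..A_t given the margins m and d,
# one category at a time, via successive hypergeometric draws.
sim_Avec <- function(m, d) {
  ncat  <- length(d)
  A     <- integer(ncat)
  white <- m              # "A" responses not yet allocated
  black <- sum(d) - m     # "B" responses not yet allocated
  for (i in seq_len(ncat)) {
    A[i]  <- rhyper(1, white, black, d[i])
    white <- white - A[i]
    black <- black - (d[i] - A[i])
  }
  A
}

set.seed(1)
Avec <- c(25, 15, 20)     # illustrative counts (assumed for this sketch)
Bvec <- c(16, 6, 18)
d <- Avec + Bvec
m <- sum(Avec); N <- sum(d); n <- N - m

# KW criterion as a function of the "A" counts, margins held fixed
kw <- function(A) N * (N - 1) / (m * n) * (sum(A^2 / d) - m^2 / N)

kw_obs <- kw(Avec)
kw_sim <- replicate(2000, kw(sim_Avec(m, d)))

# Simulated P-value: share of simulated statistics at least as large as
# the observed one (small tolerance guards against rounding error)
p_sim <- mean(kw_sim >= kw_obs - 1e-8)
```

Each simulated vector keeps the observed margins, so it sums to m and never exceeds d in any category. The 2000 replications are far fewer than the function's default Nsim and serve only to keep the sketch fast.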


A list of class kSamples with components

"2 x t Contingency Table"


number of classification categories


vector of length 2 (or 3) giving the observed KW statistic, its asymptotic P-value (and its simulated or exact P-value)


simulated or enumerated null distribution of the test statistic. It is given as an M by 2 matrix, where the first column (named KW) gives the M unique ordered values of the Kruskal-Wallis statistic and the second column (named prob) gives the corresponding (simulated or exact) probabilities.

This format of null.dist is returned when method = "exact" and dist = TRUE or when method = "simulated" and dist = TRUE and tab0 = TRUE are specified.

For method = "simulated", dist = TRUE, and tab0 = FALSE the null distribution null.dist is returned as the vector of all simulated test statistic values. This is used in contingency2xt.comb in the simulation mode.

null.dist = NULL is returned when dist = FALSE or when method = "asymptotic".
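A call that returns the tabulated null distribution might look as follows. This is a hedged usage sketch: the counts are illustrative, the call is guarded so that it only runs when kSamples is installed, and only the null.dist component and its column names KW and prob, as described above, are accessed.

```r
# Illustrative counts (assumed for this sketch)
Avec <- c(25, 15, 20)
Bvec <- c(16, 6, 18)

if (requireNamespace("kSamples", quietly = TRUE)) {
  set.seed(27)
  out <- kSamples::contingency2xt(Avec, Bvec,
                                  method = "simulated",
                                  dist = TRUE, tab0 = TRUE,
                                  Nsim = 1e4)

  # With dist = TRUE and tab0 = TRUE, null.dist is an M by 2 matrix
  # with columns KW (ordered unique values) and prob (probabilities)
  print(head(out$null.dist))

  # The tabulated probabilities should add up to 1
  total_prob <- sum(out$null.dist[, "prob"])
}
```

With tab0 = FALSE the same call would instead return null.dist as a plain vector of Nsim simulated statistic values.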


the method used.


the number of simulations.


method = "exact" should be used with caution, since computation time is proportional to the number of enumerations. In most cases dist = TRUE should not be used, in particular when the returned distribution objects would become too large for R's workspace.


Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley.

Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525–540.

Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583–621.

Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.



[Package kSamples version 1.2-10 Index]