R: Kruskal-Wallis Test for the 2 x t Contingency Table

contingency2xt {kSamples}

R Documentation


Description

This function uses the Kruskal-Wallis criterion to test the hypothesis of no association between the counts for two responses "A" and "B" across t categories.

Usage

contingency2xt(Avec, Bvec, 
	method = c("asymptotic", "simulated", "exact"), 
	dist = FALSE, tab0 = TRUE, Nsim = 1e+06)

Arguments

`Avec`	vector of length `t` giving the counts `A_1,\ldots, A_t` for response "A" in the `t` categories; their total is `m = A_1 + \ldots + A_t`.
`Bvec`	vector of length `t` giving the counts `B_1,\ldots, B_t` for response "B" in the `t` categories; their total is `n = B_1 + \ldots + B_t = N-m`, where `N` is the total count of the table.
`method`	one of `c("asymptotic","simulated","exact")`. `"asymptotic"` uses only the asymptotic chi-square approximation with `t-1` degrees of freedom to approximate the `P`-value; this calculation is always done, whatever `method` is chosen. `"simulated"` uses `Nsim` simulated counts for `Avec` and `Bvec` with the observed marginal totals `m`, `n` and `d = Avec+Bvec` to estimate the `P`-value. `"exact"` enumerates all counts for `Avec` and `Bvec` with the observed marginal totals to get an exact `P`-value. It is used only when `Nsim` is at least as large as the number `choose(m+t-1,t-1)` of full enumerations; otherwise `method` reverts to `"simulated"` using the given `Nsim`.
`dist`	`FALSE` (default) or `TRUE`. If `dist = TRUE`, the distribution of the simulated or fully enumerated Kruskal-Wallis test statistics is returned as `null.dist`; if `dist = FALSE`, `null.dist` is `NULL`. The choice `dist = TRUE` also limits `Nsim <- min(Nsim,1e8)`.
`tab0`	`TRUE` (default) or `FALSE`. If `tab0 = TRUE`, the null distribution is returned as a two-column matrix when `method = "simulated"`. If `tab0 = FALSE`, the simulated null distribution is returned as a vector of all simulated values of the test statistic.
`Nsim`	`= 1e+06` (default), the number of simulated `Avec` splits to use. It is only used when `method = "simulated"`, or when `method = "exact"` reverts to `method = "simulated"`, as explained above.
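The rule that decides whether `method = "exact"` is actually carried out can be sketched in a few lines of base R. This is a minimal sketch assuming the rule is exactly the comparison of `Nsim` with `choose(m+t-1,t-1)` described above; the helper `choose_method` is hypothetical and not part of kSamples.

```r
# Hypothetical helper (not in kSamples): applies the documented rule for
# falling back from method = "exact" to method = "simulated".
choose_method <- function(Avec, method = "exact", Nsim = 1e6) {
  t <- length(Avec)
  m <- sum(Avec)                       # total "A" count
  n.enum <- choose(m + t - 1, t - 1)   # number of full enumerations
  if (method == "exact" && Nsim < n.enum) method <- "simulated"
  list(method = method, n.enum = n.enum)
}

# "A" counts from the Examples section
choose_method(c(25, 15, 20), Nsim = 1e3)
# method "simulated", n.enum 1891
```

For the counts in the Examples section, m = 60 and t = 3, so there are choose(62, 2) = 1891 enumerations; under the stated rule, `Nsim = 1e3` is too small for a full enumeration and the call falls back to simulation.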

Details

For this data scenario the Kruskal-Wallis criterion is

K.star = \frac{N(N-1)}{mn}\left(\sum_{i=1}^{t}\frac{A_i^2}{d_i}-\frac{m^2}{N}\right)

with d_i = A_i + B_i and N = m + n, treating "A" responses as 1 and "B" responses as 2, and using midranks for the resulting ties as explained in Lehmann (2006), Chapter 5.3.
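The formula for K.star and its asymptotic `P`-value (chi-square with t-1 degrees of freedom, as described under `method`) can be sketched in base R. The helper `kw_2xt` is hypothetical and assumes every category total d_i is positive. As a consistency check, the sketch expands the table into individual observations scored 1 ("A") and 2 ("B") and compares the result with base R's `kruskal.test`, which applies the usual correction for ties; the two statistics should agree up to rounding.

```r
# Hypothetical helper: K.star for a 2 x t table (assumes all d_i > 0)
kw_2xt <- function(Avec, Bvec) {
  d <- Avec + Bvec                     # category totals d_i
  m <- sum(Avec); n <- sum(Bvec)
  N <- m + n
  K <- N * (N - 1) / (m * n) * (sum(Avec^2 / d) - m^2 / N)
  c(KW = K, asympt.P = pchisq(K, df = length(Avec) - 1, lower.tail = FALSE))
}

Avec <- c(25, 15, 20); Bvec <- c(16, 6, 18)
res <- kw_2xt(Avec, Bvec)

# Cross-check: one observation per count, "A" = 1 and "B" = 2, grouped by
# category, then the tie-corrected Kruskal-Wallis test from package stats.
counts <- as.vector(rbind(Avec, Bvec))          # A_1, B_1, A_2, B_2, ...
x <- rep(rep(1:2, length(Avec)), times = counts)
g <- rep(rep(seq_along(Avec), each = 2), times = counts)
all.equal(unname(res["KW"]), unname(kruskal.test(x, g)$statistic))
# TRUE
```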

For small sample sizes, exact null distribution calculations are possible. They are based on Algorithm C (Chase's sequence) in Knuth (2011), which enumerates all possible splits of m into counts A_1,\ldots, A_t with m = A_1 + \ldots + A_t; the statistic K.star is then calculated for each such split. Simulation of A_1,\ldots, A_t uses the probability model (5.35) in Lehmann (2006) to generate the hypergeometric counts A_1,\ldots, A_t successively. Both processes, enumeration and simulation, are implemented in C.
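The two routes to the null distribution can be sketched in base R. This is a minimal sketch, not the package's C code. It assumes that model (5.35) amounts to drawing A_1, ..., A_t as successive hypergeometric counts given the margins (a multivariate hypergeometric split of the m "A" responses over the category totals d_i), that the `P`-value is the upper-tail probability of K.star, and that every d_i is positive. It replaces Chase's sequence by a brute-force `expand.grid` enumeration, which is only practical for small tables. The helpers `kstat`, `sim_split`, `sim_pvalue` and `exact_pvalue` are hypothetical.

```r
# Hypothetical helpers; none of these are part of kSamples.
kstat <- function(A, d, m) {
  N <- sum(d)
  N * (N - 1) / (m * (N - m)) * (sum(A^2 / d) - m^2 / N)
}

# One random split of m "A" responses over category totals d, drawn as
# successive hypergeometric counts given the counts already allocated.
sim_split <- function(d, m) {
  t <- length(d)
  A <- numeric(t)
  left.m <- m          # "A" responses not yet allocated
  left.N <- sum(d)     # observations in categories i, ..., t
  for (i in seq_len(t - 1)) {
    A[i] <- rhyper(1, d[i], left.N - d[i], left.m)
    left.m <- left.m - A[i]
    left.N <- left.N - d[i]
  }
  A[t] <- left.m       # the last category takes the remainder
  A
}

# Simulated P-value: share of simulated K.star values >= the observed one
sim_pvalue <- function(Avec, Bvec, Nsim = 1e4) {
  d <- Avec + Bvec; m <- sum(Avec)
  k.obs <- kstat(Avec, d, m)
  k.sim <- replicate(Nsim, kstat(sim_split(d, m), d, m))
  mean(k.sim >= k.obs - 1e-8)   # small tolerance for rounding ties
}

# Exact P-value by brute force over all splits with 0 <= A_i <= d_i,
# each weighted by its multivariate hypergeometric probability
exact_pvalue <- function(Avec, Bvec) {
  d <- Avec + Bvec; m <- sum(Avec)
  grid <- as.matrix(expand.grid(lapply(d, function(di) 0:di)))
  grid <- grid[rowSums(grid) == m, , drop = FALSE]
  prob <- apply(grid, 1, function(A) prod(choose(d, A))) / choose(sum(d), m)
  k <- apply(grid, 1, kstat, d = d, m = m)
  sum(prob[k >= kstat(Avec, d, m) - 1e-8])
}

set.seed(1)
Avec <- c(25, 15, 20); Bvec <- c(16, 6, 18)
c(exact = exact_pvalue(Avec, Bvec), simulated = sim_pvalue(Avec, Bvec))
```

With `Nsim = 1e4` the simulated value should be within a few hundredths of the brute-force exact value, since its standard error is at most 0.005.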

Value

A list of class kSamples with components

`test.name`	`"2 x t Contingency Table"`
`t`	number of classification categories
`KW.cont`	vector of length 2 (or 3) giving the observed KW statistic and its asymptotic `P`-value (and, unless `method = "asymptotic"`, its simulated or exact `P`-value)
`null.dist`	simulated or enumerated null distribution of the test statistic. It is given as an `M` by 2 matrix, where the first column (named `KW`) gives the `M` unique ordered values of the Kruskal-Wallis statistic and the second column (named `prob`) gives the corresponding (simulated or exact) probabilities. This format of `null.dist` is returned when `method = "exact"` and `dist = TRUE`, or when `method = "simulated"`, `dist = TRUE` and `tab0 = TRUE` are specified. For `method = "simulated"`, `dist = TRUE` and `tab0 = FALSE`, `null.dist` is returned as the vector of all simulated test statistic values; this format is used by `contingency2xt.comb` in simulation mode. `null.dist = NULL` is returned when `dist = FALSE` or when `method = "asymptotic"`.
`method`	the `method` used.
`Nsim`	the number of simulations.
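The relation between the two `null.dist` formats for `method = "simulated"` can be sketched in base R: the vector of simulated statistics (the `tab0 = FALSE` form) collapses into the `M` by 2 matrix with columns `KW` and `prob` (the `tab0 = TRUE` form). The helper `tabulate_null` and its rounding to 8 digits, so that values differing only by floating-point noise are pooled, are assumptions and not the package's code; the input vector is illustrative only.

```r
# Hypothetical helper: collapse a vector of simulated statistics into an
# M by 2 matrix of unique ordered values (KW) and relative frequencies (prob).
tabulate_null <- function(ksim, digits = 8) {
  k <- round(ksim, digits)   # pool values that differ only by rounding noise
  tab <- table(k)            # table() orders numeric values increasingly
  cbind(KW = as.numeric(names(tab)), prob = as.vector(tab) / length(k))
}

toy <- c(0.5, 1.2, 0.5, 3.0, 1.2, 0.5)   # illustrative values only
tabulate_null(toy)
```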

Warning

Use `method = "exact"` with caution: computation time is proportional to the number of enumerations. In most cases `dist = TRUE` should not be used, because the returned distribution objects can become too large for R's workspace.

References

Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley.

Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol. 23, No. 4, 525-540.

Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol. 47, No. 260, 583-621.

Lehmann, E.L. (2006), Nonparametrics: Statistical Methods Based on Ranks, Revised First Edition, Springer, New York.

Examples

contingency2xt(c(25,15,20),c(16,6,18),method="exact",dist=FALSE,
	tab0=TRUE,Nsim=1e3)

[Package kSamples version 1.2-10 Index]