qn.test {kSamples} | R Documentation |
Rank Score k-Sample Tests
Description
This function uses the QN criterion (Kruskal-Wallis, van der Waerden scores, normal scores) to test the hypothesis that k independent samples arise from a common unspecified distribution.
Usage
qn.test(..., data = NULL, test = c("KW", "vdW", "NS"),
method = c("asymptotic", "simulated", "exact"),
dist = FALSE, Nsim = 10000)
Arguments
... |
Either several sample vectors, or a list of such sample vectors, or a formula y ~ g, where y contains the pooled sample values and g (same length as y) is a factor with levels identifying the samples to which the elements of y belong. |
data |
an optional data frame providing the variables in formula y ~ g. |
test |
=
|
method |
=
of
full enumerations. Otherwise, |
dist |
|
Nsim |
|
Details
The QN criterion based on rank scores v_1, \ldots, v_N is
QN = \frac{1}{s_v^2} \sum_{i=1}^k \frac{(S_{iN} - n_i \bar{v}_N)^2}{n_i}
where S_{iN} is the sum of rank scores for the i-th sample, and \bar{v}_N and s_v^2 are the sample mean and sample variance (denominator N-1) of all scores.
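As a rough illustration of the formula, the following base R sketch evaluates QN for Kruskal-Wallis scores (the ranks) and for van der Waerden scores, assuming the usual definition qnorm(r / (N + 1)) for the latter. The helper name qn_from_scores is hypothetical and not part of kSamples; with rank scores the value should agree with the statistic reported by stats::kruskal.test.

```r
# Hypothetical helper (not part of kSamples): QN for given scores and labels.
qn_from_scores <- function(scores, g) {
  g <- as.factor(g)
  n_i <- as.vector(tapply(scores, g, length))  # sample sizes n_i
  S_i <- as.vector(tapply(scores, g, sum))     # score sums S_iN
  v_bar <- mean(scores)                        # mean of all N scores
  s2_v <- var(scores)                          # variance, denominator N - 1
  sum((S_i - n_i * v_bar)^2 / n_i) / s2_v
}

u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
x <- c(u1, u2, u3)
g <- rep(1:3, each = 5)
N <- length(x)

r <- rank(x)                                     # Kruskal-Wallis scores
qn_kw <- qn_from_scores(r, g)
qn_vdw <- qn_from_scores(qnorm(r / (N + 1)), g)  # assumed vdW scores
print(c(KW = qn_kw, vdW = qn_vdw))
```

Normal scores (expected standard normal order statistics) are omitted here because they have no one-line base R expression.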
The statistic QN is used to test the hypothesis that the samples all come from the same but unspecified continuous distribution function F(x). QN is always adjusted for ties by averaging the scores of tied observations.
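The tie adjustment can be pictured with a small base R sketch. It is only an illustration of score averaging, not the package's internal code, and it again assumes van der Waerden scores of the form qnorm(i / (N + 1)).

```r
x <- c(1, 2, 2, 3)   # pooled sample with one tie
N <- length(x)

# Kruskal-Wallis scores: averaging ranks over ties gives midranks.
kw_scores <- rank(x, ties.method = "average")

# Other scores: compute scores for distinct ranks first,
# then average them within each group of tied observations.
raw <- qnorm(rank(x, ties.method = "first") / (N + 1))
vdw_scores <- ave(raw, x)   # ave() averages within equal values of x

print(kw_scores)
print(vdw_scores)
```

The two tied observations share the midrank 2.5, and their van der Waerden scores qnorm(0.4) and qnorm(0.6) average to zero.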
Conditions for the asymptotic approximation (chi-square with k-1 degrees of freedom) can be found in Lehmann (2006), Appendix, Corollary 10, or in Hajek, Sidak, and Sen (1999), Ch. 6, problems 13 and 14.
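A minimal sketch of the asymptotic P-value calculation follows. Because QN with rank scores coincides with the Kruskal-Wallis statistic, stats::kruskal.test is used here as a stand-in to obtain a value of the statistic; the upper tail of the chi-square distribution with k-1 degrees of freedom then gives the approximate P-value.

```r
u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
x <- c(u1, u2, u3)
g <- rep(1:3, each = 5)
k <- 3

kw <- kruskal.test(x, g)
qn <- unname(kw$statistic)   # QN for rank (Kruskal-Wallis) scores

# Asymptotic P-value: upper tail of chi-square with k - 1 degrees of freedom.
p_asym <- pchisq(qn, df = k - 1, lower.tail = FALSE)
print(p_asym)
```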
For small sample sizes, exact null distribution calculations are possible (with or without ties), based on a recursively extended version of Algorithm C (Chase's sequence) in Knuth (2011), which allows the enumeration of all possible splits of the pooled data into samples of sizes n_1, \ldots, n_k, as appropriate under treatment randomization. Both the enumeration and the simulation are done in C.
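To make the idea of full enumeration concrete, here is a small base R sketch for tiny samples. It does not use Chase's sequence or the package's C code; instead it enumerates the splits with nested calls to combn, which is simpler but far slower. The names qn_from_scores and all_labelings, and the toy data, are hypothetical.

```r
# Hypothetical helper: QN for given scores and sample labels.
qn_from_scores <- function(scores, g) {
  n_i <- as.vector(tapply(scores, g, length))
  S_i <- as.vector(tapply(scores, g, sum))
  sum((S_i - n_i * mean(scores))^2 / n_i) / var(scores)
}

# All ways to label N = sum(sizes) positions with sample indices 1..k
# so that sample j receives sizes[j] positions (all sizes assumed >= 1).
all_labelings <- function(sizes) {
  k <- length(sizes)
  rec <- function(labels, remaining, j) {
    if (j == k) {                 # last sample takes what is left
      labels[remaining] <- j
      return(list(labels))
    }
    picks <- combn(length(remaining), sizes[j], simplify = FALSE)
    unlist(lapply(picks, function(p) {
      lab <- labels
      lab[remaining[p]] <- j
      rec(lab, remaining[-p], j + 1)
    }), recursive = FALSE)
  }
  rec(integer(sum(sizes)), seq_len(sum(sizes)), 1)
}

x <- c(1.2, 3.4, 0.5, 2.2, 4.1, 2.9)   # toy pooled sample
g_obs <- rep(1:3, each = 2)            # observed split, sizes 2, 2, 2
scores <- rank(x)                      # Kruskal-Wallis scores

labelings <- all_labelings(c(2, 2, 2))
null <- vapply(labelings, function(g) qn_from_scores(scores, g), numeric(1))
obs <- qn_from_scores(scores, g_obs)

# Exact P-value: proportion of splits with QN at least as large as observed
# (small tolerance guards against floating point noise).
p_exact <- mean(null >= obs - 1e-10)
print(c(n.splits = length(null), p.exact = p_exact))
```

Under the randomization distribution the mean of QN over all splits equals k-1, which gives a quick sanity check on the enumeration.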
NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NAs is appropriate.
The continuity assumption can be dispensed with if we deal with independent random samples from any common distribution, or if randomization was used in allocating subjects to samples or treatments, and if the asymptotic, simulated or exact P-values are viewed conditionally, given the tie pattern in the pooled sample. Under such randomization any conclusions are valid only with respect to the subjects that were randomly allocated to their respective treatment samples.
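The conditional view can be sketched as a label-shuffling simulation in base R: the scores, and hence the tie pattern of the pooled sample, stay fixed while only the allocation to samples is randomized. This is an illustration under that assumption, not the package's C implementation; qn_from_scores and the toy data are hypothetical.

```r
# Hypothetical helper: QN for given scores and sample labels.
qn_from_scores <- function(scores, g) {
  n_i <- as.vector(tapply(scores, g, length))
  S_i <- as.vector(tapply(scores, g, sum))
  sum((S_i - n_i * mean(scores))^2 / n_i) / var(scores)
}

set.seed(1)
x <- c(1, 1, 2, 5, 5, 6, 9, 9, 10)   # toy pooled sample with ties
g <- rep(1:3, each = 3)
scores <- rank(x)                    # midranks; fixed across simulations
obs <- qn_from_scores(scores, g)

Nsim <- 2000
# Each replicate re-allocates the fixed scores to samples of the same sizes.
null <- replicate(Nsim, qn_from_scores(scores, sample(g)))

# Simulated P-value, conditional on the tie pattern of the pooled sample.
p_sim <- mean(null >= obs - 1e-10)
print(p_sim)
```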
Value
A list of class kSamples with components
test.name |
|
k |
number of samples being compared |
ns |
vector |
N |
size of the pooled samples |
n.ties |
number of ties in the pooled sample |
qn |
2 (or 3) vector containing the observed |
warning |
logical indicator, |
null.dist |
simulated or enumerated null distribution
of the test statistic. It is |
method |
the |
Nsim |
the number of simulations used. |
Warning
method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. Experiment with system.time and trial values for Nsim to get a sense of the required computing time. In most cases dist = TRUE should not be used, namely whenever the returned distribution objects would become too large for R's workspace.
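One way to gauge the cost before requesting an exact calculation is to count the splits, which number N!/(n_1! ... n_k!), and to time a few simulated runs. The sketch below is only a suggestion: n_splits is a hypothetical helper, and the timing part runs only if kSamples is installed.

```r
# Hypothetical helper: number of splits of N = sum(ns) pooled observations
# into samples of sizes ns, i.e. the multinomial coefficient.
n_splits <- function(ns) exp(lfactorial(sum(ns)) - sum(lfactorial(ns)))

print(n_splits(c(5, 5, 5)))   # 15!/(5! 5! 5!) = 756756

# Timing trial values of Nsim (skipped when kSamples is not available).
if (requireNamespace("kSamples", quietly = TRUE)) {
  u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
  u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
  u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
  for (nsim in c(1000, 10000)) {
    print(system.time(
      kSamples::qn.test(u1, u2, u3, test = "KW",
                        method = "simulated", Nsim = nsim)
    ))
  }
}
```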
References
Hajek, J., Sidak, Z., and Sen, P.K. (1999), Theory of Rank Tests (Second Edition), Academic Press.
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley.
Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem, The Annals of Mathematical Statistics, Vol 23, No. 4, 525–540.
Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis, Journal of the American Statistical Association, Vol 47, No. 260, 583–621.
Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer Verlag.
See Also
Examples
u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
yy <- c(u1, u2, u3)
gy <- as.factor(c(rep(1,5), rep(2,5), rep(3,5)))
set.seed(2627)
qn.test(u1, u2, u3, test="KW", method = "simulated",
dist = FALSE, Nsim = 1000)
# or with same seed
# qn.test(list(u1, u2, u3),test = "KW", method = "simulated",
# dist = FALSE, Nsim = 1000)
# or with same seed
# qn.test(yy ~ gy, test = "KW", method = "simulated",
# dist = FALSE, Nsim = 1000)