Steel.test {kSamples} | R Documentation |
Steel's Multiple Comparison Wilcoxon Tests
Description
This function uses pairwise Wilcoxon tests, comparing a common control sample with each of several treatment samples, in a multiple comparison fashion. The experiment-wise significance probability is calculated, estimated, or approximated when testing the hypothesis that all independent samples arise from a common unspecified distribution, or that treatments have no effect when assigned randomly to the given subjects.
Usage
Steel.test(..., data = NULL,
method = c("asymptotic", "simulated", "exact"),
alternative = c("greater","less","two-sided"),
dist = FALSE, Nsim = 10000)
Arguments
... |
Either several sample vectors, or a list of such sample vectors, or a formula y ~ g, where y contains the pooled sample values and g (same length as y) is a factor with levels identifying the samples to which the elements of y belong. The lowest factor level corresponds to the control sample, the other levels to treatment samples. |
data |
= an optional data frame providing the variables in formula y ~ g or y, g input |
method |
=
of full enumerations. Otherwise, |
alternative |
= |
dist |
|
Nsim |
|
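Because the lowest factor level is taken as the control sample in the formula interface, it can help to inspect and, if needed, reorder the levels before calling the function. The following is a minimal base R sketch; the data and the labels "ctrl", "trtA", and "trtB" are hypothetical and only serve as illustration.

```r
# Hypothetical pooled responses y and grouping factor g.
y <- c(5.1, 4.8, 6.0, 6.3, 5.9, 7.2)
g <- factor(c("ctrl", "ctrl", "trtA", "trtA", "trtB", "trtB"))

# factor() sorts the labels, so "ctrl" is the lowest level here.
levels(g)

# Move "trtA" to the first level so that it would act as the control.
g2 <- relevel(g, ref = "trtA")
levels(g2)

# split() returns the samples in level order: control first, then treatments.
samples <- split(y, g2)
names(samples)
```

With the releveled factor, a call of the form Steel.test(y ~ g2) would be expected to treat "trtA" as the control, following the rule stated for the formula argument.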
Details
The Steel criterion uses the Wilcoxon test statistic in the pairwise comparisons of the
common control sample with each of the treatment samples. These statistics are used in
standardized form, using the means and standard deviations as they apply conditionally
given the tie pattern in the pooled data, see Scholz (2023). This conditional treatment allows for
correct usage in the presence of ties and is appropriate either when the samples are independent
and come from the same distribution (continuous or not) or when treatments are assigned
randomly among the total of N subjects. However, in the case of ties the significance probability
has to be viewed conditionally given the tie pattern.
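As an illustration of the pairwise comparisons, the sketch below computes Mann-Whitney counts of the control sample against each treatment sample in base R, using the data from the Examples section. The helper mann_whitney is hypothetical, and the convention used (count pairs with control value below treatment value, tied pairs counted as 1/2) is an assumption that may differ from the package's internal orientation. Only the null mean n_1 * n_i / 2 is shown; the tie-conditional standard deviations are described in Scholz (2023) and are not reproduced here.

```r
# Mann-Whitney count for control x versus treatment y:
# number of pairs with x < y, counting tied pairs as 1/2 (assumed convention).
mann_whitney <- function(x, y) {
  sum(outer(x, y, "<")) + 0.5 * sum(outer(x, y, "=="))
}

z1 <- c(103, 111, 136, 106, 122, 114)  # control sample
z2 <- c(119, 100,  97,  89, 112,  86)
z3 <- c( 89, 132,  86, 114, 114, 125)
z4 <- c( 92, 114,  86, 119, 131,  94)

treatments <- list(z2, z3, z4)
W  <- sapply(treatments, mann_whitney, x = z1)
mu <- length(z1) * sapply(treatments, length) / 2  # null means n_1 * n_i / 2

# Centered statistics; negative values point toward smaller treatment values.
W - mu
```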
The Steel statistic is used to test the hypothesis that the samples all come
from the same but unspecified distribution function F(x), or, under random treatment
assignment, that the treatments have no effect. The significance probability is the probability
of obtaining test results as extreme or more extreme than the observed test statistic,
when testing for the possibility of a treatment effect under any of the treatments.
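The randomization idea behind this significance probability can be sketched in base R as follows. This is not the package's implementation: the helpers steel_max and perm_pvalue are hypothetical, only the "greater" direction is shown, and the standardization uses the familiar no-ties standard deviation as a simplifying assumption instead of the tie-conditional one.

```r
# Steel-type statistic for alternative "greater": the largest standardized
# Mann-Whitney count over all control-versus-treatment comparisons.
steel_max <- function(y, g) {
  s <- split(y, g)                 # first element is the control sample
  x <- s[[1]]
  z <- sapply(s[-1], function(trt) {
    w  <- sum(outer(x, trt, "<")) + 0.5 * sum(outer(x, trt, "=="))
    n1 <- length(x)
    ni <- length(trt)
    # Assumption: no-ties standard deviation, not the tie-conditional one.
    (w - n1 * ni / 2) / sqrt(n1 * ni * (n1 + ni + 1) / 12)
  })
  max(z)
}

perm_pvalue <- function(y, g, nsim = 2000) {
  observed <- steel_max(y, g)
  # Re-assign the pooled values to the groups at random, keeping group sizes.
  sims <- replicate(nsim, steel_max(sample(y), g))
  # Proportion of random assignments at least as extreme as the observed one.
  mean(sims >= observed)
}

y <- c(103, 111, 136, 106, 122, 114, 119, 100, 97, 89, 112, 86,
        89, 132,  86, 114, 114, 125,  92, 114, 86, 119, 131, 94)
g <- factor(rep(1:4, each = 6))
set.seed(1)
p <- perm_pvalue(y, g)
p
```

Any fixed standardization yields a valid randomization test here, since the same rule is applied to the observed and to every re-assigned data set; the choice only affects how the comparisons are weighted against each other.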
For small sample sizes exact (conditional) null distribution
calculations are possible (with or without ties), based on a recursively extended
version of Algorithm C (Chase's sequence) in Knuth (2011), which allows the
enumeration of all possible splits of the pooled data into samples of
sizes n_1, ..., n_k, as appropriate under treatment randomization. This
is done in C, as is the simulation of such splits.
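To judge whether a full enumeration is feasible, one can count the splits beforehand. The number of ways to divide N pooled observations into samples of sizes n_1, ..., n_k is the multinomial coefficient N! / (n_1! ... n_k!). The helper n_splits below is a hypothetical base R sketch, not part of the package.

```r
# Number of distinct splits of the pooled data into samples of sizes ns,
# computed as a product of binomial coefficients:
# choose(n_1, n_1) * choose(n_1 + n_2, n_2) * ... * choose(N, n_k).
n_splits <- function(ns) {
  prod(choose(cumsum(ns), ns))
}

n_splits(c(2, 2))        # two samples of size 2
n_splits(c(3, 3, 3))     # three samples of size 3
n_splits(c(6, 6, 6, 6))  # the sample sizes used in the Examples section
```

Even for four samples of size 6 the count is in the trillions, which is why the simulated or asymptotic methods are usually the practical choice at such sizes.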
NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NA's is appropriate.
Value
A list of class kSamples with components
test.name |
|
alternative |
"greater", "less", or "two-sided" |
k |
number of samples being compared, including the control sample as the first one |
ns |
vector |
N |
size of the pooled sample |
n.ties |
number of ties in the pooled sample |
st |
2 (or 3) vector containing the observed standardized Steel statistic,
its asymptotic |
warning |
logical indicator, |
null.dist |
simulated or enumerated null distribution vector
of the test statistic. It is |
method |
the |
Nsim |
the number of simulations used. |
W |
vector
|
mu |
mean vector |
tau |
vector of standard deviations of |
sig0 |
standard deviation used in calculating the significance probability of the maximum (minimum) of (absolute) standardized Mann-Whitney statistics, see Scholz (2023). |
sig |
vector
|
Warning
method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. Experiment with system.time and trial values for Nsim to get a sense of the required computing time. In most cases dist = TRUE should not be used, i.e., when the returned distribution objects become too large for R's workspace.
References
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A: Combinatorial Algorithms, Part 1, Addison-Wesley.
Lehmann, E.L. (2006), Nonparametrics: Statistical Methods Based on Ranks, Revised First Edition, Springer Verlag.
Scholz, F.W. (2023), "On Steel's Test with Ties", https://arxiv.org/abs/2308.05873.
Examples
z1 <- c(103, 111, 136, 106, 122, 114)
z2 <- c(119, 100, 97, 89, 112, 86)
z3 <- c( 89, 132, 86, 114, 114, 125)
z4 <- c( 92, 114, 86, 119, 131, 94)
y <- c(z1, z2, z3, z4)
g <- as.factor(c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6)))
set.seed(2627)
Steel.test(list(z1, z2, z3, z4), method = "simulated",
alternative = "less", Nsim = 1000)
# or with same seed
# Steel.test(z1, z2, z3, z4, method = "simulated",
#   alternative = "less", Nsim = 1000)
# or with same seed
# Steel.test(y ~ g, method = "simulated",
#   alternative = "less", Nsim = 1000)