SteelConfInt {kSamples}    R Documentation
Simultaneous Confidence Bounds Based on Steel's Multiple Comparison Wilcoxon Tests
Description
This function inverts pairwise Wilcoxon tests, comparing a common control sample with each of several treatment samples, to provide simultaneous confidence bounds for the respective shift parameters by which the sampled treatment populations may differ from the control population. It is assumed that all samples are independent and that the sampled distributions are continuous, to avoid ties. The joint coverage probability for all bounds/intervals is calculated, estimated, or approximated; see Details. For the treatment of ties also see Details.
Usage
SteelConfInt(..., data = NULL, conf.level = 0.95,
alternative = c("less", "greater", "two.sided"),
method = c("asymptotic", "exact", "simulated"), Nsim = 10000)
Arguments
...
Either several sample vectors, or a list of such sample vectors, or a formula y ~ g, where y contains the pooled sample values and g (of the same length as y) is a factor with levels identifying the samples to which the elements of y belong. The lowest factor level corresponds to the control sample, the other levels to treatment samples.

data
an optional data frame providing the variables in formula y ~ g.

conf.level
confidence level of the simultaneous bounds or intervals, 0.95 by default.

alternative
options are "less", "greater", or "two.sided", determining whether one-sided upper bounds, one-sided lower bounds, or two-sided confidence intervals are produced for the shift parameters.

method
options are "asymptotic", "exact", or "simulated", determining how the joint coverage probability is calculated, estimated, or approximated; see Details.

Nsim
number of simulations to use when method = "simulated".
Details
The first sample is treated as the control sample with sample size n_1. The remaining s samples are treatment samples. Let W_{1i}, i = 2, ..., s+1, denote the respective Wilcoxon statistics comparing the common control sample (index 1) with each of the s treatment samples (indexed by i). For each comparison of control and treatment sample i only the observations of the two samples involved are ranked. By W_i = W_{1i} - n_i(n_i + 1)/2 we denote the corresponding Mann-Whitney test statistic. Furthermore, let D_{i(j)} denote the j-th ordered value (ascending order) of the n_1 n_i paired differences between the observations in treatment sample i and those of the control sample. By simple extension of results in Lehmann (2006), pages 87 and 92, the following equations hold, relating the null distribution of the Mann-Whitney statistics and the joint coverage probabilities of the D_{i(j_i)} for any set of j_2, ..., j_{s+1} with 1 ≤ j_i ≤ n_1 n_i:

P(Δ_i ≥ D_{i(j_i)}, i = 2, ..., s+1) = P_0(W_i ≤ j_i - 1, i = 2, ..., s+1)

and

P(Δ_i ≤ D_{i(j_i)}, i = 2, ..., s+1) = P_0(W_i ≤ n_1 n_i - j_i, i = 2, ..., s+1),

where P_0 refers to the distribution under Δ_2 = ... = Δ_{s+1} = 0 and P refers to the joint null distribution of the D_{i(j_i)} when all sampled distributions are the same and continuous. There are s indices j_i that can be manipulated to affect the achieved confidence level. To limit the computational complexity, standardized versions of the W_i, i.e., (W_i - μ_i)/τ_i with μ_i and τ_i representing the mean and standard deviation of W_i, are used to choose a common standardized value γ (satisfying the desired level) from the multivariate normal approximation for the W_i (see Miller (1981) and Scholz (2023)), and that value is reduced to integer values for the j_i by rounding up, rounding down, and rounding to the nearest integer. These integers j_i are then used in approximating the actual joint coverage probabilities, and from these three coverage probabilities the one that is closest to the nominal confidence level while still above it, and also the one that is closest without that restriction, are chosen.
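The link between the Mann-Whitney statistic W_i and the ordered paired differences D_{i(j)} can be illustrated in a few lines of R. This is a minimal sketch with made-up sample values, not code from the package:

```r
# control sample (index 1) and one treatment sample i
x1 <- c(103, 111, 136, 106, 122, 114)   # control, size n1
xi <- c(119, 100,  97,  89, 112,  86)   # treatment i, size ni
n1 <- length(x1)
ni <- length(xi)

# Mann-Whitney statistic W_i: number of (treatment, control) pairs
# with treatment value larger (no ties for continuous data)
W.i <- sum(outer(xi, x1, ">"))

# the n1 * ni paired differences, sorted ascending:
# D_{i(1)} <= D_{i(2)} <= ... <= D_{i(n1*ni)}
D <- sort(as.vector(outer(xi, x1, "-")))

# for a chosen index j, D[j] serves as a confidence bound for Delta_i,
# with coverage governed by P_0(W_i <= j - 1) as in the equations above
j <- 5
c(W.i = W.i, D.j = D[j])
```

Note that W_i equals the number of positive paired differences, which is what ties the ordered differences to the Mann-Whitney null distribution.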
When method = "exact" or method = "simulated" is specified, the same process is used, employing either the fully enumerated exact distribution of the W_i (based on a recursive version of Chase's sequence as presented in Knuth (2011)) for all sample splits, or the simulated distribution of the W_i. However, since these distributions are discrete, the starting point before rounding up is the smallest quantile such that the proportion of distribution values less than or equal to it is at least γ. The starting point before rounding down is the largest quantile such that the proportion of distribution values less than or equal to it is at most γ. The third option of rounding to the closest integer is performed using the average of the first two.

Confidence intervals are constructed by using upper and lower confidence bounds, each with the same confidence level of (1 + γ)/2.

When the original sample data appear to be rounded, and especially when there are ties, one should widen the computed intervals or bounds by the rounding ε, as illustrated in Lehmann (2006), pages 85 and 94. For example, when all sample values appear to end in one of .0, .2, .4, .6, .8, the rounding ε would be .2. Ultimately, this is a judgment call for the user. Such widening of intervals will make the actually achieved confidence level ≥ the stated achieved level.
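The two discrete starting points described above can be sketched generically in R. Here w stands in for a simulated null distribution of a discrete statistic and gam for the target level; both are made up for illustration and are not the package's internal code:

```r
# simulated values of some discrete statistic (placeholder distribution)
set.seed(1)
w   <- sample(0:20, 10000, replace = TRUE)
gam <- 0.95

vals <- sort(unique(w))
cdf  <- sapply(vals, function(v) mean(w <= v))  # empirical P(W <= v)

# starting point before rounding up:
# smallest quantile with P(W <= q) >= gam
q.up   <- vals[which(cdf >= gam)[1]]

# starting point before rounding down:
# largest quantile with P(W <= q) <= gam
q.down <- max(vals[cdf <= gam])

# third option: round the average of the two to the nearest integer
q.mid  <- round((q.up + q.down) / 2)
```

For a discrete distribution these two quantiles generally differ, which is why both are carried forward as candidate starting points.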
Value
A list of class kSamples
with components
test.name
"Steel.bounds"

n1
the control sample size n_1

ns
vector (n_2, ..., n_{s+1}) of the s treatment sample sizes

N
size of the pooled sample n_1 + n_2 + ... + n_{s+1}

n.ties
number of ties in the pooled sample

bounds
a list of data frames. When method = "asymptotic" is specified, the list consists of two data frames, conservative.bounds.asymptotic and closest.bounds.asymptotic. In the case of method = "exact" or method = "simulated", the corresponding data frames carry exact or simulated in place of asymptotic in their names. In either case the structure and meaning of these data frames parallels that of the asymptotic case.

method
the method used

Nsim
the number of simulations used

j.LU
an s by 4 matrix giving the indices j_i used for computing the bounds D_{i(j_i)} for Δ_i
Warning

method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. Experiment with system.time and trial values for Nsim to get a sense of the required computing time.
References
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley
Lehmann, E.L. (2006), Nonparametrics, Statistical Methods Based on Ranks, Revised First Edition, Springer Verlag.
Miller, Rupert G., Jr. (1981), Simultaneous Statistical Inference, Second Edition, Springer Verlag, New York.
Scholz, F.W. (2023), "On Steel's Test with Ties", https://arxiv.org/abs/2308.05873
Examples
z1 <- c(103, 111, 136, 106, 122, 114)
z2 <- c(119, 100, 97, 89, 112, 86)
z3 <- c( 89, 132, 86, 114, 114, 125)
z4 <- c( 92, 114, 86, 119, 131, 94)
set.seed(2627)
SteelConfInt(list(z1, z2, z3, z4), conf.level = 0.95, alternative = "two.sided",
    method = "simulated", Nsim = 10000)
# or with the same seed
# SteelConfInt(z1, z2, z3, z4, conf.level = 0.95, alternative = "two.sided",
#     method = "simulated", Nsim = 10000)
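The same analysis can be run through the formula interface described under Arguments. This sketch continues the example above (z1, ..., z4 as defined there); the data frame yg and the level names g1, ..., g4 are made up, with g1, the lowest level, acting as the control sample:

```r
yg <- data.frame(y = c(z1, z2, z3, z4),
                 g = factor(rep(c("g1", "g2", "g3", "g4"), each = 6)))
set.seed(2627)
SteelConfInt(y ~ g, data = yg, conf.level = 0.95, alternative = "two.sided",
    method = "simulated", Nsim = 10000)
```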