ad.test {kSamples} | R Documentation |
Anderson-Darling k-Sample Test
Description
This function uses the Anderson-Darling criterion to test the hypothesis that k independent samples with sample sizes n_1, \ldots, n_k arose from a common unspecified distribution function F(x). Testing is done conditionally given the observed tie pattern, so this is a permutation test. Both versions of the AD statistic are computed.
Usage
ad.test(..., data = NULL, method = c("asymptotic", "simulated", "exact"),
dist = FALSE, Nsim = 10000)
Arguments
... |
Either several sample vectors, or a list of such sample vectors, or a formula y ~ g, where y contains the pooled sample values and g is a factor (of same length as y) with levels identifying the samples to which the elements of y belong. |
data |
= an optional data frame providing the variables in formula y ~ g. |
method |
=
of full enumerations. Otherwise, |
dist |
|
Nsim |
|
Details
If AD is the Anderson-Darling criterion for the k samples, its standardized test statistic is T.AD = (AD - \mu)/\sigma, with \mu = k-1 and \sigma representing the mean and standard deviation of AD. This statistic is used to test the hypothesis that the samples all come from the same but unspecified continuous distribution function F(x).
Following the reference article, two versions of the AD test statistic are provided. The above mean and standard deviation are strictly valid only for version 1 in the continuous distribution case.
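The statement that AD has mean k-1 can be checked numerically for tiny samples. The base-R sketch below assumes the no-ties form of the version 1 criterion from Scholz and Stephens (1987), AD = (1/N) sum_i (1/n_i) sum_{j=1}^{N-1} (N M_ij - j n_i)^2 / (j (N - j)), where M_ij counts the observations of sample i that do not exceed the j-th smallest pooled value. The formula and the helper name ad_stat_noties are assumptions made for illustration; they are not taken from this page or from the kSamples source, and ties and version 2 are not handled.

```r
# Version 1 AD criterion for k samples, assuming NO ties in the pooled data.
# `samples` is a list of numeric vectors (hypothetical helper, not kSamples code).
ad_stat_noties <- function(samples) {
  ns <- lengths(samples)              # sample sizes n_1, ..., n_k
  N <- sum(ns)                        # pooled sample size
  z <- sort(unlist(samples))          # pooled order statistics Z_1 < ... < Z_N
  j <- seq_len(N - 1)
  total <- 0
  for (i in seq_along(samples)) {
    # M_ij: number of observations in sample i not exceeding Z_j
    Mij <- findInterval(z[j], sort(samples[[i]]))
    total <- total + sum((N * Mij - j * ns[i])^2 / (j * (N - j))) / ns[i]
  }
  total / N
}

# Enumerate every split of 5 distinct pooled values into samples of sizes 2 and 3
pooled <- c(0.3, 1.2, 2.5, 3.1, 4.8)
splits <- combn(5, 2)                 # each column: indices forming sample 1
ad_all <- apply(splits, 2, function(idx) {
  ad_stat_noties(list(pooled[idx], pooled[-idx]))
})

# Under the permutation distribution the mean should equal k - 1 = 1
mean(ad_all)
```

Averaging over all splits, rather than over simulated ones, makes the comparison with k-1 exact up to rounding error.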
NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NA's is appropriate.
The continuity assumption can be dispensed with, if we deal with independent random samples, or if randomization was used in allocating subjects to samples or treatments, and if we view the simulated or exact P-values conditionally, given the tie pattern in the pooled samples. Of course, under such randomization any conclusions are valid only with respect to the group of subjects that were randomly allocated to their respective samples.
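The conditional view described above can be sketched in base R: the pooled values (and therefore the tie pattern) stay fixed and only the sample labels are reshuffled. The function perm_pvalue and its arguments are hypothetical names for illustration, and the Kruskal-Wallis statistic is used only as a stand-in; this is not the AD criterion and not the C implementation used by ad.test.

```r
# Simulated permutation P-value, conditional on the pooled values (ties included).
# `stat` is any function(values, groups) returning one number, where large
# values count as evidence against a common distribution.
perm_pvalue <- function(values, groups, stat, nsim = 1000) {
  observed <- stat(values, groups)
  # Reshuffling the labels keeps the pooled values, and hence the tie pattern,
  # fixed while mimicking random allocation of subjects to samples.
  null_dist <- replicate(nsim, stat(values, sample(groups)))
  list(observed = observed,
       p.value = mean(null_dist >= observed),
       null.dist = null_dist)
}

# Stand-in statistic (NOT the AD criterion): the Kruskal-Wallis statistic
kw_stat <- function(values, groups) {
  unname(stats::kruskal.test(values, groups)$statistic)
}

set.seed(2627)
y <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 6)      # pooled sample containing ties
g <- factor(rep(c("a", "b"), each = 5))   # sample membership
res <- perm_pvalue(y, g, kw_stat, nsim = 500)
res$p.value
```

The returned null.dist plays the same role as the null.dist1 and null.dist2 components listed under Value, although here it holds the stand-in statistic.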
The asymptotic P-value calculation assumes distribution continuity. No adjustment for lack thereof is known at this point. For details on the asymptotic P-value calculation see ad.pval.
Value
A list of class kSamples with components
test.name |
|
k |
number of samples being compared |
ns |
vector of the k sample sizes (n_1, \ldots, n_k) |
N |
size of the pooled sample |
n.ties |
number of ties in the pooled samples |
sig |
standard deviations |
ad |
2 x 3 (2 x 4) matrix containing |
warning |
logical indicator, warning = TRUE when at least one
|
null.dist1 |
simulated or enumerated null distribution of version 1
of the test statistic, given as vector of all generated |
null.dist2 |
simulated or enumerated null distribution of version 2
of the test statistic, given as vector of all generated |
method |
The method used. |
Nsim |
The number of simulations. |
warning
method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. In most cases dist = TRUE should not be used, i.e., when the returned distribution vectors null.dist1 and null.dist2 become too large for the R work space. These vectors are limited in length to 1e8.
Note
For small sample sizes and small k, exact null distribution calculations are possible (with or without ties), based on a recursively extended version of Algorithm C (Chase's sequence) in Knuth (2011), Ch. 7.2.1.3, which allows the enumeration of all possible splits of the pooled data into samples of sizes n_1, \ldots, n_k, as appropriate under treatment randomization. The enumeration and simulation are both done in C.
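Before requesting method = "exact", it can help to count how many splits a full enumeration would have to visit. The number of ways to split N pooled observations into samples of sizes n_1, \ldots, n_k is the multinomial coefficient N!/(n_1! ... n_k!). The helper name n_splits below is hypothetical and not part of kSamples.

```r
# Number of distinct splits of N = sum(ns) pooled observations into samples
# of sizes ns: the multinomial coefficient N! / (n_1! * ... * n_k!).
n_splits <- function(ns) {
  # lgamma avoids overflow of the factorials; the result is rounded because
  # exp() of the log value is only approximate, especially for large N.
  round(exp(lgamma(sum(ns) + 1) - sum(lgamma(ns + 1))))
}

n_splits(c(5, 5, 5))            # 756756 for three samples of size 5
n_splits(c(10, 10, 10)) > 1e8   # TRUE: far beyond the 1e8 vector length limit
```

Three samples of size 5, as in the Examples, already give 756756 splits, and the count grows very quickly with the sample sizes.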
Note
It has recently come to our attention that the Anderson-Darling test, originally proposed by Pettitt (1976) in the 2-sample case and generalized to k samples by Scholz and Stephens, has a close relative created by Baumgartner et al. (1998) in the 2-sample case, popularized by Neuhaeuser (2012), who lists at least 6 papers on it among his cited references, and generalized to k samples by Murakami (2006).
References
Baumgartner, W., Weiss, P. and Schindler, H. (1998), A nonparametric test for the general two-sample problem, Biometrics, 54, 1129-1135.
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley.
Neuhaeuser, M. (2012), Nonparametric Statistical Tests, A Computational Approach, CRC Press.
Murakami, H. (2006), A k-sample rank test based on modified Baumgartner statistic and its power comparison, Jpn. Soc. Comp. Statist., 19, 1-13.
Murakami, H. (2012), Modified Baumgartner statistic for the two-sample and multisample problems: a numerical comparison. J. of Statistical Comput. and Simul., 82:5, 711-728.
Pettitt, A.N. (1976), A two-sample Anderson-Darling rank statistic, Biometrika, 63, 161-168.
Scholz, F. W. and Stephens, M. A. (1987), K-sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol 82, No. 399, 918–924.
See Also
Examples
u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
y <- c(u1, u2, u3)
g <- as.factor(c(rep(1, 5), rep(2, 5), rep(3, 5)))
set.seed(2627)
ad.test(u1, u2, u3, method = "exact", dist = FALSE, Nsim = 1000)
# or with same seed
# ad.test(list(u1, u2, u3), method = "exact", dist = FALSE, Nsim = 1000)
# or with same seed
# ad.test(y ~ g, method = "exact", dist = FALSE, Nsim = 1000)