ad.test {kSamples} R Documentation

Anderson-Darling k-Sample Test


This function uses the Anderson-Darling criterion to test the hypothesis that $k$ independent samples with sample sizes $n_1, \ldots, n_k$ arose from a common unspecified distribution function $F(x)$. Testing is done conditionally given the observed tie pattern. Thus this is a permutation test. Both versions of the $AD$ statistic are computed.


ad.test(..., data = NULL, method = c("asymptotic", "simulated", "exact"),
	dist = FALSE, Nsim = 10000)



Either several sample vectors, say $x_1, \ldots, x_k$, with $x_i$ containing $n_i$ sample values. $n_i > 4$ is recommended for reasonable asymptotic $P$-value calculation. The pooled sample size is denoted by $N = n_1 + \ldots + n_k$,

or a list of such sample vectors,

or a formula y ~ g, where y contains the pooled sample values and g is a factor (of same length as y) with levels identifying the samples to which the elements of y belong.


data = an optional data frame providing the variables in formula y ~ g.


method = c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic $P$-value approximation, reasonable for $P$ in [.00001, .99999] when all $n_i > 4$. Linear extrapolation via $\log(P/(1-P))$ is used outside [.00001, .99999]. This calculation is always done. See ad.pval for details. The adequacy of the asymptotic $P$-value calculation may be checked using pp.kSamples.

"simulated" uses Nsim simulated $AD$ statistics, based on random splits of the pooled samples into samples of sizes $n_1, \ldots, n_k$, to estimate the exact conditional $P$-value.

"exact" uses full enumeration of all sample splits with resulting $AD$ statistics to obtain the exact conditional $P$-values. It is used only when Nsim is at least as large as the number

$ncomb = \frac{N!}{n_1! \ldots n_k!}$

of full enumerations. Otherwise, method reverts to "simulated" using the given Nsim. It also reverts to "simulated" when $ncomb > 1e8$ and dist = TRUE.


dist = FALSE (default) or TRUE. If TRUE, the simulated or fully enumerated distribution vectors null.dist1 and null.dist2 are returned for the respective test statistic versions. Otherwise, NULL is returned. When dist = TRUE then Nsim <- min(Nsim, 1e8), to limit object size.


Nsim = 10000 (default), number of simulation sample splits to use. It is only used when method = "simulated", or when method = "exact" reverts to method = "simulated", as previously explained.
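The reversion rule described for method = "exact" can be sketched in base R. This is only an illustration of the stated rule, not the package's internal code; the helper names ncomb_splits and effective_method are hypothetical.

```r
# Hypothetical helpers illustrating the documented rule (not part of kSamples)

# Number of distinct splits of N pooled values into samples of sizes ns:
# N! / (n_1! ... n_k!), computed via log-factorials to avoid overflow
ncomb_splits <- function(ns) {
  round(exp(lfactorial(sum(ns)) - sum(lfactorial(ns))))
}

# Which method would effectively be used, following the rule in the text
effective_method <- function(ns, method = "exact", Nsim = 10000, dist = FALSE) {
  if (method != "exact") return(method)
  ncomb <- ncomb_splits(ns)
  # "exact" needs Nsim >= ncomb, and is abandoned when the full
  # distribution is requested but would exceed 1e8 values
  if (Nsim < ncomb || (dist && ncomb > 1e8)) "simulated" else "exact"
}

ncomb_splits(c(5, 5, 5))                   # 756756
effective_method(c(5, 5, 5), Nsim = 1000)  # "simulated"
effective_method(c(3, 3, 3))               # "exact", since ncomb = 1680
```

For three samples of size 5, full enumeration therefore requires Nsim of at least 756756.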


If $AD$ is the Anderson-Darling criterion for the $k$ samples, its standardized test statistic is $T.AD = (AD - \mu)/\sigma$, with $\mu = k-1$ and $\sigma$ representing the mean and standard deviation of $AD$. This statistic is used to test the hypothesis that the samples all come from the same but unspecified continuous distribution function $F(x)$.

According to the reference article, two versions of the $AD$ test statistic are provided. The above mean and standard deviation are strictly valid only for version 1 in the continuous distribution case.
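As a small worked illustration of the standardization, the following sketch applies T.AD = (AD - (k - 1))/sigma to made-up numbers. The function name standardize_ad and the input values are hypothetical; in practice AD and sigma come from ad.test itself.

```r
# Hypothetical helper: standardize an AD criterion value
# using mean k - 1 and standard deviation sigma
standardize_ad <- function(ad, k, sigma) {
  (ad - (k - 1)) / sigma
}

# Illustrative values only (not output of ad.test)
standardize_ad(ad = 2.5, k = 3, sigma = 0.8)  # (2.5 - 2) / 0.8 = 0.625
```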

NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NA's is appropriate.

The continuity assumption can be dispensed with, if we deal with independent random samples, or if randomization was used in allocating subjects to samples or treatments, and if we view the simulated or exact $P$-values conditionally, given the tie pattern in the pooled samples. Of course, under such randomization any conclusions are valid only with respect to the group of subjects that were randomly allocated to their respective samples. The asymptotic $P$-value calculation assumes distribution continuity. No adjustment for lack thereof is known at this point. For details on the asymptotic $P$-value calculation see ad.pval.
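The idea of a simulated conditional P-value can be sketched in base R as follows. This is a generic permutation scheme under stated assumptions, not the package's C implementation: perm_pvalue and kw_stat are hypothetical names, and the Kruskal-Wallis statistic is used purely as a stand-in for the AD statistic.

```r
# Generic permutation P-value: re-split the pooled sample at random
# into groups of the original sizes and compare statistics
perm_pvalue <- function(samples, stat, Nsim = 1000) {
  ns <- lengths(samples)
  pooled <- unlist(samples, use.names = FALSE)
  g <- factor(rep(seq_along(ns), ns))       # group labels, sizes n_1, ..., n_k
  observed <- stat(pooled, g)
  # permuting the pooled values while keeping g fixed yields a random split
  sims <- replicate(Nsim, stat(sample(pooled), g))
  list(observed = observed,
       p.value = mean(sims >= observed),    # share of splits at least as extreme
       null.dist = sims)
}

# Stand-in statistic (NOT the Anderson-Darling criterion)
kw_stat <- function(y, g) unname(kruskal.test(y, g)$statistic)

u1 <- c(1.0066, -0.9587, 0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)

set.seed(1)
res <- perm_pvalue(list(u1, u2, u3), kw_stat, Nsim = 500)
res$p.value
```

Because the pooled values are only rearranged, any ties in the data are preserved in every split, which is what conditioning on the tie pattern means here.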


A list of class kSamples with components



number of samples being compared


vector of the $k$ sample sizes $(n_1, \ldots, n_k)$


size of the pooled sample $= n_1 + \ldots + n_k$


number of ties in the pooled samples


standard deviation $\sigma$ of version 1 of $AD$ under the continuity assumption


2 x 3 (2 x 4) matrix containing $AD$, $T.AD$, asymptotic $P$-value, (simulated or exact $P$-value), for each version of the standardized test statistic $T$, version 1 in row 1, version 2 in row 2.


logical indicator, warning = TRUE when at least one $n_i < 5$


simulated or enumerated null distribution of version 1 of the test statistic, given as vector of all generated $AD$ statistics.


simulated or enumerated null distribution of version 2 of the test statistic, given as vector of all generated $AD$ statistics.


The method used.


The number of simulations.


method = "exact" should only be used with caution. Computation time is proportional to the number of enumerations. dist = TRUE should be avoided whenever the returned distribution vectors null.dist1 and null.dist2 would become too large for the R workspace. These vectors are limited in length to 1e8.
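To get a feel for the object sizes involved, the following rough estimate assumes that null.dist1 and null.dist2 are ordinary double-precision vectors (8 bytes per element) and ignores R's small per-object overhead. The helper name null_dist_bytes is hypothetical.

```r
# Hypothetical helper: approximate bytes needed for the two
# returned distribution vectors, given the requested length
null_dist_bytes <- function(len_requested, cap = 1e8) {
  len <- min(len_requested, cap)  # lengths are limited to 1e8
  2 * 8 * len                     # two vectors, 8 bytes per double
}

null_dist_bytes(1e4) / 2^20  # about 0.15 MiB for the default Nsim
null_dist_bytes(1e9) / 2^30  # about 1.49 GiB at the 1e8 cap
```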


For small sample sizes and small $k$, exact null distribution calculations are possible (with or without ties), based on a recursively extended version of Algorithm C (Chase's sequence) in Knuth (2011), Ch., which allows the enumeration of all possible splits of the pooled data into samples of sizes $n_1, \ldots, n_k$, as appropriate under treatment randomization. The enumeration and simulation are both done in C.
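What full enumeration of splits means can be illustrated with a naive recursive sketch in base R. It uses combn rather than Chase's sequence, so it is a stand-in for the idea only and is practical just for very small samples. The function name enumerate_splits is hypothetical.

```r
# Naive enumeration of all splits of idx into groups of sizes ns
# (uses combn, NOT Algorithm C; for illustration only)
enumerate_splits <- function(idx, ns) {
  stopifnot(sum(ns) == length(idx))
  # last group: whatever is left forms the only possible choice
  if (length(ns) == 1) return(list(list(idx)))
  out <- list()
  # choose positions for the first group, then recurse on the remainder
  for (pos in combn(length(idx), ns[1], simplify = FALSE)) {
    for (rest in enumerate_splits(idx[-pos], ns[-1])) {
      out[[length(out) + 1]] <- c(list(idx[pos]), rest)
    }
  }
  out
}

splits <- enumerate_splits(1:5, c(2, 2, 1))
length(splits)  # 5! / (2! 2! 1!) = 30
splits[[1]]     # first split: groups {1, 2}, {3, 4}, {5}
```

Applying a test statistic to every such split of the pooled data would give the exact conditional null distribution.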


It has recently come to our attention that the Anderson-Darling test, originally proposed by Pettitt (1976) in the 2-sample case and generalized to k samples by Scholz and Stephens, has a close relative created by Baumgartner et al. (1998) in the 2-sample case, popularized by Neuhaeuser (2012), who cites at least 6 of his own papers on it, and generalized by Murakami (2006) to k samples.


Baumgartner, W., Weiss, P. and Schindler, H. (1998), A nonparametric test for the general two-sample problem, Biometrics, 54, 1129-1135.

Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A Combinatorial Algorithms Part 1, Addison-Wesley.

Neuhaeuser, M. (2012), Nonparametric Statistical Tests, A Computational Approach, CRC Press.

Murakami, H. (2006), A k-sample rank test based on modified Baumgartner statistic and its power comparison, Jpn. Soc. Comp. Statist., 19, 1-13.

Murakami, H. (2012), Modified Baumgartner statistic for the two-sample and multisample problems: a numerical comparison. J. of Statistical Comput. and Simul., 82:5, 711-728.

Pettitt, A.N. (1976), A two-sample Anderson-Darling rank statistic, Biometrika, 63, 161-168.

Scholz, F. W. and Stephens, M. A. (1987), K-sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol 82, No. 399, 918–924.

See Also

ad.test.combined, ad.pval


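library(kSamples)  # provides ad.test

# three samples of size 5 each
u1 <- c(1.0066, -0.9587,  0.3462, -0.2653, -1.3872)
u2 <- c(0.1005, 0.2252, 0.4810, 0.6992, 1.9289)
u3 <- c(-0.7019, -0.4083, -0.9936, -0.5439, -0.3921)
# pooled values and group factor for the formula interface
y <- c(u1, u2, u3)
g <- as.factor(c(rep(1, 5), rep(2, 5), rep(3, 5)))
# here ncomb = 15!/(5! 5! 5!) = 756756 exceeds Nsim = 1000,
# so method = "exact" reverts to "simulated"
ad.test(u1, u2, u3, method = "exact", dist = FALSE, Nsim = 1000)
# or, with the same seed, using a list of sample vectors
# ad.test(list(u1, u2, u3), method = "exact", dist = FALSE, Nsim = 1000)
# or, with the same seed, using the formula interface
# ad.test(y ~ g, method = "exact", dist = FALSE, Nsim = 1000)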

[Package kSamples version 1.2-10 Index]