ad.test.combined {kSamples}    R Documentation

Combined Anderson-Darling k-Sample Tests

Description

This function combines several independent Anderson-Darling $k$-sample tests into one overall test of the hypothesis that the independent samples within each block come from a common unspecified distribution, while the common distributions may vary from block to block. Both versions of the Anderson-Darling test statistic are provided.

Usage

ad.test.combined(..., data = NULL,
	method = c("asymptotic", "simulated", "exact"),
	dist = FALSE, Nsim = 10000)

Arguments

...

Either a sequence of several lists, say $L_1, \ldots, L_M$ ($M > 1$), where list $L_i$ contains $k_i > 1$ sample vectors of respective sizes $n_{i1}, \ldots, n_{ik_i}$, where $n_{ij} > 4$ is recommended for reasonable asymptotic $P$-value calculation. $N_i = n_{i1} + \ldots + n_{ik_i}$ is the pooled sample size for block $i$,

or a list of such lists,

or a formula, like y ~ g | b, where y is a numeric response vector, g is a factor with levels indicating different treatments and b is a factor indicating different blocks; y, g and b are of equal length. Within each block level, y is split into separate samples according to the g levels. The same g level may occur in different blocks. The variable names may correspond to variables in an optionally supplied data frame via the data = argument.
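The formula input and the list-of-lists input describe the same data. A minimal base-R sketch of that correspondence, using the data of the Examples section, is shown here. The helper name to_block_lists is hypothetical and not part of kSamples; the package's own parsing may differ in details such as NA handling.

```r
# Hypothetical helper (not part of kSamples): turn response y, treatment
# factor g and block factor b into a list of blocks, each block being a
# list of sample vectors, one per treatment level present in that block.
to_block_lists <- function(y, g, b) {
  d <- data.frame(y = y, g = as.factor(g))
  blocks <- split(d, as.factor(b))
  lapply(blocks, function(db) {
    # keep only the treatment levels that actually occur in this block
    unname(split(db$y, droplevels(db$g)))
  })
}

# Same data as in the Examples section
x1 <- list(c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11))
x2 <- list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
y <- c(unlist(x1), unlist(x2))
g <- c(rep(1, 5), rep(2, 6), rep(3, 5), rep(1, 6), rep(2, 5))
b <- c(rep(1, 16), rep(2, 11))

blocks <- to_block_lists(y, g, b)
n_samples <- lapply(blocks, lengths)            # n_{i1}, ..., n_{ik_i}
N <- vapply(n_samples, sum, integer(1))         # pooled sizes N_i
```

Here the two blocks contain 3 and 2 samples respectively, illustrating that $k_i$ may differ between blocks and that g level 1 and 2 recur in both blocks.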

data

= an optional data frame providing the variables in formula input

method

= c("asymptotic","simulated","exact"), where

"asymptotic" uses only an asymptotic $P$-value approximation, reasonable for $P \in [0.00001, 0.99999]$ and linearly extrapolated via $\log(P/(1-P))$ outside that range. See ad.pval for details. The adequacy of the asymptotic $P$-value calculation may be checked using pp.kSamples.
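Linear extrapolation on the $\log(P/(1-P))$ scale means that, beyond the tabulated range, the log-odds of $P$ is continued along a straight line. The sketch below shows only this generic idea with two made-up anchor points; the function name logit_extrapolate and the anchors are hypothetical and do not reproduce ad.pval's internal tables or the exact variable it extrapolates in.

```r
# Hypothetical sketch: continue log(P/(1-P)) linearly in t beyond the last
# anchor. Anchors (t1, p1) and (t2, p2) are illustrative, not ad.pval's.
logit_extrapolate <- function(t, t1, p1, t2, p2) {
  # qlogis(p) = log(p / (1 - p)); plogis() is its inverse
  slope <- (qlogis(p2) - qlogis(p1)) / (t2 - t1)
  plogis(qlogis(p2) + slope * (t - t2))
}

# Extrapolate to a statistic value beyond the last anchor
p_far <- logit_extrapolate(7, t1 = 5, p1 = 1e-4, t2 = 6, p2 = 1e-5)
```

Working on the log-odds scale keeps extrapolated values strictly inside (0, 1), which a linear extrapolation of $P$ itself would not guarantee.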

"simulated" uses simulation to get Nsim simulated $AD$ statistics for each block of samples and adds them across blocks componentwise to get Nsim combined values. These are compared with the observed combined value to obtain the estimated $P$-value.
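A minimal sketch of this simulation scheme for untied data follows. It assumes the $k$-sample statistic of Scholz and Stephens (1987) in its form for data without ties; the helper names ad_stat and sim_block are hypothetical, and unlike ad.test this sketch neither treats ties nor computes both versions. It only mirrors the structure described above: independent re-splits within each block, componentwise addition across blocks, and comparison with the observed combined value.

```r
# Hypothetical helper: k-sample AD statistic for untied data,
#   A2 = (1/N) sum_i (1/n_i) sum_{j=1}^{N-1} (N M_ij - j n_i)^2 / (j (N - j)),
# where M_ij is the number of values in sample i that are <= the j-th
# smallest pooled value. Ties are not treated specially here.
ad_stat <- function(samples) {
  z <- sort(unlist(samples))
  N <- length(z)
  j <- seq_len(N - 1)
  total <- 0
  for (x in samples) {
    M_ij <- findInterval(z[j], sort(x))   # count of x values <= z[j]
    total <- total + sum((N * M_ij - j * length(x))^2 / (j * (N - j))) / length(x)
  }
  total / N
}

# Hypothetical helper: Nsim null values of ad_stat for one block, obtained
# by randomly re-splitting the pooled block sample into the observed sizes.
sim_block <- function(samples, Nsim) {
  pooled <- unlist(samples)
  groups <- rep(seq_along(samples), lengths(samples))
  replicate(Nsim, ad_stat(split(sample(pooled), groups)))
}

set.seed(2627)
blocks <- list(
  list(c(1.2, 3.4, 2.2, 5.1, 7.3), c(2.5, 8.1, 1.7, 6.6, 9.4, 4.2),
       c(12.3, 5.8, 7.9, 9.9, 11.6)),
  list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
)
Nsim <- 1000
ad_obs <- sum(vapply(blocks, ad_stat, numeric(1)))   # observed AD_comb

# simulate within each block independently, then add componentwise
sim_comb <- Reduce(`+`, lapply(blocks, sim_block, Nsim = Nsim))

# estimated P-value: share of combined null values >= the observed value
p_sim <- mean(sim_comb >= ad_obs)
```

Because each block is re-split independently, the i-th combined value is simply the sum of the i-th simulated value from every block.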

"exact" uses full enumeration of the test statistic values for all sample splits of the pooled samples within each block. The test statistic vectors for the first 2 blocks are added (each component against each component, as in the R outer(x, y, "+") command) to get the convolution enumeration for the combined test statistic. The resulting vector is convolved against the next block vector in the same fashion, and so on. This is feasible only for small problems, and is attempted only when Nsim is at least the (conservatively maximal) length

$$\frac{N_1!}{n_{11}!\ldots n_{1k_1}!}\times\ldots\times\frac{N_M!}{n_{M1}!\ldots n_{Mk_M}!}$$

of the final distribution vector. Otherwise, it reverts to the simulation method using the provided Nsim.
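The enumeration bound and the outer-sum convolution can be sketched in base R. The helpers enum_length and convolve_blocks are hypothetical names, and the per-block vectors passed to convolve_blocks are toy values rather than real enumerated $AD$ statistics.

```r
# Hypothetical helper: conservative length of the fully enumerated combined
# null distribution, prod_i N_i! / (n_i1! ... n_ik_i!). factorial() loses
# exactness for large N_i, but "exact" is only feasible when this is small.
enum_length <- function(sizes_per_block) {
  prod(vapply(sizes_per_block,
              function(n) factorial(sum(n)) / prod(factorial(n)),
              numeric(1)))
}

# Hypothetical helper: convolve per-block null vectors by adding every
# component of one vector to every component of the next (outer(x, y, "+")).
convolve_blocks <- function(null_vals) {
  Reduce(function(a, b) as.vector(outer(a, b, "+")), null_vals)
}

# Block sizes of the Examples section: (5, 6, 5) and (6, 5)
sizes <- list(c(5, 6, 5), c(6, 5))
L <- enum_length(sizes)                                    # 2018016 * 462
method_used <- if (L <= 10000) "exact" else "simulated"    # default Nsim

# Toy enumerated vectors for two blocks and the resulting convolution
conv <- convolve_blocks(list(c(0, 1), c(0, 10, 20)))
```

For the Example data the bound is about $9.3 \times 10^8$, far above the default Nsim, so "exact" would revert to simulation. With a convolved vector in hand, a $P$-value can be read off as the proportion of its entries at least as large as the observed combined statistic, since every split is equally likely under the null hypothesis.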

dist

FALSE (default) or TRUE. If TRUE, the simulated or fully enumerated convolution vectors null.dist1 and null.dist2 are returned for the respective test statistic versions. Otherwise, NULL is returned for each.

Nsim

= 10000 (default), number of simulation splits to use within each block of samples. It is only used when method = "simulated" or when method = "exact" reverts to method = "simulated", as previously explained. Simulations are independent across blocks, using Nsim for each block. Nsim is limited by 1e7.

Details

If $AD_i$ is the Anderson-Darling criterion for the $i$-th block of $k_i$ samples, its standardized test statistic is $T_i = (AD_i - \mu_i)/\sigma_i$, with $\mu_i$ and $\sigma_i$ representing the mean and standard deviation of $AD_i$. This statistic is used to test the hypothesis that the samples in the $i$-th block all come from the same but unspecified continuous distribution function $F_i(x)$.

The combined Anderson-Darling criterion is $AD_{comb} = AD_1 + \ldots + AD_M$ and $T_{comb} = (AD_{comb} - \mu_c)/\sigma_c$ is the standardized form, where $\mu_c = \mu_1 + \ldots + \mu_M$ and $\sigma_c = \sqrt{\sigma_1^2 + \ldots + \sigma_M^2}$ represent the mean and standard deviation of $AD_{comb}$ (the variances add because the blocks are independent). The statistic $T_{comb}$ is used to simultaneously test whether the samples in each block come from the same continuous distribution function $F_i(x)$, $i = 1, \ldots, M$. The unspecified common distribution function $F_i(x)$ may change from block to block. According to the reference article, two versions of the test statistic and its corresponding combinations are provided.
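The standardization arithmetic can be checked with a few lines of R. All per-block inputs below are illustrative values chosen for the example, not output of ad.test.

```r
# Hypothetical per-block inputs for M = 2 blocks (k_1 = 3, k_2 = 2)
AD  <- c(3.1, 2.4)     # observed AD_i (made up)
mu  <- c(2, 1)         # mu_i (illustrative, set to k_i - 1)
sig <- c(1.05, 0.75)   # sigma_i (made up)

T_i <- (AD - mu) / sig              # per-block standardized statistics

AD_comb <- sum(AD)                  # AD_1 + ... + AD_M
mu_c    <- sum(mu)                  # mu_1 + ... + mu_M
sig_c   <- sqrt(sum(sig^2))         # independent blocks: variances add
T_comb  <- (AD_comb - mu_c) / sig_c
```

Note that $T_{comb}$ is not the sum of the $T_i$: the combination is done on the unstandardized $AD_i$ scale and only then standardized.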

The number $k_i$ of independent samples in block $i$ may change from block to block.

NA values are removed and the user is alerted with the total NA count. It is up to the user to judge whether the removal of NA's is appropriate.

The continuity assumption can be dispensed with if we deal with independent random samples, or if randomization was used in allocating subjects to samples or treatments, independently from block to block, and if we view the simulated or exact $P$-values conditionally, given the tie patterns within each block. Of course, under such randomization any conclusions are valid only with respect to the blocks of subjects that were randomly allocated. The asymptotic $P$-value calculation assumes distribution continuity. No adjustment for lack thereof is known at this point. The same comment holds for the means and standard deviations of respective statistics.

Value

A list of class kSamples with components

test.name

= "Anderson-Darling"

M

number of blocks of samples being compared

n.samples

list of M vectors, each vector giving the sample sizes for each block of samples being compared

nt

= $(N_1, \ldots, N_M)$

n.ties

vector giving the number of ties in each of the M comparison blocks

ad.list

list of M matrices giving the ad results for ad.test applied to the samples in each of the M blocks

mu

vector of means of the $AD$ statistic for the M blocks

sig

vector of standard deviations of the $AD$ statistic for the M blocks

ad.c

2 x 3 (or 2 x 4) matrix containing $AD_{comb}$, $T_{comb}$, the asymptotic $P$-value and, in the 2 x 4 case, the simulated or exact $P$-value, for each version of the combined test statistic, with version 1 in row 1 and version 2 in row 2

mu.c

mean of $AD_{comb}$

sig.c

standard deviation of $AD_{comb}$

warning

logical indicator, warning = TRUE when at least one $n_{ij} < 5$

null.dist1

simulated or enumerated null distribution of version 1 of $AD_{comb}$

null.dist2

simulated or enumerated null distribution of version 2 of $AD_{comb}$

method

the method used.

Nsim

the number of simulations used for each block of samples.

Note

This test is useful in analyzing treatment effects in randomized (incomplete) block experiments and in examining performance equivalence of several laboratories when presented with different test materials for comparison.

References

Scholz, F. W. and Stephens, M. A. (1987), K-sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol 82, No. 399, 918–924.

See Also

ad.test, ad.pval

Examples

library(kSamples)

## Create two lists of sample vectors.
x1 <- list( c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11) )
x2 <- list( c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60) )
# and a corresponding data frame datx1x2 for the formula interface:
# A = response, G = treatment (sample) within block, B = block
x1x2 <- c(unlist(x1), unlist(x2))
gx1x2 <- as.factor(c(rep(1,5), rep(2,6), rep(3,5), rep(1,6), rep(2,5)))
bx1x2 <- as.factor(c(rep(1,16), rep(2,11)))
datx1x2 <- data.frame(A = x1x2, G = gx1x2, B = bx1x2)

## Run ad.test.combined.
set.seed(2627)
ad.test.combined(x1, x2, method = "simulated", Nsim = 1000)
# or, with the same seed,
# ad.test.combined(list(x1, x2), method = "simulated", Nsim = 1000)
# ad.test.combined(A ~ G | B, data = datx1x2, method = "simulated", Nsim = 1000)



[Package kSamples version 1.2-10 Index]