R: Two- and K-Sample Scale Tests

ScaleTests {coin}

R Documentation

Two- and `K`-Sample Scale Tests

Description

Testing the equality of the distributions of a numeric response variable in two or more independent groups against scale alternatives.

Usage

## S3 method for class 'formula'
taha_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
taha_test(object, conf.int = FALSE, conf.level = 0.95, ...)

## S3 method for class 'formula'
klotz_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
klotz_test(object, ties.method = c("mid-ranks", "average-scores"),
           conf.int = FALSE, conf.level = 0.95, ...)

## S3 method for class 'formula'
mood_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
mood_test(object, ties.method = c("mid-ranks", "average-scores"),
          conf.int = FALSE, conf.level = 0.95, ...)

## S3 method for class 'formula'
ansari_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
ansari_test(object, ties.method = c("mid-ranks", "average-scores"),
            conf.int = FALSE, conf.level = 0.95, ...)

## S3 method for class 'formula'
fligner_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
fligner_test(object, ties.method = c("mid-ranks", "average-scores"),
             conf.int = FALSE, conf.level = 0.95, ...)

## S3 method for class 'formula'
conover_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'IndependenceProblem'
conover_test(object, conf.int = FALSE, conf.level = 0.95, ...)

Arguments

`formula`	a formula of the form `y ~ x | block` where `y` is a numeric variable, `x` is a factor and `block` is an optional factor for stratification.
`data`	an optional data frame containing the variables in the model formula.
`subset`	an optional vector specifying a subset of observations to be used. Defaults to `NULL`.
`weights`	an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.
`object`	an object inheriting from class `"IndependenceProblem"`.
`conf.int`	a logical indicating whether a confidence interval for the ratio of scales should be computed. Defaults to `FALSE`.
`conf.level`	a numeric, confidence level of the interval. Defaults to `0.95`.
`ties.method`	a character, the method used to handle ties: the score generating function either uses mid-ranks (`"mid-ranks"`, default) or averages the scores of randomly broken ties (`"average-scores"`).
`...`	further arguments to be passed to `independence_test()`.
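
As a rough illustration of how these arguments fit together, the sketch below runs a stratified and an unstratified Mood test on simulated data. The data frame `d` and its columns `y`, `x` and `block` are made up for this example, and it assumes that `ties.method` given to the formula method is forwarded to the `"IndependenceProblem"` method.

```r
library(coin)

set.seed(1)
## Hypothetical data: two groups, three blocks; rounding creates ties
d <- data.frame(
    y = round(c(rnorm(30, sd = 1), rnorm(30, sd = 3)), 1),
    x = gl(2, 30, labels = c("A", "B")),
    block = factor(rep(1:3, times = 20))
)

## Stratified Mood test: 'block' enters after the vertical bar
mt_block <- mood_test(y ~ x | block, data = d)

## Unstratified Mood test with the alternative handling of ties
mt_ties <- mood_test(y ~ x, data = d, ties.method = "average-scores")

pvalue(mt_block)
pvalue(mt_ties)
```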

Details

`taha_test()`, `klotz_test()`, `mood_test()`, `ansari_test()`, `fligner_test()` and `conover_test()` provide the Taha test, the Klotz test, the Mood test, the Ansari-Bradley test, the Fligner-Killeen test and the Conover-Iman test, respectively. A general description of these methods is given by Hollander and Wolfe (1999). For the adjustment of scores for tied values, see Hájek, Šidák and Sen (1999, pp. 133–135).

The null hypothesis of equality, or conditional equality given `block`, of the distribution of `y` in the groups defined by `x` is tested against scale alternatives. In the two-sample case, the two-sided null hypothesis is H_0: V(Y_1) / V(Y_2) = 1, where V(Y_s) is the variance of the responses in the s-th sample. In case `alternative = "less"`, the null hypothesis is H_0: V(Y_1) / V(Y_2) ≥ 1. When `alternative = "greater"`, the null hypothesis is H_0: V(Y_1) / V(Y_2) ≤ 1. Confidence intervals for the ratio of scales are available and computed according to Bauer (1972).
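
The three hypotheses can be compared side by side with a small sketch on simulated data. The data frame `d` is hypothetical, and the Klotz test is used only as one representative of the tests on this page. With the default asymptotic distribution, the two one-sided p-values are expected to sum to one.

```r
library(coin)

set.seed(42)
## Hypothetical data: sample "s2" has twice the standard deviation of "s1"
d <- data.frame(
    y = c(rnorm(25, sd = 1), rnorm(25, sd = 2)),
    x = gl(2, 25, labels = c("s1", "s2"))
)

## Two-sided test, H_0: V(Y_1) / V(Y_2) = 1
p_two <- pvalue(klotz_test(y ~ x, data = d))

## One-sided tests, H_0: ratio >= 1 and H_0: ratio <= 1
p_less    <- pvalue(klotz_test(y ~ x, data = d, alternative = "less"))
p_greater <- pvalue(klotz_test(y ~ x, data = d, alternative = "greater"))

c(two.sided = p_two, less = p_less, greater = p_greater)
```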

The Fligner-Killeen test uses median centering in each of the samples, as suggested by Conover, Johnson and Johnson (1981), whereas the Conover-Iman test, following Conover and Iman (1978), uses mean centering in each of the samples.
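
The difference between the two centering schemes can be sketched in base R, without calling coin. The score definitions below are an assumption for illustration: normal-quantile scores of the ranks of absolute deviations for Fligner-Killeen (as in `stats::fligner.test()`), and squared ranks of absolute deviations for Conover-Iman. The code is not taken from the package source, and the variables `y` and `g` are made up.

```r
set.seed(7)
## Hypothetical two-sample data with different locations and scales
y <- c(rnorm(8, mean = 0, sd = 1), rnorm(8, mean = 5, sd = 3))
g <- gl(2, 8, labels = c("A", "B"))
n <- length(y)

## Fligner-Killeen: center each sample at its median, then apply
## normal-quantile scores to the ranks of the absolute deviations
y_med <- y - ave(y, g, FUN = median)
fk_scores <- qnorm((1 + rank(abs(y_med)) / (n + 1)) / 2)

## Conover-Iman: center each sample at its mean, then square the
## ranks of the absolute deviations
y_mean <- y - ave(y, g, FUN = mean)
ci_scores <- rank(abs(y_mean))^2

## Per-sample score sums; the more variable sample collects larger scores
tapply(fk_scores, g, sum)
tapply(ci_scores, g, sum)
```

Centering first removes location differences between the samples, so that the scores respond to spread rather than to a shift.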

The conditional null distribution of the test statistic is used to obtain p-values, and an asymptotic approximation of the exact distribution is used by default (`distribution = "asymptotic"`). Alternatively, the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting `distribution` to `"approximate"` or `"exact"`, respectively. See `asymptotic()`, `approximate()` and `exact()` for details.
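
A minimal sketch of the three ways to obtain the null distribution, using the Ansari-Bradley test on simulated data. The data frame `d` is hypothetical, and the number of Monte Carlo replicates is an arbitrary choice.

```r
library(coin)

set.seed(123)
## Hypothetical small two-sample problem
d <- data.frame(
    y = c(rnorm(10, sd = 1), rnorm(10, sd = 3)),
    x = gl(2, 10, labels = c("A", "B"))
)

## Default: asymptotic approximation
p_asym <- pvalue(ansari_test(y ~ x, data = d))

## Monte Carlo approximation with 5000 resamples
p_mc <- pvalue(ansari_test(y ~ x, data = d,
                           distribution = approximate(nresample = 5000)))

## Exact conditional distribution (univariate two-sample problem)
p_exact <- pvalue(ansari_test(y ~ x, data = d, distribution = "exact"))

c(asymptotic = p_asym, approximate = p_mc, exact = p_exact)
```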

Value

An object inheriting from class `"IndependenceTest"`. Confidence intervals can be extracted by `confint()`.
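
The returned object can be queried with the usual coin accessors. The following is a sketch on simulated zero-centered data. The data frame `d` is hypothetical, and it is assumed that `confint()` reports the interval for the ratio of scales at the level given by `conf.level`.

```r
library(coin)

set.seed(2024)
## Hypothetical two-sample data with common location zero
d <- data.frame(
    y = c(rnorm(20, sd = 1), rnorm(20, sd = 2)),
    x = gl(2, 20, labels = c("A", "B"))
)

## Request a 90% confidence interval for the ratio of scales
at <- ansari_test(y ~ x, data = d, conf.int = TRUE, conf.level = 0.9)

statistic(at)  # standardized test statistic
pvalue(at)     # p-value from the (default) asymptotic distribution
confint(at)    # confidence interval for the ratio of scales
```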

Note

In the two-sample case, a large value of the Ansari-Bradley statistic indicates that sample 1 is less variable than sample 2, whereas large values of the statistics due to Taha, Klotz, Mood, Fligner-Killeen, and Conover-Iman indicate that sample 1 is more variable than sample 2.
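
The opposite orientation of the statistics can be checked with a small simulation. The data frame `d` is hypothetical. Sample 1 (the first factor level) is generated with a much larger standard deviation, so the Ansari-Bradley statistic is expected to be negative and the Mood statistic positive.

```r
library(coin)

set.seed(99)
## Hypothetical data: sample 1 ("A") is far more variable than sample 2 ("B")
d <- data.frame(
    y = c(rnorm(30, sd = 10), rnorm(30, sd = 1)),
    x = gl(2, 30, labels = c("A", "B"))
)

## Standardized statistics of the two tests on the same data
z_ansari <- statistic(ansari_test(y ~ x, data = d))
z_mood   <- statistic(mood_test(y ~ x, data = d))

c(ansari = z_ansari, mood = z_mood)
```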

References

Bauer, D. F. (1972). Constructing confidence sets using rank statistics. Journal of the American Statistical Association 67(339), 687–690. doi:10.1080/01621459.1972.10481279

Conover, W. J. and Iman, R. L. (1978). Some exact tables for the squared ranks test. Communications in Statistics – Simulation and Computation 7(5), 491–513. doi:10.1080/03610917808812093

Conover, W. J., Johnson, M. E. and Johnson, M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 23(4), 351–361. doi:10.1080/00401706.1981.10487680

Hájek, J., Šidák, Z. and Sen, P. K. (1999). Theory of Rank Tests, Second Edition. San Diego: Academic Press.

Hollander, M. and Wolfe, D. A. (1999). Nonparametric Statistical Methods, Second Edition. New York: John Wiley & Sons.

Examples

## Serum Iron Determination Using Hyland Control Sera
## Hollander and Wolfe (1999, p. 147, Tab. 5.1)
sid <- data.frame(
    serum = c(111, 107, 100, 99, 102, 106, 109, 108, 104, 99,
              101, 96, 97, 102, 107, 113, 116, 113, 110, 98,
              107, 108, 106, 98, 105, 103, 110, 105, 104,
              100, 96, 108, 103, 104, 114, 114, 113, 108, 106, 99),
    method = gl(2, 20, labels = c("Ramsay", "Jung-Parekh"))
)

## Asymptotic Ansari-Bradley test
ansari_test(serum ~ method, data = sid)

## Exact Ansari-Bradley test
pvalue(ansari_test(serum ~ method, data = sid,
                   distribution = "exact"))


## Platelet Counts of Newborn Infants
## Hollander and Wolfe (1999, p. 171, Tab. 5.4)
platelet <- data.frame(
    counts = c(120, 124, 215, 90, 67, 95, 190, 180, 135, 399,
               12, 20, 112, 32, 60, 40),
    treatment = factor(rep(c("Prednisone", "Control"), c(10, 6)))
)

## Approximative (Monte Carlo) Lepage test
## Hollander and Wolfe (1999, p. 172)
lepage_trafo <- function(y)
    cbind("Location" = rank_trafo(y), "Scale" = ansari_trafo(y))

independence_test(counts ~ treatment, data = platelet,
                  distribution = approximate(nresample = 10000),
                  ytrafo = function(data)
                      trafo(data, numeric_trafo = lepage_trafo),
                  teststat = "quadratic")

## Why was the null hypothesis rejected?
## Note: maximum statistic instead of quadratic form
ltm <- independence_test(counts ~ treatment, data = platelet,
                         distribution = approximate(nresample = 10000),
                         ytrafo = function(data)
                             trafo(data, numeric_trafo = lepage_trafo))

## Step-down adjustment suggests a difference in location
pvalue(ltm, method = "step-down")

## The same results are obtained from the simple Sidak-Holm procedure since the
## correlation between Wilcoxon and Ansari-Bradley test statistics is zero
cov2cor(covariance(ltm))
pvalue(ltm, method = "step-down", distribution = "marginal", type = "Sidak")

[Package coin version 1.4-3 Index]