
MaximallySelectedStatisticsTests {coin}

R Documentation

Generalized Maximally Selected Statistics

Description

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

Usage

## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)

Arguments

`formula`	a formula of the form `y1 + ... + yq ~ x1 + ... + xp | block` where `y1`, ..., `yq` and `x1`, ..., `xp` are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and `block` is an optional factor for stratification.
`data`	an optional data frame containing the variables in the model formula.
`subset`	an optional vector specifying a subset of observations to be used. Defaults to `NULL`.
`weights`	an optional formula of the form `~ w` defining integer-valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.
`object`	an object inheriting from classes `"table"` or `"IndependenceProblem"`.
`teststat`	a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default) or a quadratic form (`"quadratic"`).
`distribution`	a character, specifying how the conditional null distribution of the test statistic is computed: approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Computation of the null distribution can be suppressed by specifying `"none"`.
`minprob`	a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the `minprob` · 100% quantile of `x1`, ..., `xp` are considered. Defaults to `0.1`.
`maxprob`	a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the `maxprob` · 100% quantile of `x1`, ..., `xp` are considered. Defaults to `1 - minprob`.
`...`	further arguments to be passed to `independence_test()`.

Details

`maxstat_test()` provides generalized maximally selected statistics. The family of maximally selected statistics encompasses a large collection of procedures used for the estimation of simple cutpoint models, including, but not limited to, maximally selected χ² statistics, maximally selected Cochran-Armitage statistics, maximally selected rank statistics and maximally selected statistics for multiple covariates. A general description of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given `block`, between `y1`, ..., `yq` and `x1`, ..., `xp` is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate `x1`, ..., `xp`, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. For a factor, the cutpoint is then a set of levels defining one of the two groups; for a numeric covariate, it is the value that separates the two groups.
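To make the set of candidate splits concrete, here is a simplified base-R sketch, not coin's internal code. It assumes that `minprob` and `maxprob` are applied to the empirical proportion of observations at or below a candidate cutpoint; coin's exact quantile definition and boundary handling may differ. The helper names `candidate_cutpoints()` and `n_nominal_partitions()` are hypothetical.

```r
## Hypothetical helpers illustrating the cutpoint search; not part of coin.

## Numeric or ordered covariate: each cutpoint v defines the order-preserving
## split {x <= v} versus {x > v}. Assumption: v is kept when the proportion
## of observations with x <= v lies in [minprob, maxprob].
candidate_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  ux <- sort(unique(x))
  ux <- ux[-length(ux)]  # splitting at max(x) would leave one group empty
  p_left <- vapply(ux, function(v) mean(x <= v), numeric(1))
  ux[p_left >= minprob & p_left <= maxprob]
}

## Unordered factor with k levels: every split of the levels into two
## non-empty groups is a candidate, i.e. 2^(k - 1) - 1 partitions.
n_nominal_partitions <- function(k) 2^(k - 1) - 1

cp <- candidate_cutpoints(1:20)
cp
n_nominal_partitions(4)  # 7
```

For a factor with four levels, seven unordered splits are candidates, whereas treating the same factor as ordered leaves only the three order-preserving ones (before any `minprob`/`maxprob` restriction).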

If both the response and the covariate are univariable, say `y1` and `x1`, this procedure is known as maximally selected χ² statistics (Miller and Siegmund, 1982) when `y1` is a binary factor and `x1` is a numeric variable, and as maximally selected rank statistics when `y1` is a rank-transformed numeric variable and `x1` is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004) introduced maximally selected statistics for a univariable numeric response and multiple numeric covariates `x1`, ..., `xp`.
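The core idea for a univariable numeric response and covariate can be sketched as follows: for every candidate cutpoint, standardize the two-sample linear statistic (the sum of `y` in the group `x <= v`) by its conditional expectation and variance under permutation, then take the largest absolute value. This is a simplified base-R illustration, not coin's implementation; the function name `maxstat_sketch()` is hypothetical, and the cutpoint restriction via empirical proportions is an assumption.

```r
## Hypothetical sketch, not coin's implementation: maximally selected
## standardized two-sample statistic for numeric y and numeric x.
maxstat_sketch <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  ux <- sort(unique(x))
  ux <- ux[-length(ux)]  # the largest value cannot split the sample
  p_left <- vapply(ux, function(v) mean(x <= v), numeric(1))
  cuts <- ux[p_left >= minprob & p_left <= maxprob]
  z <- vapply(cuts, function(v) {
    left <- x <= v
    m <- sum(left)
    stat <- sum(y[left])                # linear statistic for this split
    mu <- m * mean(y)                   # conditional (permutation) expectation
    sigma2 <- m * (n - m) / n * var(y)  # conditional (permutation) variance
    (stat - mu) / sqrt(sigma2)
  }, numeric(1))
  best <- which.max(abs(z))
  list(statistic = abs(z[best]), cutpoint = cuts[best])
}

## Perfectly separated groups: the best split is x <= 10
y <- c(rep(0, 10), rep(1, 10))
x <- 1:20
res <- maxstat_sketch(y, x)
res$cutpoint   # 10
res$statistic  # sqrt(19), about 4.36
```

Because the maximum is taken over many correlated standardized statistics, comparing it with a standard normal quantile would be anti-conservative; the null distribution of the maximum itself is what yields valid p-values.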

If, say, `y1` and/or `x1` are ordered factors, the default scores, `1:nlevels(y1)` and `1:nlevels(x1)`, respectively, can be altered using the `scores` argument (see `independence_test()`); this argument can also be used to coerce nominal factors to class `"ordered"`. If, say, both `y1` and `x1` are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the `alternative` argument. The particular extension to the case of a univariable ordered response and a univariable numeric covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.

P-values are obtained from the conditional null distribution of the test statistic. By default, the exact distribution is approximated by its asymptotic distribution (`distribution = "asymptotic"`). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting `distribution` to `"approximate"`. See `asymptotic()` and `approximate()` for details.
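Conceptually, Monte Carlo resampling permutes the response to break any association with the covariate, recomputes the maximally selected statistic, and reports the proportion of resampled statistics at least as large as the observed one. The base-R sketch below illustrates this for a single numeric response and covariate; it is not coin's implementation, and the names `max_abs_z()` and `mc_pvalue()` are hypothetical.

```r
## Hypothetical sketch, not coin's implementation.
## Maximally selected |standardized statistic|, vectorized via cumulative sums.
max_abs_z <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  o <- order(x)
  cs <- cumsum(y[o])
  ## split after position m only where x changes, so ties stay together
  m <- which(diff(x[o]) > 0)
  m <- m[m / n >= minprob & m / n <= maxprob]
  z <- (cs[m] - m * mean(y)) / sqrt(m * (n - m) / n * var(y))
  max(abs(z))
}

## Monte Carlo p-value: permuting y keeps both margins fixed but
## destroys any association with x, mimicking the null hypothesis.
mc_pvalue <- function(y, x, nresample = 9999L,
                      minprob = 0.1, maxprob = 1 - minprob) {
  observed <- max_abs_z(y, x, minprob, maxprob)
  null <- vapply(seq_len(nresample),
                 function(i) max_abs_z(sample(y), x, minprob, maxprob),
                 numeric(1))
  mean(null >= observed)
}

set.seed(29)
y <- c(rep(0, 10), rep(1, 10))
x <- 1:20
mc_pvalue(y, x, nresample = 999L)
```

The result is itself an estimate: it depends on the seed, and its Monte Carlo error shrinks as the number of resamples grows. In coin, the number of resamples is controlled through `approximate()`.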

Value

An object inheriting from class "IndependenceTest".

Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using teststat = "maxtype" and teststat = "quadtype", respectively (as was used in versions prior to 0.4-5).

References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected χ² statistics for k × 2 tables. Biometrics 55(1), 317–320. doi:10.1111/j.0006-341X.1999.00317.x

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137. doi:10.1016/S0167-9473(02)00225-6

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269. doi:10.1111/j.1541-0420.2008.00995.x

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of optimal selected prognostic factors. Biometrical Journal 46(3), 364–374. doi:10.1002/bimj.200310030

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85. doi:10.2307/2532740

Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016. doi:10.2307/2529881

Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228. doi:10.1007/s10342-004-0035-5

Examples


library("coin")

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
## Estimated cutpoint
mt@estimates$estimate


## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))


## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))

[Package coin version 1.4-3 Index]