MaximallySelectedStatisticsTests {coin} R Documentation

## Generalized Maximally Selected Statistics

### Description

Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.

### Usage

```
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
```

### Arguments

- `formula`: a formula of the form `y1 + ... + yq ~ x1 + ... + xp | block`, where `y1`, ..., `yq` and `x1`, ..., `xp` are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and `block` is an optional factor for stratification.
- `data`: an optional data frame containing the variables in the model formula.
- `subset`: an optional vector specifying a subset of observations to be used. Defaults to `NULL`.
- `weights`: an optional formula of the form `~ w` defining integer-valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.
- `object`: an object inheriting from class `"table"` or `"IndependenceProblem"`.
- `teststat`: a character, the type of test statistic to be applied: either a maximum statistic (`"maximum"`, default) or a quadratic form (`"quadratic"`).
- `distribution`: a character specifying how the conditional null distribution of the test statistic is approximated: by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Computation of the null distribution can be suppressed by specifying `"none"`.
- `minprob`: a numeric, a fraction between 0 and 0.5 specifying that only cutpoints greater than the `minprob` * 100% quantile of `x1`, ..., `xp` are considered. Defaults to `0.1`.
- `maxprob`: a numeric, a fraction between 0.5 and 1 specifying that only cutpoints smaller than the `maxprob` * 100% quantile of `x1`, ..., `xp` are considered. Defaults to `1 - minprob`.
- `...`: further arguments to be passed to `independence_test`.
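The `minprob`/`maxprob` restriction can be illustrated for a single numeric covariate with a short base-R sketch. The helper `candidate_cutpoints()` is hypothetical and not part of coin; it assumes R's default quantile definition (`type = 7`) and reads the rule literally (strictly greater than the lower and strictly smaller than the upper quantile), so coin's own handling of ties and boundary values may differ.

```r
## Hypothetical helper (not part of coin): candidate cutpoints of a numeric
## covariate under a literal reading of the minprob/maxprob rule.
candidate_cutpoints <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  ## Assumption: default quantile type 7; coin may use another definition
  q <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
  ux <- sort(unique(x))
  ## Each kept value c defines the two groups {x <= c} and {x > c}
  ux[ux > q[1] & ux < q[2]]
}

x <- c(2.1, 3.5, 1.2, 4.8, 3.5, 6.0, 5.1, 2.9, 7.3, 4.0)
cp <- candidate_cutpoints(x)
cp  # 2.1 2.9 3.5 4.0 4.8 5.1 6.0 (1.2 and 7.3 lie outside the quantile range)
```

Raising `minprob` narrows the range from both sides (since `maxprob` defaults to `1 - minprob`), which guards against cutpoints that leave only a handful of observations in one group.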

### Details

`maxstat_test` provides generalized maximally selected statistics. The family of maximally selected statistics encompasses a large collection of procedures used for the estimation of simple cutpoint models including, but not limited to, maximally selected chi^2 statistics, maximally selected Cochran-Armitage statistics, maximally selected rank statistics and maximally selected statistics for multiple covariates. A general description of these methods is given by Hothorn and Zeileis (2008).

The null hypothesis of independence, or conditional independence given `block`, between `y1`, ..., `yq` and `x1`, ..., `xp` is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate `x1`, ..., `xp`, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. The cutpoint is then a set of levels defining one of the two groups.
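The two kinds of partitions can be enumerated explicitly. In the sketch below, the helpers `unordered_partitions()` and `ordered_partitions()` are hypothetical and not part of coin; each returns the levels forming one of the two groups, i.e., the cutpoint as a set of levels. A factor with k levels yields 2^(k-1) - 1 unordered partitions but only k - 1 order-preserving ones.

```r
## Hypothetical helpers (not part of coin): enumerate the candidate binary
## partitions of a factor, each given as the set of levels in one group.

## Unordered factor: every split of the levels into two non-empty groups.
## The first level is always put in group A so each split is listed once
## (A vs. B and B vs. A describe the same partition).
unordered_partitions <- function(f) {
  lev <- levels(f)
  k <- length(lev)
  if (k < 2) return(list())
  rest <- lev[-1]
  out <- list()
  for (m in 0:(k - 2)) {  # number of further levels joining lev[1]
    subsets <- if (m == 0) list(character(0)) else combn(rest, m, simplify = FALSE)
    for (s in subsets) out[[length(out) + 1]] <- c(lev[1], s)
  }
  out
}

## Ordered factor (or numeric covariate): only order-preserving splits,
## i.e., the first j levels versus the remaining ones
ordered_partitions <- function(f) {
  lev <- levels(f)
  lapply(seq_len(length(lev) - 1), function(j) lev[seq_len(j)])
}

f <- factor(c("a", "b", "c", "d"))
pu <- unordered_partitions(f)
po <- ordered_partitions(f)
length(pu)  # 2^(4 - 1) - 1 = 7
length(po)  # 4 - 1 = 3
```

The exponential growth of the unordered case is why nominal covariates with many levels make the search (and the multiplicity adjustment it requires) considerably more demanding than ordered ones.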

If both response and covariate are univariable, say `y1` and `x1`, this procedure is known as maximally selected chi^2 statistics (Miller and Siegmund, 1982) when `y1` is a binary factor and `x1` is a numeric variable, and as maximally selected rank statistics (Lausen and Schumacher, 1992) when `y1` is a rank-transformed numeric variable and `x1` is a numeric variable. Lausen et al. (2004) introduced maximally selected statistics for a univariable numeric response and multiple numeric covariates `x1`, ..., `xp`.
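For the univariable case, the idea can be sketched in base R: for each cutpoint, compute the two-sample linear statistic (the sum of response scores in the group `x <= cutpoint`), standardize it by its conditional (permutation) mean and variance, and take the maximum absolute value over all cutpoints. The helper `max_stat()` is hypothetical and is not coin's implementation; it assumes the literal quantile rule for `minprob`/`maxprob` and does not compute a p-value. Coding a binary `y1` as 0/1 gives a maximally selected chi^2-type statistic, and using `rank(y1)` gives a maximally selected rank statistic.

```r
## Hypothetical helper, not coin's implementation: maximum of standardized
## two-sample linear statistics over the cutpoints of a numeric covariate x,
## given response scores h (0/1 for a binary factor, ranks for a numeric y).
max_stat <- function(h, x, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(h)
  q <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
  ux <- sort(unique(x))
  cuts <- ux[ux > q[1] & ux < q[2]]
  if (length(cuts) == 0) stop("no admissible cutpoints")
  sigma2 <- mean((h - mean(h))^2)         # population variance of the scores
  z <- vapply(cuts, function(cp) {
    g <- x <= cp                          # group indicator for this cutpoint
    m <- sum(g)
    lin <- sum(h[g])                      # linear statistic
    mu <- m * mean(h)                     # expectation under permutation
    v <- m * (n - m) / (n - 1) * sigma2   # variance under permutation
    (lin - mu) / sqrt(v)
  }, numeric(1))
  best <- which.max(abs(z))
  list(statistic = abs(z[best]), cutpoint = cuts[best])
}

## Binary response coded 0/1: maximally selected chi^2-type statistic
x <- 1:20
y <- factor(rep(c("no", "yes"), each = 10))
res_bin <- max_stat(as.numeric(y == "yes"), x)
res_bin$cutpoint  # 10, the split separating "no" from "yes"

## Numeric response, rank transformed: maximally selected rank statistic
set.seed(1)
u <- runif(50)
res_rank <- max_stat(rank(rnorm(50, mean = as.numeric(u > 0.5))), u)
```

Because the maximum is taken over many correlated statistics, its null distribution is not standard normal; this is what the asymptotic and Monte Carlo approximations in coin account for.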

If, say, `y1` and/or `x1` are ordered factors, the default scores, `1:nlevels(y1)` and `1:nlevels(x1)` respectively, can be altered using the `scores` argument (see `independence_test`); this argument can also be used to coerce nominal factors to class `"ordered"`. If, say, both `y1` and `x1` are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the `alternative` argument. The particular extension to the case of a univariable ordered response and a univariable numeric covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.
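The role of scores and of a directional alternative can be sketched for an ordered response and a numeric covariate. The helper `max_ordinal_stat()` is hypothetical and only mirrors, rather than reproduces, coin's `scores` and `alternative` handling: levels are replaced by their scores (default `1:nlevels(y)`), a standardized statistic is computed per cutpoint, and the alternative decides which tail is maximized. The sign convention (positive means higher scores in the group `x <= cutpoint`) is an assumption of this sketch.

```r
## Hypothetical helper, not coin's implementation: maximally selected
## statistic for an ordered response y with scores and a numeric covariate x.
## Here z > 0 means the group {x <= cutpoint} has higher scores than expected.
max_ordinal_stat <- function(y, x, scores = seq_len(nlevels(y)),
                             alternative = c("two.sided", "greater", "less"),
                             minprob = 0.1, maxprob = 1 - minprob) {
  alternative <- match.arg(alternative)
  h <- scores[as.integer(y)]              # replace levels by their scores
  n <- length(h)
  q <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
  ux <- sort(unique(x))
  cuts <- ux[ux > q[1] & ux < q[2]]
  sigma2 <- mean((h - mean(h))^2)
  z <- vapply(cuts, function(cp) {
    g <- x <= cp
    m <- sum(g)
    (sum(h[g]) - m * mean(h)) / sqrt(m * (n - m) / (n - 1) * sigma2)
  }, numeric(1))
  ## The alternative determines which tail is maximized over cutpoints
  crit <- switch(alternative, two.sided = abs(z), greater = z, less = -z)
  best <- which.max(crit)
  list(statistic = z[best], cutpoint = cuts[best])
}

y <- factor(rep(c("high", "mid", "low"), each = 4),
            levels = c("low", "mid", "high"), ordered = TRUE)
x <- 1:12
res_two  <- max_ordinal_stat(y, x)
res_gt   <- max_ordinal_stat(y, x, alternative = "greater")
res_lt   <- max_ordinal_stat(y, x, alternative = "less")
res_flip <- max_ordinal_stat(y, x, scores = c(3, 2, 1), alternative = "greater")
```

Reversing the scores, as in `res_flip`, flips the sign of every standardized statistic, so the same data can look strongly "greater" under one scoring and "less" under the other.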

The conditional null distribution of the test statistic is used to obtain p-values and an asymptotic approximation of the exact distribution is used by default (`distribution = "asymptotic"`). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting `distribution` to `"approximate"`. See `asymptotic` and `approximate` for details.
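The Monte Carlo idea behind `distribution = "approximate"` can be sketched in a few lines: keep the covariate fixed, permute the response scores, recompute the maximum statistic, and estimate the p-value as the share of permuted maxima at least as large as the observed one. The helpers `max_abs_z()` and `mc_pvalue()` are hypothetical and not coin's implementation; the plain proportion is used as the estimate (some implementations add one to numerator and denominator), and a small tolerance guards against floating-point ties.

```r
## Hypothetical helpers, not coin's implementation: Monte Carlo p-value for
## the maximum of standardized two-sample statistics over cutpoints.
max_abs_z <- function(h, x, cuts) {
  n <- length(h)
  sigma2 <- mean((h - mean(h))^2)  # invariant under permutation of h
  max(vapply(cuts, function(cp) {
    g <- x <= cp
    m <- sum(g)
    abs(sum(h[g]) - m * mean(h)) / sqrt(m * (n - m) / (n - 1) * sigma2)
  }, numeric(1)))
}

mc_pvalue <- function(h, x, B = 9999, minprob = 0.1, maxprob = 1 - minprob) {
  q <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
  ux <- sort(unique(x))
  cuts <- ux[ux > q[1] & ux < q[2]]  # cutpoints depend on x only
  stopifnot(length(cuts) > 0)
  observed <- max_abs_z(h, x, cuts)
  permuted <- replicate(B, max_abs_z(sample(h), x, cuts))
  mean(permuted >= observed - sqrt(.Machine$double.eps))
}

set.seed(29)
x <- 1:20
h <- c(rep(0, 10), rep(1, 10))  # perfectly separated at x = 10
p <- mc_pvalue(h, x, B = 999)
p
```

Because every permutation is evaluated over the same set of cutpoints, the resampling automatically accounts for the multiple comparisons induced by selecting the best cutpoint.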

### Value

An object inheriting from class `"IndependenceTest"`.

### Note

Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using `teststat = "maxtype"` and `teststat = "quadtype"` respectively (as was used in versions prior to 0.4-5).

### References

Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected chi^2 statistics for k x 2 tables. Biometrics 55(1), 317–320. doi: 10.1111/j.0006-341X.1999.00317.x

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137. doi: 10.1016/S0167-9473(02)00225-6

Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269. doi: 10.1111/j.1541-0420.2008.00995.x

Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of optimal selected prognostic factors. Biometrical Journal 46(3), 364–374. doi: 10.1002/bimj.200310030

Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85. doi: 10.2307/2532740

Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016. doi: 10.2307/2529881

Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228. doi: 10.1007/s10342-004-0035-5

### Examples

```
library("coin")

## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)

## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate

## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))

## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)

## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))
```

Package coin version 1.4-1