MaximallySelectedStatisticsTests {coin}    R Documentation
Generalized Maximally Selected Statistics
Description
Testing the independence of two sets of variables measured on arbitrary scales against cutpoint alternatives.
Usage
## S3 method for class 'formula'
maxstat_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
maxstat_test(object, ...)
## S3 method for class 'IndependenceProblem'
maxstat_test(object, teststat = c("maximum", "quadratic"),
             distribution = c("asymptotic", "approximate", "none"),
             minprob = 0.1, maxprob = 1 - minprob, ...)
Arguments
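formula
a formula of the form y1 + ... + yq ~ x1 + ... + xp | block where y1, ..., yq and x1, ..., xp are measured on arbitrary scales (nominal, ordinal or continuous with or without censoring) and block is an optional factor for stratification.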
data
an optional data frame containing the variables in the model formula.
subset
an optional vector specifying a subset of observations to be used. Defaults to NULL.
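weights
an optional formula of the form ~ w defining integer valued case weights for each observation. Defaults to NULL, implying equal weight for all observations.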
object
an object inheriting from classes "table" or "IndependenceProblem".
teststat
a character, the type of test statistic to be applied: either a maximum statistic ("maximum", default) or a quadratic form ("quadratic").
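distribution
a character, the conditional null distribution of the test statistic can be approximated by its asymptotic distribution ("asymptotic", default) or via Monte Carlo resampling ("approximate"). Alternatively, the functions asymptotic() or approximate() can be used. Computation of the null distribution can be suppressed by specifying "none".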
minprob
a numeric, a fraction between 0 and 0.5 specifying that cutpoints only greater than the minprob · 100% quantile of x1, ..., xp are considered. Defaults to 0.1.
maxprob
a numeric, a fraction between 0.5 and 1 specifying that cutpoints only smaller than the maxprob · 100% quantile of x1, ..., xp are considered. Defaults to 1 - minprob.
...
further arguments to be passed to independence_test().
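The minprob and maxprob restriction can be illustrated with a small base-R sketch. It assumes that "greater than" and "smaller than" the quantiles mean strictly greater and strictly smaller, and it uses R's default quantile() definition (type 7); coin may determine the admissible cutpoints differently. The vector x is made-up data.

```r
## Sketch: which candidate cutpoints survive the minprob/maxprob restriction?
## Assumption: strict inequalities and quantile() type 7 (R's default).
x <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)    # made-up covariate values
minprob <- 0.1
maxprob <- 1 - minprob
lims <- quantile(x, probs = c(minprob, maxprob), names = FALSE)
lims                                          # 2.9 and 23.6
cand <- sort(unique(x))                       # every observed value is a candidate
kept <- cand[cand > lims[1] & cand < lims[2]]
kept                                          # 3 5 7 11 13 17 19 23
```

Only the extreme values 2 and 29 are dropped here, so no split can leave fewer than minprob · 100% of the observations in one group.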
Details
maxstat_test() provides generalized maximally selected statistics. The family of maximally selected statistics encompasses a large collection of procedures used for the estimation of simple cutpoint models including, but not limited to, maximally selected χ² statistics, maximally selected Cochran-Armitage statistics, maximally selected rank statistics and maximally selected statistics for multiple covariates. A general description of these methods is given by Hothorn and Zeileis (2008).
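The core idea can be sketched in base R for a numeric response y and a single numeric covariate x: every admissible cutpoint defines a two-sample linear statistic, each statistic is standardized with its expectation and variance under permutation of y, and the largest absolute standardized value is the maximally selected statistic. The helper max_selected_stat() and the rule for admissible cutpoints are assumptions made for illustration; this is not coin's implementation.

```r
## Sketch of a maximally selected statistic (base R only, not coin's code).
## Assumption: admissible cutpoints are the unique x values c for which the
## fraction of observations with x <= c lies in [minprob, maxprob].
max_selected_stat <- function(y, x, minprob = 0.1, maxprob = 1 - minprob) {
  n <- length(y)
  cuts <- sort(unique(x))
  frac <- vapply(cuts, function(cc) mean(x <= cc), numeric(1))
  cuts <- cuts[frac >= minprob & frac <= maxprob]
  z <- vapply(cuts, function(cc) {
    g <- x <= cc                        # group indicator for this cutpoint
    m <- sum(g)
    tstat <- sum(y[g])                  # linear statistic for this split
    mu <- m * mean(y)                   # expectation under permuting y
    sigma2 <- m * (n - m) / n * var(y)  # variance under permuting y
    (tstat - mu) / sqrt(sigma2)         # standardized statistic
  }, numeric(1))
  best <- which.max(abs(z))
  list(statistic = abs(z[best]), cutpoint = cuts[best], z = z)
}

## Hypothetical data with a step in the mean of y at x = 0.5
set.seed(1)
x <- runif(100)
y <- rnorm(100) + 2 * (x > 0.5)
res <- max_selected_stat(y, x)
res$statistic
res$cutpoint
```

Because the maximum is taken over many correlated standardized statistics, res$statistic does not follow a standard normal distribution under the null hypothesis; this is why maxstat_test() needs its own asymptotic or resampling null distribution.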
The null hypothesis of independence, or conditional independence given block, between y1, ..., yq and x1, ..., xp is tested against cutpoint alternatives. All possible partitions into two groups are evaluated for each unordered covariate x1, ..., xp, whereas only order-preserving binary partitions are evaluated for ordered or numeric covariates. The cutpoint is then a set of levels defining one of the two groups.
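The difference between unordered and ordered covariates can be made concrete by counting candidate splits: a factor with k levels admits 2^(k - 1) - 1 two-group partitions when unordered, but only k - 1 order-preserving ones when ordered. The helper binary_partitions() below is hypothetical and written for illustration only.

```r
## Hypothetical helper: list one group of every candidate two-group split.
## Unordered factor: all 2^(k - 1) - 1 non-trivial partitions of the levels.
## Ordered factor: only the k - 1 order-preserving splits.
binary_partitions <- function(lev, ordered = FALSE) {
  k <- length(lev)
  if (ordered) {
    # the first j levels form one group, j = 1, ..., k - 1
    return(lapply(seq_len(k - 1), function(j) lev[seq_len(j)]))
  }
  # keep the last level in the complement so each partition appears once;
  # every non-empty subset of the first k - 1 levels is one group
  out <- list()
  for (size in seq_len(k - 1)) {
    out <- c(out, combn(lev[-k], size, simplify = FALSE))
  }
  out
}

lev <- c("a", "b", "c", "d")
length(binary_partitions(lev))                  # 7 = 2^(4 - 1) - 1
length(binary_partitions(lev, ordered = TRUE))  # 3 = 4 - 1
binary_partitions(lev, ordered = TRUE)          # {a}, {a, b}, {a, b, c}
```

The exponential growth for unordered factors is one reason why covariates with many nominal levels make the search over cutpoints expensive.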
If both the response and the covariate are univariable, say y1 and x1, this procedure is known as maximally selected χ² statistics (Miller and Siegmund, 1982) when y1 is a binary factor and x1 is a numeric variable, and as maximally selected rank statistics when y1 is a rank-transformed numeric variable and x1 is a numeric variable (Lausen and Schumacher, 1992). Lausen et al. (2004) introduced maximally selected statistics for a univariable numeric response and multiple numeric covariates x1, ..., xp.
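For a binary y1 coded 0/1, the link to χ² statistics can be checked numerically: at each cutpoint, the square of the permutation-standardized linear statistic equals (n - 1)/n times Pearson's χ² statistic for the 2 × 2 table of y1 by the split (the hypergeometric, Mantel-Haenszel form). The simulated data and the decile cutpoints are assumptions for illustration only.

```r
## Sketch: squared standardized statistics versus Pearson's chi^2 for a
## binary response and a numeric covariate split at its deciles.
set.seed(2)
n <- 80
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * (x > 0)))   # hypothetical binary response
cuts <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
chisq_max <- sapply(cuts, function(cc) {
  g <- x <= cc
  m <- sum(g)
  z <- (sum(y[g]) - m * mean(y)) / sqrt(m * (n - m) / n * var(y))
  pearson <- suppressWarnings(
    chisq.test(table(g, y), correct = FALSE)$statistic)
  c(z2 = z^2, pearson_scaled = unname(pearson) * (n - 1) / n)
})
chisq_max                # both rows agree cutpoint by cutpoint
max(chisq_max["z2", ])   # maximally selected chi^2 (hypergeometric form)
```

Replacing y by rank(y) for a numeric response in the same standardized statistic gives the corresponding sketch of a maximally selected rank statistic.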
If, say, y1 and/or x1 are ordered factors, the default scores, 1:nlevels(y1) and 1:nlevels(x1), respectively, can be altered using the scores argument (see independence_test()); this argument can also be used to coerce nominal factors to class "ordered".
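What scores do can be sketched in base R: each level of an ordered factor is mapped to a number, 1:nlevels() by default, or to user-chosen values such as those passed through scores = list(...) in the Examples. The factor f and the custom spacing are hypothetical.

```r
## Sketch: mapping the levels of an ordered factor to numeric scores.
f <- factor(c("low", "mid", "high", "mid", "low"),
            levels = c("low", "mid", "high"), ordered = TRUE)
default_scores <- seq_len(nlevels(f))            # 1:nlevels(f)
custom_scores  <- c(low = 1, mid = 4, high = 5)  # hypothetical spacing
default_scores[as.integer(f)]    # 1 2 3 2 1
custom_scores[as.character(f)]   # 1 4 5 4 1 (named by level)
```

Unequal spacing changes the weight that each level carries in the linear statistics, and therefore which cutpoint is selected.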
If, say, both y1 and x1 are ordered factors, a linear-by-linear association test is computed, and the direction of the alternative hypothesis can be specified using the alternative argument. The particular extension to the case of a univariable ordered response and a univariable numeric covariate was given by Betensky and Rabinowitz (1999) and is known as maximally selected Cochran-Armitage statistics.
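For an ordered response with scores 1:nlevels(y), each split of a numeric covariate yields a Cochran-Armitage-type linear statistic. The sketch below also shows one way a direction could enter: the standardized statistics are combined by their maximum, their minimum, or the maximum of their absolute values. This follows the usual convention for maximum-type statistics but is an assumption here, not coin's code; the helper ca_maxstat(), the decile cutpoints and the simulated data are hypothetical.

```r
## Sketch: maximally selected Cochran-Armitage-type statistic for an ordered
## response y (scores 1:nlevels(y)) and a numeric covariate x.
## Assumptions: cutpoints are the deciles of x; 'alternative' selects max(z),
## min(z) or max(abs(z)).
ca_maxstat <- function(y, x, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  s <- as.integer(y)                 # default scores 1:nlevels(y)
  n <- length(s)
  cuts <- unique(quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE))
  z <- vapply(cuts, function(cc) {
    g <- x <= cc
    m <- sum(g)
    (sum(s[g]) - m * mean(s)) / sqrt(m * (n - m) / n * var(s))
  }, numeric(1))
  switch(alternative,
         two.sided = max(abs(z)),
         less      = min(z),
         greater   = max(z))
}

set.seed(3)
x <- runif(60)
y <- cut(x + rnorm(60, sd = 0.3), breaks = 3, ordered_result = TRUE)
ca_maxstat(y, x)               # two-sided
ca_maxstat(y, x, "less")       # low x values paired with low scores
ca_maxstat(y, x, "greater")
```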
The conditional null distribution of the test statistic is used to obtain p-values, and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling by setting distribution to "approximate". See asymptotic() and approximate() for details.
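The idea behind Monte Carlo resampling can be sketched in base R: permute the response while keeping the covariate fixed, recompute the maximally selected statistic each time, and report the proportion of resampled statistics at least as large as the observed one. The helper maxstat(), the decile cutpoints and B = 999 resamples are assumptions, and this is not how coin's approximate() is implemented.

```r
## Sketch: Monte Carlo permutation p-value for a maximally selected statistic.
maxstat <- function(y, x, cuts) {
  n <- length(y)
  max(vapply(cuts, function(cc) {
    g <- x <= cc
    m <- sum(g)
    abs(sum(y[g]) - m * mean(y)) / sqrt(m * (n - m) / n * var(y))
  }, numeric(1)))
}

set.seed(4)
x <- runif(50)
y <- rnorm(50)                                # simulated under the null
cuts <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), names = FALSE)
obs <- maxstat(y, x, cuts)
B <- 999                                      # number of resamples (assumed)
perm <- replicate(B, maxstat(sample(y), x, cuts))
p_value <- (1 + sum(perm >= obs)) / (B + 1)
p_value
2 * pnorm(-obs)   # naive per-cutpoint p-value, ignores the maximization
```

Because every resample repeats the search over all cutpoints, the resulting p-value accounts for the selection of the best cutpoint, whereas the naive normal p-value does not and is typically too small.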
Value
An object inheriting from class "IndependenceTest".
Note
Starting with coin version 1.1-0, maximum statistics and quadratic forms can no longer be specified using teststat = "maxtype" and teststat = "quadtype", respectively (as was used in versions prior to 0.4-5).
References
Betensky, R. A. and Rabinowitz, D. (1999). Maximally selected χ² statistics for k × 2 tables. Biometrics 55(1), 317–320. doi:10.1111/j.0006-341X.1999.00317.x
Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis 43(2), 121–137. doi:10.1016/S0167-9473(02)00225-6
Hothorn, T. and Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics 64(4), 1263–1269. doi:10.1111/j.1541-0420.2008.00995.x
Lausen, B., Hothorn, T., Bretz, F. and Schumacher, M. (2004). Assessment of optimal selected prognostic factors. Biometrical Journal 46(3), 364–374. doi:10.1002/bimj.200310030
Lausen, B. and Schumacher, M. (1992). Maximally selected rank statistics. Biometrics 48(1), 73–85. doi:10.2307/2532740
Miller, R. and Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics 38(4), 1011–1016. doi:10.2307/2529881
Müller, J. and Hothorn, T. (2004). Maximally selected two-sample statistics as a new tool for the identification and assessment of habitat factors with an application to breeding bird communities in oak forests. European Journal of Forest Research 123(3), 219–228. doi:10.1007/s10342-004-0035-5
Examples
library("coin")
## Tree pipit data (Mueller and Hothorn, 2004)
## Asymptotic maximally selected statistics
maxstat_test(counts ~ coverstorey, data = treepipit)
## Asymptotic maximally selected statistics
## Note: all covariates simultaneously
mt <- maxstat_test(counts ~ ., data = treepipit)
mt@estimates$estimate
## Malignant arrhythmias data (Hothorn and Lausen, 2003, Sec. 7.2)
## Asymptotic maximally selected statistics
maxstat_test(Surv(time, event) ~ EF, data = hohnloser,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))
## Breast cancer data (Hothorn and Lausen, 2003, Sec. 7.3)
## Asymptotic maximally selected statistics
data("sphase", package = "TH.data")
maxstat_test(Surv(RFS, event) ~ SPF, data = sphase,
             ytrafo = function(data)
                 trafo(data, surv_trafo = function(y)
                     logrank_trafo(y, ties.method = "Hothorn-Lausen")))
## Job satisfaction data (Agresti, 2002, p. 288, Tab. 7.8)
## Asymptotic maximally selected statistics
maxstat_test(jobsatisfaction)
## Asymptotic maximally selected statistics
## Note: 'Job.Satisfaction' and 'Income' as ordinal
maxstat_test(jobsatisfaction,
             scores = list("Job.Satisfaction" = 1:4,
                           "Income" = 1:4))