| free1way {stats} | R Documentation |
Distribution-free Inference in a Stratified One-Way Layout
Description
Estimation, tests, and confidence intervals for parameters in the distribution-free stratified K-sample one-way layout for binary, ordinal, numeric and potentially independently right-censored outcomes, including semiparametrically efficient score tests against Lehmann, odds ratio, or hazard ratio alternatives with corresponding confidence intervals.
Usage
free1way(y, ...)
## S3 method for class 'formula'
free1way(formula, data, weights, subset, na.action = na.pass,
event = NULL, ...)
## S3 method for class 'numeric'
free1way(y, groups, blocks = NULL, event = NULL, weights = NULL, nbins = 0,
varnames = NULL, ...)
## S3 method for class 'factor'
free1way(y, groups, blocks = NULL, event = NULL, weights = NULL,
varnames = NULL, ...)
## S3 method for class 'table'
free1way(y, link = c("logit", "probit", "cloglog", "loglog"),
mu = 0, B = 0, exact = FALSE, ...)
## S3 method for class 'free1way'
summary(object, test, alternative = c("two.sided", "less", "greater"),
tol = .Machine$double.eps, ...)
## S3 method for class 'free1way'
coef(object, what = c("shift", "PI", "AUC", "OVL"), ...)
## S3 method for class 'free1way'
confint(object, parm, level = .95,
test = c("Permutation", "Wald", "LRT", "Rao"),
what = c("shift", "PI", "AUC", "OVL"), ...)
## S3 method for class 'free1way'
vcov(object, ...)
## S3 method for class 'free1way'
logLik(object, ...)
Arguments
y |
a binary factor, an ordered factor, a numeric vector, or a
|
nbins |
an optional integer defining the number of intervals to
divide the range of a numeric response |
groups |
a grouping factor with at least two non-empty levels. |
blocks |
a stratification factor, optional. |
event |
a logical vector representing events ( |
formula |
a formula of the form |
data |
an optional data frame (or similar: see
|
weights |
an optional vector of weights to be used in the fitting
process. Should be |
subset |
an optional vector specifying a subset of observations to be used. |
na.action |
a function which indicates what should happen when
the data contain |
varnames |
a character vector giving the names of the response, grouping, and stratifying variables. |
link |
a character defining a link function and thus the model and parameter interpretation. See ‘Details’. |
mu |
a vector specifying optional parameters used to form the null hypothesis. See ‘Details’. |
B |
an integer specifying the number of replicates used in the
permutation test. The default ( |
exact |
a logical requesting the exact permutation distribution to be computed. Only available for unstratified two-sample proportional odds models. |
object |
an object of class |
test |
a character vector defining the global test procedure for all
parameters: |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
what |
a character defining a monotone transformation of the
shift parameters: probabilistic indices ( |
tol |
a positive numeric tolerance. |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
... |
arguments passed to the |
Details
The distribution function F_1 of the response in the control group
(defined by the first factor level of the grouping factor groups) is
compared to the distribution functions in the remaining groups F_k for
k = 2, \dots, K. No assumptions are made regarding the form of F_1,
all inference procedures are thus distribution-free. However, a
semiparametric model assumes that each distribution function F_k only
depends on F_1 and a scalar parameter \delta_k. The link
function Q defines the model
F_k(y) = Q^{-1}(Q(F_1(y)) - \delta_k)
such that positive values of the parameter \delta_k correspond to
stochastically larger response values in group k when compared to the
control group. Here, \mu_k defines the null hypothesis H_0^k: \delta_k
- \mu_k = 0, k = 2, \dots, K (or the one-sided equivalents thereof);
the parameter mu can be a vector of length K - 1 or a
scalar value to be recycled to this length.
The argument link is used to define the link function and thus
specific models: Log-odds ratio alternatives are based on the logit link
(link = "logit", with Q being the quantile function of the
standard logistic distribution)
\log\left(\frac{F_k(y)}{1 - F_k(y)}\right) = \log\left(\frac{F_1(y)}{1 - F_1(y)}\right) - \delta_k,
log-hazard ratio alternatives on the complementary log-log link (link =
"cloglog", with Q being the quantile function of the Gompertz
minimum extreme value distribution)
1 - F_k(y) = (1 - F_1(y))^{\exp(- \delta_k)},
Lehmann (or reverse
time log-hazard ratio) alternatives on
the log-log link (link = "loglog", with Q being the quantile
function of the Gumbel maximum extreme value distribution)
F_k(y) = F_1(y)^{\exp(- \delta_k)},
and a shift alternative on a
latent normal scale (similar in spirit to Cohen's d) on the probit link (link = "probit", with Q
being the quantile function of the standard normal distribution)
\Phi^{-1}(F_1(Y_k)) \sim N(\delta_k, 1)
for random variables Y_k \sim F_k.
If strata (independent blocks) are present, the distribution functions may be stratum-specific
but the shift parameters are assumed to be constant across strata, such that
the above model holds for each stratum.
The control distribution function F_1 is treated as a nuisance
parameter and the shift parameters \delta_2, \dots, \delta_K are
estimated by empirical maximum-likelihood estimation, the maximised
log-empirical likelihood is available via logLik. Parameter estimates and the
inverse observed Fisher information can be inspected via coef and
vcov.
Several global test procedures for the null hypothesis that all distribution
functions are identical (and thus \delta_2 = \dots = \delta_K = 0 when
mu = 0, otherwise the global null of all H_0^k holding
simultaneously is tested) can be specified by the test argument.
Parameter-specific confidence intervals are obtained via the inversion
of a specific test procedure.
Parameter interpretation might be easier on transformed scales, such as
odds-ratios or hazard ratios. In addition, the treatment effects in such
semiparametric models can be transformed into probabilistic indices
(what = "PI", being equivalent to the area under the curve
"AUC") or overlap coefficients (what = "OVL"), under the
assumption that the model is correct. These model-based estimators must not
be confused with assumption-free estimators, such as for example the
Wilcoxon-Mann-Whitney-U statistic for the AUC
(Fay and Malinovsky 2018). Confidence intervals for the
model-based measures are best obtained by inverting permutation score,
likelihood ratio, or Rao score tests (Sewak and Hothorn 2023),
exploiting their invariance with respect to monotone transformations.
Assuming one of the semiparametric models, the parameter estimates are semiparametrically efficient and the corresponding score tests and confidence intervals are locally most powerful (Chapter 15.5 in Van der Vaart 1998). An introduction to proportional-odds models in this class is available in Harrell Jr (2015).
free1way allows several classical tests to be performed and enhanced
with corresponding parameter estimates and a variety of inference
procedures under the randomisation and population model.
The two-sample Wilcoxon rank sum test (implemented in
wilcox.test) is obtained with the default link =
"logit" as is the K-sample Kruskal-Wallis rank sum test (implemented in
kruskal.test), where group differences can be quantified as
log-odds ratios. If each observation forms its own block for
K-samples, an analogue to friedman.test is obtained.
Unlike these classical implementations, free1way allows discrete
binary or ordered outcomes to be analysed in the same spirit, for example
for binary paired comparisons (as in mcnemar.test).
Value
An object of class free1way with corresponding logLik,
coef (which returns \delta_k - \mu_k, not \delta_k
whenever mu != 0, because \mu_k enters the model as an offset
term for the k-th group), vcov, summary, and confint methods.
References
Fay MP, Malinovsky Y (2018). “Confidence Intervals of the Mann-Whitney Parameter that Are Compatible with the Wilcoxon-Mann-Whitney Test.” Statistics in Medicine, 37(27), 3991–4006. doi:10.1002/sim.7890.
Harrell Jr FE (2015). “Ordinal Logistic Regression.” In Regression Modeling Strategies, 311–325. Springer. doi:10.1007/978-3-319-19425-7_13.
Sewak A, Hothorn T (2023). “Estimating Transformations for Evaluating Diagnostic Tests with Covariate Adjustment.” Statistical Methods in Medical Research, 32(7), 1403–1419. doi:10.1177/09622802231176030.
Van der Vaart AW (1998). Asymptotic Statistics. Cambridge University Press, Cambridge, UK. doi:10.1017/CBO9780511802256.
Examples
## Kruskal-Wallis test
kruskal.test(Ozone ~ Month, data = airquality)
kt <- free1way(Ozone ~ Month, data = airquality)
print(kt)
# log-odds ratios for comparison with control
coef(kt)
# Wald inference
summary(kt)
confint(kt, test = "Wald")
## Friedman test
example(friedman.test, echo = FALSE)
me <- colnames(RoundingTimes)
d <- expand.grid(me = factor(me, labels = me, levels = me),
id = factor(seq_len(nrow(RoundingTimes))))
d$time <- c(t(RoundingTimes))
# global p-value identical
friedman.test(RoundingTimes)
ft <- free1way(time ~ me | id, data = d)
print(ft)
coef(ft)
# Wald inference
summary(ft)
confint(ft, test = "Wald")
## McNemar test
## paired binary observations
example(mcnemar.test, echo = FALSE)
# set-up data frame with survey outcomes for voters
s <- gl(2, 1, labels = dimnames(Performance)[[1L]])
survey <- gl(2, 1, labels = c("1st", "2nd"))
nvoters <- c(Performance)
x <- expand.grid(survey = survey, voter = factor(seq_len(sum(nvoters))))
x$performance <- c(rep(s[c(1, 1)], nvoters[1]), rep(s[c(2, 1)], nvoters[2]),
rep(s[c(1, 2)], nvoters[3]), rep(s[c(2, 2)], nvoters[4]))
# note that only those voters changing their minds are relevant
mcn <- free1way(xtabs(~ performance + survey + voter, data = x))
# same result as mcnemar.test w/o continuity correction
print(mcn)
# X^2 statistic
summary(mcn, test = "Permutation")$statistic^2
mcnemar.test(Performance, correct = FALSE)
# Wald inference
summary(mcn)
confint(mcn, test = "Wald")
## Mantel-Haenszel test w/o continuity correction,
## Departments are blocks
mantelhaen.test(UCBAdmissions, correct = FALSE)
mh <- free1way(UCBAdmissions)
print(mh)
# common odds-ratio, with score interval
exp(coef(mh))
exp(confint(mh, test = "Rao"))
# looking at department-specific
# confidence intervals for log-odds ratios
# it seems Dept A is out of line
apply(UCBAdmissions, MARGIN = 3,
FUN = function(x) confint(free1way(as.table(x))))
## Mantel-Haenszel test treats variables as
## unordered, free1way allows ordered responses
example(mantelhaen.test, echo = FALSE)
# Does distribution of job satisfaction (ordered) depend on income
# in a stratified proportional odds model?
# Job Satisfaction is second in array but needs to be first
# for free1way to treat it as ordered response
ft <- free1way(aperm(Satisfaction, perm = c(2, 1, 3)))
summary(ft)