R: Distribution-free Inference in a Stratified One-Way Layout

free1way {stats}

R Documentation

Distribution-free Inference in a Stratified One-Way Layout

Description

Estimation, tests, and confidence intervals for parameters in the distribution-free stratified K-sample one-way layout for binary, ordinal, numeric and potentially independently right-censored outcomes, including semiparametrically efficient score tests against Lehmann, odds ratio, or hazard ratio alternatives with corresponding confidence intervals.

Usage

free1way(y, ...)
## S3 method for class 'formula'
free1way(formula, data, weights, subset, na.action = na.pass,
         event = NULL, ...)
## S3 method for class 'numeric'
free1way(y, groups, blocks = NULL, event = NULL, weights = NULL, nbins = 0, 
         varnames = NULL, ...)
## S3 method for class 'factor'
free1way(y, groups, blocks = NULL, event = NULL, weights = NULL, 
         varnames = NULL, ...)
## S3 method for class 'table'
free1way(y, link = c("logit", "probit", "cloglog", "loglog"), 
         mu = 0, B = 0, exact = FALSE, ...)
## S3 method for class 'free1way'
summary(object, test, alternative = c("two.sided", "less", "greater"), 
        tol = .Machine$double.eps, ...)
## S3 method for class 'free1way'
coef(object, what = c("shift", "PI", "AUC", "OVL"), ...)
## S3 method for class 'free1way'
confint(object, parm, level = .95, 
        test = c("Permutation", "Wald", "LRT", "Rao"), 
        what = c("shift", "PI", "AUC", "OVL"), ...)
## S3 method for class 'free1way'
vcov(object, ...)
## S3 method for class 'free1way'
logLik(object, ...)

Arguments

y

a binary factor, an ordered factor, a numeric vector, or a Surv object (right-censoring only) containing the response values or a table with the response categories in the first dimension, the groups in the second dimension, and, optionally, blocks and event indicators as third and fourth dimensions.

nbins

an optional integer defining the number of intervals to divide the range of a numeric response y into. The default is to cut the observations at breaks given by the uniquely observed values (nbins = 0). In the presence of right-censoring, uniquely observed event times define the breaks. For large sample sizes with many unique observations, limiting the number of bins to less than 100, say, dramatically reduces computation time while producing almost the same results as the default. The latter option is only available in the absence of right-censoring.

groups

a grouping factor with at least two non-empty levels.

blocks

a stratification factor, optional.

event

a logical vector representing events (TRUE) and independently right-censored observations (FALSE), optional. Right-censoring can be specified by either using Surv as a response in a formula or by a logical event argument (where FALSE indicates a right-censored observation).

formula

a formula of the form y ~ groups | blocks where y gives the sample outcome values (binary, ordered, numeric, or Surv) and groups the corresponding groups. In stratified designs, a blocks term specifies the strata.

data

an optional data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, the weighted log-likelihood is maximised.

subset

an optional vector specifying a subset of observations to be used.

na.action

a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").

varnames

a character vector giving the names of the response, grouping, and stratifying variables.

link

a character defining a link function and thus the model and parameter interpretation. See ‘Details’.

mu

a vector specifying optional parameters used to form the null hypothesis. See ‘Details’.

B

an integer specifying the number of replicates used in the permutation test. The default (B = 0) performs an asymptotic permutation test, B > 0 uses B Monte Carlo replications to approximate the permutation distribution.

exact

a logical requesting the exact permutation distribution to be computed. Only available for unstratified two-sample proportional odds models.

object

an object of class free1way as returned by free1way.

test

a character vector defining the global test procedure for all parameters: "Permutation" performs a conditional permutation score test under the randomisation model, "Wald" performs a Wald test, "LRT" a likelihood ratio, and "Rao" a Rao score test under the population model.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".

what

a character defining a monotone transformation of the shift parameters: probabilistic indices ("PI"), area under the curves ("AUC"), or overlap coefficients ("OVL"). The default is to return parameters on the original shift scale.

tol

a positive numeric tolerance.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

...

arguments passed to the table method for free1way.

Details

The distribution function F_1 of the response in the control group (defined by the first factor level of the grouping factor groups) is compared to the distribution functions in the remaining groups F_k for k = 2, \dots, K. No assumptions are made regarding the form of F_1, all inference procedures are thus distribution-free. However, a semiparametric model assumes that each distribution function F_k only depends on F_1 and a scalar parameter \delta_k. The link function Q defines the model

F_k(y) = Q^{-1}(Q(F_1(y)) - \delta_k)

such that positive values of the parameter \delta_k correspond to stochastically larger response values in group k when compared to the control group. Here, \mu_k defines the null hypothesis H_0^k: \delta_k - \mu_k = 0, k = 2, \dots, K (or the one-sided equivalents thereof); the parameter mu can be a vector of length K - 1 or a scalar value to be recycled to this length.

The argument link is used to define the link function and thus specific models: Log-odds ratio alternatives are based on the logit link (link = "logit", with Q being the quantile function of the standard logistic distribution)

\log\left(\frac{F_k(y)}{1 - F_k(y)}\right) = \log\left(\frac{F_1(y)}{1 - F_1(y)}\right) - \delta_k,

log-hazard ratio alternatives on the complementary log-log link (link = "cloglog", with Q being the quantile function of the Gompertz minimum extreme value distribution)

1 - F_k(y) = (1 - F_1(y))^{\exp(- \delta_k)},

Lehmann (or reverse time log-hazard ratio) alternatives on the log-log link (link = "loglog", with Q being the quantile function of the Gumbel maximum extreme value distribution)

F_k(y) = F_1(y)^{\exp(- \delta_k)},

and a shift alternative on a latent normal scale (similar in spirit to Cohen's d) on the probit link (link = "probit", with Q being the quantile function of the standard normal distribution)

\Phi^{-1}(F_1(Y_k)) \sim N(\delta_k, 1)

for random variables Y_k \sim F_k. If strata (independent blocks) are present, the distribution functions may be stratum-specific but the shift parameters are assumed to be constant across strata, such that the above model holds for each stratum.

The control distribution function F_1 is treated as a nuisance parameter and the shift parameters \delta_2, \dots, \delta_K are estimated by empirical maximum-likelihood estimation, the maximised log-empirical likelihood is available via logLik. Parameter estimates and the inverse observed Fisher information can be inspected via coef and vcov.

Several global test procedures for the null hypothesis that all distribution functions are identical (and thus \delta_2 = \dots = \delta_K = 0 when mu = 0, otherwise the global null of all H_0^k holding simultaneously is tested) can be specified by the test argument. Parameter-specific confidence intervals are obtained via the inversion of a specific test procedure.

Parameter interpretation might be easier on transformed scales, such as odds-ratios or hazard ratios. In addition, the treatment effects in such semiparametric models can be transformed into probabilistic indices (what = "PI", being equivalent to the area under the curve "AUC") or overlap coefficients (what = "OVL"), under the assumption that the model is correct. These model-based estimators must not be confused with assumption-free estimators, such as for example the Wilcoxon-Mann-Whitney-U statistic for the AUC (Fay and Malinovsky 2018). Confidence intervals for the model-based measures are best obtained by inverting permutation score, likelihood ratio, or Rao score tests (Sewak and Hothorn 2023), exploiting their invariance with respect to monotone transformations.

Assuming one of the semiparametric models, the parameter estimates are semiparametrically efficient and the corresponding score tests and confidence intervals are locally most powerful (Chapter 15.5 in Van der Vaart 1998). An introduction to proportional-odds models in this class is available in Harrell Jr (2015).

free1way allows several classical tests to be performed and enhanced with corresponding parameter estimates and a variety of inference procedures under the randomisation and population model. The two-sample Wilcoxon rank sum test (implemented in wilcox.test) is obtained with the default link = "logit" as is the K-sample Kruskal-Wallis rank sum test (implemented in kruskal.test), where group differences can be quantified as log-odds ratios. If each observation forms its own block for K-samples, an analogue to friedman.test is obtained. Unlike these classical implementations, free1way allows discrete binary or ordered outcomes to be analysed in the same spirit, for example for binary paired comparisons (as in mcnemar.test).

Value

An object of class free1way with corresponding logLik, coef (which returns \delta_k - \mu_k, not \delta_k whenever mu != 0, because \mu_k enters the model as an offset term for the k-th group), vcov, summary, and confint methods.

References

Fay MP, Malinovsky Y (2018). “Confidence Intervals of the Mann-Whitney Parameter that Are Compatible with the Wilcoxon-Mann-Whitney Test.” Statistics in Medicine, 37(27), 3991–4006. doi:10.1002/sim.7890.

Harrell Jr FE (2015). “Ordinal Logistic Regression.” In Regression Modeling Strategies, 311–325. Springer. doi:10.1007/978-3-319-19425-7_13.

Sewak A, Hothorn T (2023). “Estimating Transformations for Evaluating Diagnostic Tests with Covariate Adjustment.” Statistical Methods in Medical Research, 32(7), 1403–1419. doi:10.1177/09622802231176030.

Van der Vaart AW (1998). Asymptotic Statistics. Cambridge University Press, Cambridge, UK. doi:10.1017/CBO9780511802256.

Examples



## Kruskal-Wallis test
kruskal.test(Ozone ~ Month, data = airquality)
kt <- free1way(Ozone ~ Month, data = airquality)
print(kt)
# log-odds ratios for comparison with control
coef(kt)
# Wald inference
summary(kt)
confint(kt, test = "Wald")

## Friedman test
example(friedman.test, echo = FALSE)
me <- colnames(RoundingTimes)
d <- expand.grid(me = factor(me, labels = me, levels = me),
                 id = factor(seq_len(nrow(RoundingTimes))))
d$time <- c(t(RoundingTimes))
# global p-value identical
friedman.test(RoundingTimes)
ft <- free1way(time ~ me | id, data = d)
print(ft)
coef(ft)
# Wald inference
summary(ft)
confint(ft, test = "Wald")

## McNemar test
## paired binary observations
example(mcnemar.test, echo = FALSE)
# set-up data frame with survey outcomes for voters
s <- gl(2, 1, labels = dimnames(Performance)[[1L]])
survey <- gl(2, 1, labels = c("1st", "2nd"))
nvoters <- c(Performance)
x <- expand.grid(survey = survey, voter = factor(seq_len(sum(nvoters))))
x$performance <- c(rep(s[c(1, 1)], nvoters[1]), rep(s[c(2, 1)], nvoters[2]),
                   rep(s[c(1, 2)], nvoters[3]), rep(s[c(2, 2)], nvoters[4]))
# note that only those voters changing their minds are relevant
mcn <- free1way(xtabs(~ performance + survey + voter, data = x))
# same result as mcnemar.test w/o continuity correction
print(mcn)
# X^2 statistic
summary(mcn, test = "Permutation")$statistic^2
mcnemar.test(Performance, correct = FALSE)
# Wald inference
summary(mcn)
confint(mcn, test = "Wald")

## Mantel-Haenszel test w/o continuity correction, 
## Departments are blocks
mantelhaen.test(UCBAdmissions, correct = FALSE)
mh <- free1way(UCBAdmissions)
print(mh)
# common odds-ratio, with score interval
exp(coef(mh))
exp(confint(mh, test = "Rao"))
# looking at department-specific 
# confidence intervals for log-odds ratios 
# it seems Dept A is out of line
apply(UCBAdmissions, MARGIN = 3,  
      FUN = function(x) confint(free1way(as.table(x))))

## Mantel-Haenszel test treats variables as
## unordered, free1way allows ordered responses
example(mantelhaen.test, echo = FALSE)
# Does distribution of job satisfaction (ordered) depend on income
# in a stratified proportional odds model?
# Job Satisfaction is second in array but needs to be first
# for free1way to treat it as ordered response
ft <- free1way(aperm(Satisfaction, perm = c(2, 1, 3)))
summary(ft)

[Package stats version 4.6.1 Index]