BuyseTest {BuyseTest}  R Documentation 
Two-group GPC
Description
Performs Generalized Pairwise Comparisons (GPC) between two groups. Can handle one or several binary, continuous and time-to-event endpoints.
Usage
BuyseTest(
formula,
data,
scoring.rule = NULL,
pool.strata = NULL,
correction.uninf = NULL,
model.tte = NULL,
method.inference = NULL,
n.resampling = NULL,
strata.resampling = NULL,
hierarchical = NULL,
weightEndpoint = NULL,
weightObs = NULL,
neutral.as.uninf = NULL,
add.halfNeutral = NULL,
keep.pairScore = NULL,
seed = NULL,
cpus = NULL,
trace = NULL,
treatment = NULL,
endpoint = NULL,
type = NULL,
threshold = NULL,
status = NULL,
operator = NULL,
censoring = NULL,
restriction = NULL,
strata = NULL
)
Arguments
formula 
[formula] a symbolic description of the GPC model,
typically 
data 
[data.frame] dataset. 
scoring.rule 
[character] method used to compare the observations of a pair in presence of right censoring (i.e. 
pool.strata 
[character] weights used to combine estimates across strata. Can be

correction.uninf 
[integer] should a correction be applied to remove the bias due to the presence of uninformative pairs? 0 indicates no correction, 1 imputes the average score of the informative pairs, and 2 performs IPCW. See Details, section "Handling missing values". 
model.tte 
[list] optional survival models relative to each time-to-event endpoint.
Models must 
method.inference 
[character] method used to compute confidence intervals and p-values.
Can be 
n.resampling 
[integer] the number of permutations/samples used for computing the confidence intervals and the p-values. See Details, section "Statistical inference". 
strata.resampling 
[character] the variable on which the permutation/sampling should be stratified. See Details, section "Statistical inference". 
hierarchical 
[logical] should only the uninformative pairs be analyzed at the lower priority endpoints (hierarchical GPC)? Otherwise all pairs will be compared for all endpoints (full GPC). 
weightEndpoint 
[numeric vector] weights used to cumulate the pairwise scores over the endpoints.
Only used when 
weightObs 
[character or numeric vector] weights or variable in the dataset containing the weight associated with each observation. These weights are only considered when performing GPC (but not when fitting survival models). 
neutral.as.uninf 
[logical vector] should pairs classified as neutral be re-analyzed using endpoints of lower priority (as is done for uninformative pairs)? See Details, section "Handling missing values". 
add.halfNeutral 
[logical] should half of the neutral score be added to the favorable and unfavorable scores? 
keep.pairScore 
[logical] should the result of each pairwise comparison be kept? 
seed 
[integer, >0] Random number generator (RNG) state used when starting resampling.
If 
cpus 
[integer, >0] the number of CPUs to use. Only the permutation test can use parallel computation. See Details, section "Statistical inference". 
trace 
[integer] should the execution of the function be traced? 
treatment , endpoint , type , threshold , status , operator , censoring , restriction , strata 
Alternative to 
Details
Specification of the GPC model
There are two ways to specify the GPC model in BuyseTest.
A formula interface via the argument formula, where the response variable should be a binary variable defining the treatment arms.
The rest of the formula should indicate the endpoints by order of priority and the strata variables (if any).
A vector interface using the following arguments:

treatment: [character] name of the treatment variable identifying the control and the experimental group. Must have only two levels (e.g. 0 and 1).
endpoint: [character vector] the name of the endpoint variable(s).
threshold: [numeric vector] critical values used to compare the pairs (threshold of minimal important difference). A pair will be classified as neutral if the difference in endpoint is strictly below this threshold. There must be one threshold for each endpoint variable; it must be NA for binary endpoints and positive for continuous or time-to-event endpoints.
status: [character vector] the name of the binary variable(s) indicating whether the endpoint was observed or censored. Must be NA when the endpoint is not a time to event.
operator: [character vector] the sign defining a favorable endpoint. ">0" indicates that higher values are favorable while "<0" indicates the opposite.
type: [character vector] indicates whether it is a binary outcome ("b", "bin", or "binary"), a continuous outcome ("c", "cont", or "continuous"), or a time-to-event outcome ("t", "tte", "time", or "timetoevent").
censoring: [character vector] is the endpoint subject to right or left censoring ("left" or "right"). The default is right-censoring.
restriction: [numeric vector] value above which any difference is classified as neutral.
strata: [character vector] if not NULL, the GPC will be applied within each group of patients defined by the strata variable(s).
The formula interface can be more concise, especially when considering few outcomes, but may be harder for new users to grasp.
Note that arguments endpoint, threshold, status, operator, type, and censoring must have the same length.
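As an illustration, the same single-endpoint analysis can be written with either interface. This is a minimal sketch, assuming the BuyseTest package is installed and that simBuyseTest returns the columns treatment, eventtime and status (as used in the Examples section). The threshold of 1 is arbitrary, and method.inference = "none" is chosen only to keep the calls fast.

```r
library(BuyseTest)

## simulated data (same generator as in the Examples section)
set.seed(10)
df.data <- simBuyseTest(1e2)

## formula interface: treatment on the left, endpoint(s) on the right
BT.formula <- BuyseTest(treatment ~ TTE(eventtime, status = status, threshold = 1),
                        data = df.data, method.inference = "none", trace = 0)

## vector interface: one element per endpoint in each argument
BT.vector <- BuyseTest(data = df.data,
                       treatment = "treatment",
                       endpoint = "eventtime",
                       type = "tte",
                       status = "status",
                       threshold = 1,
                       operator = ">0",
                       method.inference = "none", trace = 0)

## both calls are expected to give the same point estimate
coef(BT.formula)
coef(BT.vector)
```

With several endpoints, each of the vector arguments would carry one element per endpoint, in order of priority.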
GPC procedure
The GPC procedure forms all pairs of observations, one belonging to the experimental group and the other to the control group, and classifies them into 4 categories:

Favorable pair: the endpoint is better for the observation in the experimental group.

Unfavorable pair: the endpoint is better for the observation in the control group.

Neutral pair: the difference between the endpoints of the two observations is (in absolute value) below the threshold. When threshold=0, neutral pairs correspond to pairs with equal endpoint. Lower-priority outcomes (if any) are then used to classify the pair as favorable/unfavorable.

Uninformative pair: censoring/missingness prevents classification as favorable, unfavorable or neutral.

With complete data, pairs can be decidedly classified as favorable/unfavorable/neutral.
In presence of missing values, the GPC procedure uses the scoring rule (argument scoring.rule) and the correction for uninformative pairs (argument correction.uninf) to classify the pairs.
The classification may not be binary (0 or 1): for example, with Peron's scoring rule each pair is assigned a probability of being favorable, unfavorable, or neutral.
To export the classification of each pair, set the argument keep.pairScore to TRUE and call the function getPairScore on the result of the BuyseTest function.
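The classification step can be sketched in base R for a single continuous endpoint without missing values. This is an illustrative sketch, not the package's implementation: the data are made up, higher values are assumed to be favorable, and a pair is counted as favorable or unfavorable when the difference reaches the threshold. The net benefit is computed as the difference between the proportions of favorable and unfavorable pairs, and the win ratio as their ratio.

```r
## hypothetical toy data: higher values are favorable
experimental <- c(5, 3, 8)
control <- c(4, 6, 3)
threshold <- 2

## all pairwise differences (rows: experimental, columns: control)
diff <- outer(experimental, control, "-")

## classify each pair
favorable <- diff > 0 & diff >= threshold
unfavorable <- diff < 0 & diff <= -threshold
neutral <- !favorable & !unfavorable

## summary statistics
pFavorable <- mean(favorable)
pUnfavorable <- mean(unfavorable)
netBenefit <- pFavorable - pUnfavorable
winRatio <- pFavorable / pUnfavorable

c(favorable = sum(favorable), unfavorable = sum(unfavorable), neutral = sum(neutral))
c(netBenefit = netBenefit, winRatio = winRatio)
```

Here 4 of the 9 pairs are favorable, 1 is unfavorable and 4 are neutral, which gives a net benefit of 1/3 and a win ratio of 4.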
Handling missing values

scoring.rule: indicates how to handle right-censoring in time-to-event endpoints using information from the survival curves. Gehan's scoring rule (argument scoring.rule="Gehan") only scores pairs that can be decidedly classified as favorable, unfavorable, or neutral, while Peron's scoring rule (argument scoring.rule="Peron") uses the empirical survival curves of each group to also score the pairs that cannot be decidedly classified. Peron's scoring rule is the recommended scoring rule but only handles right-censoring.

correction.uninf: indicates how to handle missing values that could not be classified by the scoring rule.
correction.uninf=0 treats them as uninformative: this is equivalent to a complete-case analysis when neutral.as.uninf=FALSE, while when neutral.as.uninf=TRUE, uninformative pairs are treated as neutral, i.e., analyzed at the following endpoint (if any). This approach will (generally) lead to biased estimates for the proportion of favorable, unfavorable, or neutral pairs.
correction.uninf=1 imputes to the uninformative pairs the average score of the informative pairs, i.e. assumes that uninformative pairs would on average behave like informative pairs. This is therefore the recommended approach when this assumption is reasonable, typically when the tail of the survival function estimated by the Kaplan-Meier method is close to 0.
correction.uninf=2 uses inverse probability of censoring weights (IPCW), i.e. up-weights informative pairs to represent uninformative pairs. It also assumes that uninformative pairs would on average behave like informative pairs and is only recommended when the analysis is stopped after the first endpoint with uninformative pairs.
Note that both corrections will convert the whole proportion of uninformative pairs of a given endpoint into favorable, unfavorable, or neutral pairs. See Peron et al (2021) for further details and recommendations.
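The idea behind Gehan's scoring rule and the imputation of the average informative score can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's code: the data are made up, the threshold is 0, longer times are favorable, and the correction is mimicked by rescaling the informative proportions so that they sum to 1.

```r
## hypothetical toy data: status 1 = event observed, 0 = right-censored
timeE <- c(10, 4, 7); statusE <- c(1, 0, 1)  ## experimental group
timeC <- c(6, 8);     statusC <- c(1, 0)     ## control group

categories <- c("favorable", "unfavorable", "neutral", "uninformative")
score <- matrix(NA_character_, nrow = length(timeE), ncol = length(timeC))
for(i in seq_along(timeE)){
  for(j in seq_along(timeC)){
    if(timeE[i] > timeC[j] && statusC[j] == 1){
      ## the control event occurred before the experimental time
      score[i, j] <- "favorable"
    }else if(timeE[i] < timeC[j] && statusE[i] == 1){
      ## the experimental event occurred before the control time
      score[i, j] <- "unfavorable"
    }else if(timeE[i] == timeC[j] && statusE[i] == 1 && statusC[j] == 1){
      score[i, j] <- "neutral"
    }else{
      ## censoring prevents a definite classification
      score[i, j] <- "uninformative"
    }
  }
}

## proportion of pairs in each category (no correction)
prop <- sapply(categories, function(k){ mean(score == k) })

## rescaling: uninformative pairs are assumed to behave like informative pairs
informative <- c("favorable", "unfavorable", "neutral")
corrected <- prop[informative] / (1 - prop[["uninformative"]])

prop
corrected
```

In this toy example half of the pairs are uninformative, so the net benefit moves from 1/6 without correction to 1/3 after rescaling.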
Statistical inference
The argument method.inference defines how to approximate the distribution of the GPC estimators and so how standard errors, confidence intervals, and p-values are computed.
Available methods are:

method.inference="none": only the point estimate is computed, which makes the execution of BuyseTest faster than with the other methods.
method.inference="u-statistic": compute the variance of the estimate using an H-projection of order 1 (default option) or 2 (see BuyseTest.options). The first order is downward biased but consistent. When considering the Gehan scoring rule, with no transformation or correction, the second order is unbiased and equivalent to the variance of the bootstrap distribution. P-values and confidence intervals are then evaluated assuming that the estimates follow a Gaussian distribution. WARNING: the current implementation of the H-projection is not valid when using corrections for uninformative pairs (correction.uninf=1 or correction.uninf=2).
method.inference="permutation": perform a permutation test, estimating in each sample the summary statistics (net benefit, win ratio).
method.inference="studentized permutation": perform a permutation test, estimating in each sample the summary statistics (net benefit, win ratio) and the variance-covariance matrix of the estimate.
method.inference="varExact permutation": compute the variance of the permutation distribution using a closed-form formula (Anderson and Verbeeck 2023). P-values and confidence intervals are then evaluated assuming that the estimates follow a Gaussian distribution. WARNING: the current implementation of the variance estimator for the permutation distribution is not valid when using the Peron scoring rule or corrections for uninformative pairs.
method.inference="bootstrap": perform a non-parametric bootstrap, estimating in each sample the summary statistics (net benefit, win ratio).
method.inference="studentized bootstrap": perform a non-parametric bootstrap, estimating in each sample the summary statistics (net benefit, win ratio) and the variance-covariance matrix of the estimator.
Additional arguments for permutation and bootstrap resampling:

strata.resampling: if NA or of length 0, the permutation/non-parametric bootstrap will be performed by resampling in the whole sample. Otherwise, the permutation/non-parametric bootstrap will be performed separately for each level that the variable defined in strata.resampling takes.
n.resampling: sets the number of permutations/samples used. A large number of permutations (e.g. n.resampling=10000) is needed to obtain accurate confidence intervals and p-values. See (Buyse et al., 2010) for more details.
seed: the seed is used to generate one seed per sample. These seeds are the same whether one or several CPUs are used.
cpus: indicates whether the resampling procedure can be split across several CPUs to save time. Can be set to "all" to use all available CPUs. The detection of the number of CPUs relies on the detectCores function from the parallel package.
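The principle of the permutation test can be sketched in base R for a single continuous endpoint. This is a minimal sketch under stated assumptions rather than the package's implementation: the helper netBenefit and the simulated data are hypothetical, the threshold is 0, and the two-sided p-value uses the common (1 + count)/(1 + n.resampling) convention.

```r
## hypothetical helper: net benefit for one continuous endpoint, threshold 0
netBenefit <- function(y, group){
  diff <- outer(y[group == 1], y[group == 0], "-")
  mean(diff > 0) - mean(diff < 0)
}

## simulated data: 20 control (0) and 20 experimental (1) observations
set.seed(10)
group <- rep(0:1, each = 20)
y <- rnorm(40, mean = 0.5 * group)

## observed statistic
observed <- netBenefit(y, group)

## permutation distribution: shuffle the group labels and recompute
n.resampling <- 1000
perm <- replicate(n.resampling, netBenefit(y, sample(group)))

## two-sided p-value
p.value <- (1 + sum(abs(perm) >= abs(observed))) / (1 + n.resampling)
c(estimate = observed, p.value = p.value)
```

Stratified resampling would shuffle the labels separately within each level of the stratification variable instead of across the whole sample.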
Pooling results across strata
Consider K strata and denote by m_k and n_k the sample sizes in the control and active arms (respectively) for stratum k. Let \sigma_k be the standard error of the stratum-specific summary statistic (e.g. net benefit). The stratum-specific weights, w_k, are given by:

"CMH": w_k=\frac{\frac{m_k \times n_k}{m_k + n_k}}{\sum_{l=1}^K \frac{m_l \times n_l}{m_l + n_l}}. Optimal if the odds ratios are constant across strata.

"equal": w_k=\frac{1}{K}

"Buyse": w_k=\frac{m_k \times n_k}{\sum_{l=1}^K m_l \times n_l}. Optimal if the risk difference is constant across strata.

"var-*" (e.g. "var-netBenefit"): w_k=\frac{1/\sigma^2_k}{\sum_{l=1}^K 1/\sigma^2_l}
Default values
The defaults of the arguments scoring.rule, correction.uninf, method.inference, n.resampling, hierarchical, neutral.as.uninf, keep.pairScore, strata.resampling, cpus, trace are read from BuyseTest.options().
Additional (hidden) arguments are

alternative: [character] the alternative hypothesis. Must be one of "two.sided", "greater" or "less" (used by confint).
conf.level: [numeric] level for the confidence intervals (used by confint).
keep.survival: [logical] export the survival values used by Peron's scoring rule.
order.Hprojection: [1 or 2] the order of the H-projection used to compute the variance when method.inference="u-statistic".
Value
An R object of class S4BuyseTest.
Author(s)
Brice Ozenne
References
On the GPC procedure: Marc Buyse (2010). Generalized pairwise comparisons of prioritized endpoints in the two-sample problem. Statistics in Medicine 29:3245-3257.
On the win ratio: D. Wang, S. Pocock (2016). A win ratio approach to comparing continuous non-normal outcomes in clinical trials. Pharmaceutical Statistics 15:238-245.
On Peron's scoring rule: J. Peron, M. Buyse, B. Ozenne, L. Roche and P. Roy (2018). An extension of generalized pairwise comparisons for prioritized outcomes in the presence of censoring. Statistical Methods in Medical Research 27:1230-1239.
On Gehan's scoring rule: Gehan EA (1965). A generalized two-sample Wilcoxon test for doubly censored data. Biometrika 52(3):650-653.
On inference in GPC using the U-statistic theory: Ozenne B, Budtz-Jorgensen E, Peron J (2021). The asymptotic distribution of the Net Benefit estimator in presence of right-censoring. Statistical Methods in Medical Research. doi:10.1177/09622802211037067
On how to handle right-censoring: J. Peron, M. Idlhaj, D. Maucort-Boulch, et al. (2021). Correcting the bias of the net benefit estimator due to right-censored observations. Biometrical Journal 63:893-906.
On the closed-form formula for the permutation variance: W.N. Anderson and J. Verbeeck (2023). Exact Permutation and Bootstrap Distribution of Generalized Pairwise Comparisons Statistics. Mathematics, 11, 1502. doi:10.3390/math11061502.
See Also
S4BuyseTest-summary for a summary of the results of generalized pairwise comparison.
S4BuyseTest-confint for exporting estimates with confidence intervals and p-values.
S4BuyseTest-model.tables for exporting the number or percentage of favorable/unfavorable/neutral/uninformative pairs.
S4BuyseTest-sensitivity for performing a sensitivity analysis on the choice of the threshold(s).
S4BuyseTest-plot for graphical display of the pairs across endpoints.
S4BuyseTest-getIid for exporting the first order H-decomposition.
S4BuyseTest-getPairScore for exporting the scoring of each pair.
Examples
library(BuyseTest)
library(data.table)
#### simulate some data ####
set.seed(10)
df.data <- simBuyseTest(1e2, n.strata = 2)
## display
if(require(prodlim)){
   resKM_tempo <- prodlim(Hist(eventtime,status)~treatment, data = df.data)
   plot(resKM_tempo)
}
#### one time to event endpoint ####
BT <- BuyseTest(treatment ~ TTE(eventtime, status = status), data = df.data)
summary(BT) ## net benefit
model.tables(BT) ## export the table at the end of summary
summary(BT, percentage = FALSE)
summary(BT, statistic = "winRatio") ## win Ratio
## permutation instead of asymptotics to compute the p-value
## Not run:
BTperm <- BuyseTest(treatment ~ TTE(eventtime, status = status), data = df.data,
                    method.inference = "permutation", n.resampling = 1e3)
summary(BTperm)
summary(BTperm, statistic = "winRatio")
## End(Not run)
## same with parallel calculations
## Not run:
BTperm <- BuyseTest(treatment ~ TTE(eventtime, status = status), data = df.data,
                    method.inference = "permutation", n.resampling = 1e3, cpus = 8)
summary(BTperm)
## End(Not run)
## method Gehan is much faster but does not optimally handle censored observations
BT <- BuyseTest(treatment ~ TTE(eventtime, status = status), data = df.data,
                scoring.rule = "Gehan", trace = 0)
summary(BT)
#### one time to event endpoint: only differences in survival over 1 unit ####
BT <- BuyseTest(treatment ~ TTE(eventtime, threshold = 1, status = status), data = df.data)
summary(BT)
#### one time to event endpoint with a strata variable
BTS <- BuyseTest(treatment ~ strata + TTE(eventtime, status = status), data = df.data)
summary(BTS)
#### several endpoints with a strata variable
ff <- treatment ~ strata + T(eventtime, status, 1) + B(toxicity)
ff <- update(ff,
             ~. + T(eventtime, status, 0.5) + C(score, 1) + T(eventtime, status, 0.25))
BTM <- BuyseTest(ff, data = df.data)
summary(BTM)
plot(BTM)
#### real example : veteran dataset of the survival package ####
## Only one endpoint. Type = Time-to-event. Threshold = 0. Stratification by histological subtype
## scoring.rule = "Gehan"
if(require(survival)){
## Not run:
   data(cancer, package = "survival") ## import veteran
   ## scoring.rule = "Gehan"
   BT_Gehan <- BuyseTest(trt ~ celltype + TTE(time,threshold=0,status=status),
                         data=veteran, scoring.rule="Gehan")
   summary_Gehan <- summary(BT_Gehan)
   summary_Gehan <- summary(BT_Gehan, statistic = "winRatio")
   ## scoring.rule = "Peron"
   BT_Peron <- BuyseTest(trt ~ celltype + TTE(time,threshold=0,status=status),
                         data=veteran, scoring.rule="Peron")
   summary(BT_Peron)
## End(Not run)
}