tdROC {tdROC} | R Documentation |
Estimate time-dependent prediction accuracy measures, including the ROC, AUC, Brier score, and survival difference, with right-censored survival data.
Description
This is a core function of the ‘tdROC’ package. It uses the nonparametric weights proposed by Li et al. (2015) to estimate a number of time-dependent prediction accuracy measures for right-censored survival outcomes, including the ROC curve, AUC, Brier score, and survival difference. For each measure, the variance can be estimated through bootstrap resampling.
Usage
tdROC(
X,
Y,
delta,
tau,
span = 0.1,
h = NULL,
type = "uniform",
n.grid = 1000,
X.min = NULL,
X.max = NULL,
cut.off = NULL,
nboot = 0,
alpha = 0.05,
epsilon = NULL,
method = "both",
output = "both"
)
Arguments
X |
a numeric vector of risk scores, of the same length as |
Y |
a numeric vector of time-to-event values, of the same length as |
delta |
a binary vector indicating an event (1) or censoring (0), of the same length as |
tau |
a scalar, the prediction horizon at which the prediction is evaluated. |
span |
a numeric value, the proportion of neighboring observations used in the nearest-neighbor method. The default is 0.1. |
h |
a numeric value, the bandwidth of the kernel weights. The default is |
type |
a character value, indicating the type of kernel function used to calculate kernel weights. The default is |
n.grid |
a positive integer, the number of grid points used when calculating the ROC curve. The default is |
X.min |
the lower boundary of grid cut-off points for biomarker |
X.max |
the upper boundary of grid cut-off points for biomarker |
cut.off |
a vector of |
nboot |
the number of bootstrap replications to be used for variance estimation. The default is |
alpha |
equal to (1 - confidence level)/2. The default is |
epsilon |
The precision parameter used in an approximation to the weight calculation when the sample size is large. If a weight corresponding to a specific risk score is already calculated, then the weights corresponding to adjacent risk scores, within the distance specified by epsilon, will be the same under the approximation. This approximation avoids repeated calculation of weights that are almost the same, and hence increases the speed of computation in this situation. The default is NULL, which means no approximation is used. A higher value indicates less precision. |
method |
specifies the method used to estimate the AUC. The default is |
output |
specifies which kind of output to return. The default is |
Details
This function takes the risk score X, the time-to-event data Y, and the censoring indicator delta as input to estimate a number of time-dependent prediction accuracy measures for right-censored survival outcomes, including the ROC curve, AUC, Brier score, and survival difference.
The confidence intervals of these quantities are estimated by bootstrap when nboot > 0.
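A generic percentile bootstrap summary can be sketched in R as follows. This is not the package's internal code: boot_summary() and the statistic function passed to it are hypothetical, and taking the alpha and 1 - alpha quantiles is an assumption based on the documented definition of alpha as (1 - confidence level)/2.

```r
# Hypothetical helper: nonparametric bootstrap mean, SD, and percentile interval.
# `stat_fn` is an illustrative function mapping a data frame to one number.
boot_summary <- function(data, stat_fn, nboot = 200, alpha = 0.05) {
  n <- nrow(data)
  stats <- vapply(seq_len(nboot), function(b) {
    idx <- sample.int(n, n, replace = TRUE)   # resample subjects with replacement
    stat_fn(data[idx, , drop = FALSE])
  }, numeric(1))
  # Assumption: alpha = (1 - level)/2, so use the alpha and 1 - alpha quantiles
  ci <- unname(quantile(stats, c(alpha, 1 - alpha)))
  list(mean = mean(stats), sd = sd(stats), lower = ci[1], upper = ci[2])
}

set.seed(1)
toy <- data.frame(x = rnorm(50))
res <- boot_summary(toy, function(d) mean(d$x), nboot = 500)
```

Resampling whole rows keeps each subject's risk score, time, and censoring status together, which is what a subject-level bootstrap requires.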
This function offers two options for estimating the AUC. The first uses the estimated sensitivity and specificity over a series of cut-off points to compute the AUC by trapezoidal integration; in this case the output also includes the corresponding sensitivity and specificity, which are used by the package's plot function. The second estimates the AUC with the empirical estimator of the proportion of concordant pairs, using the proposed weight estimator (Li et al., 2015). The two methods generally produce very similar estimates. The option is set by the argument method.
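The two AUC options can be illustrated with the R sketch below. It assumes a hypothetical per-subject weight vector w, the estimated probability that each subject is a case (an event by tau), such as the weights of Li et al. (2015); here w is simply supplied. The function names are illustrative and not part of the tdROC API, and tied risk scores receive no half credit in this sketch.

```r
# Weighted sensitivity and specificity at each cut-off k (illustrative):
#   sens(k) = sum w_i 1(X_i > k) / sum w_i
#   spec(k) = sum (1 - w_i) 1(X_i <= k) / sum (1 - w_i)
weighted_roc <- function(X, w, cut.off) {
  sens <- vapply(cut.off, function(k) sum(w * (X > k)) / sum(w), numeric(1))
  spec <- vapply(cut.off, function(k) sum((1 - w) * (X <= k)) / sum(1 - w), numeric(1))
  data.frame(cut.off = cut.off, sens = sens, spec = spec)
}

# Option 1: trapezoidal integration of sensitivity against 1 - specificity
auc_trapezoid <- function(sens, spec) {
  fpr <- 1 - spec
  ord <- order(fpr, sens)            # sort points by increasing 1 - specificity
  fpr <- fpr[ord]
  tpr <- sens[ord]
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Option 2: weighted proportion of concordant pairs over i != j
auc_concordance <- function(X, w) {
  pair_w <- outer(w, 1 - w)          # w_i (1 - w_j)
  diag(pair_w) <- 0                  # drop i == j pairs
  sum(pair_w * outer(X, X, ">")) / sum(pair_w)
}

# Toy example: perfectly separated cases (w = 1) and controls (w = 0)
X <- c(1, 2, 3, 4)
w <- c(0, 0, 1, 1)
roc <- weighted_roc(X, w, cut.off = c(0, X))
auc1 <- auc_trapezoid(roc$sens, roc$spec)
auc2 <- auc_concordance(X, w)
```

The trapezoidal estimate depends on the cut-off grid reaching both ends of the curve, which is one reason a fine grid (large n.grid) is used; the concordance estimate needs no grid.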
We also include the Brier score and the survival difference as calibration metrics; their definitions are given below. Both can be estimated with the proposed conditional probability weight (Wu and Li, 2018), and both assess the accuracy of a probabilistic prediction X. The calibration results are meaningful only when the risk score X is a predicted probability, and should be ignored otherwise.
\text{Brier score} = E\{[1(T \le \tau, \delta = 1) - X]^2\}
\text{Survival difference} = E[1(T \le \tau, \delta = 1) - X]
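Under censoring, 1(T \le \tau, \delta = 1) is unknown for subjects censored before tau. One way to plug in conditional probability weights is sketched below. This is an illustrative assumption, not necessarily the exact formula used by tdROC: the hypothetical vector w holds each subject's estimated probability of an event by tau, and the Brier score averages the expected squared error w_i(1 - X_i)^2 + (1 - w_i)X_i^2. When w equals the observed 0/1 indicator, both quantities reduce to the empirical versions of the formulas above.

```r
# Hypothetical helper: X = predicted probabilities of an event by tau,
# w = estimated probabilities that 1(T <= tau, delta = 1) equals 1.
calibration_sketch <- function(X, w) {
  stopifnot(length(X) == length(w), all(X >= 0 & X <= 1))
  brier <- mean(w * (1 - X)^2 + (1 - w) * X^2)   # E{[1(.) - X]^2} averaged over w
  surv_diff <- mean(w - X)                        # E[1(.) - X] averaged over w
  list(brier = brier, surv_diff = surv_diff)
}

cal <- calibration_sketch(X = c(0.2, 0.8), w = c(0, 1))
```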
As mentioned in the arguments, we introduce a small precision parameter epsilon to speed up computation when the sample size is large. Suppose that j grid points, X_{grid,1}, ..., X_{grid,j}, have been processed so far. For each subject with risk score X_i, we check whether there is a processed grid point X_{grid,m}, 1 \le m \le j, close enough to X_i that |X_i - X_{grid,m}| < \epsilon. If no such point exists, we designate X_i as a new grid point X_{grid,j+1}, store the corresponding survfit object for subsequent weight estimation, and mark it as processed. Otherwise, we directly reuse the stored survfit object associated with that grid point for the weight calculation. Because the survfit computation is the most time-consuming step of the estimation, this approach substantially reduces computing time without introducing notable bias, especially for large sample sizes.
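The grid-caching logic can be sketched in R as follows. The helper epsilon_grid() and its argument fit_fn are hypothetical: fit_fn stands in for the expensive survfit-based step and is called once per new grid point. When several stored grid points lie within epsilon, this sketch reuses the first one found; the package may choose differently.

```r
# Hypothetical sketch of the epsilon-grid approximation.
epsilon_grid <- function(X, epsilon, fit_fn) {
  grid <- numeric(0)                 # X_{grid,1}, ..., X_{grid,j}
  fits <- list()                     # stored result for each grid point
  grid_index <- integer(length(X))   # which grid point each subject uses
  for (i in seq_along(X)) {
    # processed grid points with |X_i - X_{grid,m}| < epsilon
    m <- which(abs(X[i] - grid) < epsilon)
    if (length(m) == 0) {
      grid <- c(grid, X[i])                  # X_i becomes X_{grid,j+1}
      fits[[length(grid)]] <- fit_fn(X[i])   # expensive step, done once
      grid_index[i] <- length(grid)
    } else {
      grid_index[i] <- m[1]                  # reuse the stored result
    }
  }
  list(grid = grid, fits = fits, grid_index = grid_index)
}

# Count how often the expensive step actually runs
calls <- 0
fake_fit <- function(x) { calls <<- calls + 1; x }
out <- epsilon_grid(c(0.10, 0.12, 0.50, 0.51, 0.90), epsilon = 0.05, fake_fit)
```

Here five subjects trigger only three fits, which is the saving the approximation relies on; a larger epsilon merges more subjects into each grid point.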
Value
Returns a list of the following items:
main_res:
a list containing AUC.integral, the AUC estimated by trapezoidal integration; AUC.empirical, the AUC estimated by the empirical estimator of the proportion of concordant pairs; and ROC, a data frame of dimension (2 + n.grid) x 3 with columns cut.off, sens, and spec.
calibration_res:
the Brier score and survival difference, estimated with formulas similar to those of Wu and Li (2018). When the risk score X is a biomarker value rather than a predicted cumulative incidence probability, the Brier score and survival difference are not meaningful; in this case, please disregard the calibration results.
boot_res:
a list of bootstrap results, including bAUC, bAUC2, bBS, bSurvDiff, and bROC.
Each of bAUC, bAUC2, bBS, and bSurvDiff is a list containing the corresponding mean, standard deviation, and confidence interval.
bROC is a data frame with columns sens.mean, sens.sd, sens.lower, sens.upper, spec.mean, spec.sd, spec.lower, and spec.upper.
Examples
library(survival)
library(tdROC) # attach tdROC so that tdROC() is available
data(mayo)
dat <- mayo[, c("time", "censor", "mayoscore5")]
fm <- tdROC(
X = dat$mayoscore5, Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = 5:9
)
# In the following example, we transform the biomarker mayoscore5 into a predicted probability.
# Typically, a monotone transformation such as expit() is used to map a biomarker whose values
# fall outside [0, 1] into an estimated probability between 0 and 1.
expit <- function(x){ 1/(1+exp(-x)) }
tdROC(
X = expit(dat$mayoscore5), Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = expit(5:9) # cut-offs on the probability scale
)
tdROC(
X = expit(dat$mayoscore5), Y = dat$time, delta = dat$censor,
tau = 365 * 6, span = 0.1, nboot = 0, alpha = 0.05,
n.grid = 1000, cut.off = expit(5:9), epsilon = 0.05 # cut-offs on the probability scale
)