cdROC {nsROC} | R Documentation |
Cumulative/dynamic ROC curve estimate
Description
This function estimates a time-dependent ROC curve following the cumulative/dynamic approach and returns a 'cdroc' object, which can be printed or plotted. To deal with right censoring, different estimators can be considered: those proposed by Martinez-Camblor et al. (2016), based on the Cox proportional hazards regression model (semiparametric) or on the Kaplan-Meier estimator (non-parametric); and the one proposed by Li et al. (2016), based on the kernel-weighted Kaplan-Meier method. See References below.
Usage
cdROC(stime, status, marker, predict.time, ...)
## Default S3 method:
cdROC(stime, status, marker, predict.time, method=c('Cox', 'KM', 'wKM'),
kernel=c('normal', 'Epanechnikov', 'other'), h=1,
kernel.fun = function(x,xi,h){u <- (x-xi)/h; 1/(2*h)*(abs(u) <= 1)},
ci=FALSE, boot.n=100, conf.level=0.95, seed=2032, ...)
Arguments
stime |
vector of observed times. |
status |
vector of status (takes the value 0 if the subject is censored and 1 otherwise). |
marker |
vector of (bio)marker values. |
predict.time |
considered time point (scalar). |
method |
procedure used to estimate the probability. One of "Cox" (method based on Cox regression), "KM" (method based on Kaplan-Meier estimator) or "wKM" (method based on kernel-weighted Kaplan-Meier estimator). |
kernel |
procedure used to calculate the kernel function. One of "normal", "Epanechnikov" or "other". Only considered if |
h |
bandwidth used to calculate the kernel function. Only considered if |
kernel.fun |
if |
ci |
if TRUE, a confidence interval for the area under the curve is computed. |
boot.n |
number of bootstrap replicates considered to build the confidence interval. Default: 100. |
conf.level |
the width of the confidence band as a number in (0,1). Default: 0.95, resulting in a 95% confidence band. |
seed |
seed considered to generate bootstrap replicates (for reproducibility). |
... |
additional arguments for |
Details
Assuming that larger values of the marker are associated with higher probabilities of occurrence of the event, the cumulative sensitivity and the dynamic specificity are defined by:
Se^C(x,t) = P(marker > x | stime \le t) and Sp^D(x,t) = P(marker \le x | stime > t).
The resulting ROC curve is known as the cumulative/dynamic ROC curve, R_t^{C/D}, where t = predict.time.
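As a minimal sketch of these two definitions, the base R code below computes their empirical versions in the simplest setting, where no observation is censored and the status of every subject at time t is therefore known. The helper name cd_rates and the toy data are hypothetical and are not part of nsROC.

```r
# Hypothetical helper (not part of nsROC): empirical cumulative sensitivity
# and dynamic specificity at threshold x and time t, assuming NO censoring.
cd_rates <- function(stime, marker, x, t) {
  case    <- stime <= t   # event has occurred by time t
  control <- stime > t    # still event-free at time t
  c(Se = mean(marker[case] > x),       # P(marker > x  | stime <= t)
    Sp = mean(marker[control] <= x))   # P(marker <= x | stime >  t)
}

# Toy data: six subjects, all with an observed event time
stime  <- c(1, 2, 3, 4, 5, 6)
marker <- c(9, 4, 8, 3, 6, 1)
res <- cd_rates(stime, marker, x = 5, t = 3)
print(res)
```

With censored data the status at time t is unknown for subjects censored before t, which is why the estimated probabilities \hat{P}_i are needed.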
Data censored before t are the major handicap for the estimation of the time-dependent ROC curve. In order to estimate the probability of surviving beyond t for the i-th subject, \hat{P}_i, three different methods are considered:
A semiparametric one, using a proportional hazard Cox regression model:
The hazard function is estimated by
\lambda(t) = \lambda_0(t) \cdot \exp(\beta \cdot X)
where X denotes the marker. The probability is estimated by
\hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)}
where z_i stands for the observed time of the i-th subject and \hat{S} is the survival function estimated from the Cox regression model.
A non-parametric one, using the Kaplan-Meier estimator directly:
The probability is estimated by
\hat{P}_i = \frac{\hat{S}(t)}{\hat{S}(z_i)}
where z_i stands for the observed time of the i-th subject and \hat{S} is the survival function estimated by the Kaplan-Meier method restricted to those subjects satisfying X \le x_i.
A non-parametric one, using the kernel-weighted Kaplan-Meier estimator:
The survival function is estimated by
\hat{S}(t | X = x_i) = \prod_{s \leq t} \left[ 1- \frac{\sum_{j=1}^n K_h(x_j,x_i) I(z_j = s) status_j}{\sum_{j=1}^n K_h(x_j,x_i) I(z_j \geq s)} \right]
where z_j stands for the observed time of the j-th subject, I is the indicator function and status_j takes the value 0 if the j-th subject is censored and 1 otherwise.
Two different methods can be considered in order to define the kernel function,
K_h(x_j,x_i):
kernel='normal': K_h(x_j,x_i) = \frac{1}{h \sqrt{2 \pi}} \exp\{ - \frac{(x_j - x_i)^2}{2 h^2} \}
kernel='Epanechnikov': K_h(x_j,x_i) = \frac{3}{4h} \left( 1 - \left( \frac{x_j - x_i}{h} \right)^2 \right) I(|x_j - x_i| \le h)
where h is the bandwidth considered for kernel weights.
If the user decides to use another kernel function, kernel='other', it should be defined by the kernel.fun input parameter, a function with three arguments in this order: x is a vector, xi is the value around which the kernel weight should be computed and h is the bandwidth.
The probability is estimated by
\hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)}
where z_i stands for the observed time of the i-th subject and \hat{S} is the survival function estimated by the kernel-weighted Kaplan-Meier method considered above.
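To illustrate the kernel-weighted approach, the sketch below implements a kernel-weighted Kaplan-Meier estimate with the normal kernel in base R and uses it to compute \hat{P}_i for one subject censored before t. It is not the nsROC implementation: the helper name wkm_surv and the toy data are hypothetical, and the denominator is assumed to be the kernel-weighted number of subjects still at risk (z_j \geq s), as in the usual Kaplan-Meier construction.

```r
# Hypothetical helper (not part of nsROC): kernel-weighted Kaplan-Meier
# estimate of S(t | X = xi), using a normal kernel with bandwidth h.
wkm_surv <- function(t, xi, stime, status, marker, h = 1) {
  w <- dnorm((marker - xi) / h) / h                    # K_h(x_j, x_i)
  s <- sort(unique(stime[status == 1 & stime <= t]))   # event times up to t
  if (length(s) == 0) return(1)                        # no event yet: S = 1
  factors <- vapply(s, function(u) {
    d <- sum(w * (stime == u) * status)  # weighted events at time u
    r <- sum(w * (stime >= u))           # weighted at-risk set (assumption)
    1 - d / r
  }, numeric(1))
  prod(factors)
}

# Toy data: subject 2 is censored at time 2, before t0 = 4
stime  <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 0, 1, 1, 0, 1, 1, 1)
marker <- c(8, 7, 6, 5, 4, 3, 2, 1)
t0 <- 4
i  <- 2

# P_i = S(t0 | X = x_i) / S(z_i | X = x_i)
P_i <- wkm_surv(t0, marker[i], stime, status, marker, h = 1) /
       wkm_surv(stime[i], marker[i], stime, status, marker, h = 1)
print(P_i)

# With a very large bandwidth all weights are nearly equal, so the
# estimate approaches the ordinary Kaplan-Meier estimate.
km_like <- wkm_surv(t0, 0, stime, status, marker, h = 1e6)
print(km_like)
```

A very large bandwidth is used in the last call only as a sanity check: the kernel weights become almost constant, so the result should match the unweighted Kaplan-Meier estimate computed on the same data.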
Value
A list of class 'cdroc' with the following content:
TP |
vector of sensitivities (true positive rates). |
TN |
vector of specificities (true negative rates). |
cutPoints |
vector of thresholds considered for the (bio)marker. It coincides with the |
auc |
area under the curve estimate by trapezoidal rule. |
ci |
if TRUE, a confidence interval for the area under the curve has been computed. |
boot.n |
number of bootstrap replicates considered to build the confidence interval. Default: 100. |
conf.level |
the width of the confidence band as a number in (0,1). Default: 0.95, resulting in a 95% confidence band. |
seed |
seed considered to generate bootstrap replicates (for reproducibility). |
meanAuc |
bootstrap area under the curve estimate (mean along bootstrap replicates). |
ciAuc |
bootstrap confidence interval for the area under the curve. |
aucs |
vector of bootstrap area under the curve estimates. |
stime |
vector of observed times. |
status |
vector of status (takes the value 0 if the subject is censored and 1 otherwise). |
marker |
vector of (bio)marker values. |
predict.time |
considered time point (scalar). |
method |
procedure used in order to estimate the probability. |
kernel |
procedure used to calculate the kernel function. Only considered if |
h |
bandwidth used to calculate the kernel function. Only considered if |
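The auc component is described above as a trapezoidal-rule estimate. A minimal sketch of that rule, given vectors of sensitivities and specificities such as TP and TN, could look as follows. The helper name trap_auc is hypothetical, and the sketch appends the end points (0,0) and (1,1) itself because it is an assumption, not a documented fact, whether the returned vectors already contain them; duplicated end points only add zero-width trapezoids.

```r
# Hypothetical helper (not part of nsROC): area under an ROC curve by the
# trapezoidal rule, from sensitivities (TP) and specificities (TN).
trap_auc <- function(TP, TN) {
  fpr <- 1 - TN                      # false positive rate
  o   <- order(fpr, TP)              # sort the points from left to right
  fpr <- c(0, fpr[o], 1)             # anchor the curve at (0,0) and (1,1)
  tpr <- c(0, TP[o], 1)
  # sum of trapezoid areas: width * average height
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

auc_toy <- trap_auc(TP = c(0.5, 0.8), TN = c(0.9, 0.6))
print(auc_toy)
```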
Note
The survfit and Surv functions from the survival package are used to estimate the survival functions in both methodologies. Additionally, coxph from the same package is used to fit the Cox proportional hazards regression model in the semiparametric approach.
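As a rough illustration of how these survival functions can be combined for the semiparametric approach, the sketch below fits a Cox model with coxph and derives \hat{P}_i for every subject. It is not the nsROC code: it uses basehaz from the survival package instead of survfit for brevity, and it assumes that \hat{P}_i is 1 for subjects observed beyond t, 0 for subjects with an event by t, and the survival ratio for subjects censored before t.

```r
library(survival)  # provides Surv, coxph and basehaz

# Toy data taken from the Examples section
set.seed(123)
stime  <- rchisq(50, 3)
status <- sample(c(rep(1, 40), rep(0, 10)))
marker <- max(stime) - stime + rnorm(50, 0, 2)
dat    <- data.frame(stime, status, marker)
t0     <- 2.8                                  # plays the role of predict.time

# Cox model with the marker as the only covariate
fit  <- coxph(Surv(stime, status) ~ marker, data = dat)
beta <- unname(coef(fit))

# Baseline cumulative hazard (marker = 0) as a right-continuous step function
bh <- basehaz(fit, centered = FALSE)
H0 <- stepfun(bh$time, c(0, bh$hazard))

# Assumption: P_i = 1 if observed beyond t0, 0 if the event occurred by t0
P    <- as.numeric(stime > t0)
cens <- status == 0 & stime <= t0              # censored before t0

# With S(t | x) = exp(-H0(t) * exp(beta * x)), the ratio
# S(t0 | x_i) / S(z_i | x_i) simplifies to the expression below.
P[cens] <- exp(-(H0(t0) - H0(stime[cens])) * exp(beta * marker[cens]))

summary(P)
```

Writing the ratio as a single exponential avoids dividing two very small survival probabilities, which is a numerical choice of this sketch rather than something the package documents.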
References
Martinez-Camblor P., F-Bayon G., Perez-Fernandez S., 2016, Cumulative/dynamic ROC curve estimation, Journal of Statistical Computation and Simulation, 86(17), 3582-3594.
Li L., Greene T., Hu B., 2016, A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data, Statistical Methods in Medical Research, DOI: 10.1177/0962280216680239.
Examples
# Basic example. Data
set.seed(123)
stime <- rchisq(50,3)
status <- sample(c(rep(1,40), rep(0,10)))
marker <- max(stime) - stime + rnorm(50,0,2)
# Cumulative/dynamic ROC curve estimate at time 2.8 (Cox method is used) with 0.95 confidence
# interval for the area under the curve
cdROC(stime, status, marker, 2.8, ci=TRUE)
# Cumulative/dynamic ROC curve estimate at time 3.1 (Kaplan-Meier method is used)
cdROC(stime, status, marker, 3.1, method="KM")
# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# Gaussian kernel and bandwidth 1 is used)
cdROC(stime, status, marker, 3, method="wKM")
# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# biweight kernel and bandwidth equal to 2 is used)
cdROC(stime, status, marker, 3, method="wKM", kernel="other", h=2,
kernel.fun = function(x,xi,h){u <- (x-xi)/h; 15/(16*h)*(1-u^2)^2*(abs(u)<=1)})