R: Cumulative/dynamic ROC curve estimate

cdROC {nsROC}

R Documentation

Cumulative/dynamic ROC curve estimate

Description

This function estimates a time-dependent ROC curve following the cumulative/dynamic approach and returns a 'cdroc' object, which can be printed or plotted. To handle right censoring, three estimators are available: the two proposed by Martinez-Camblor et al. (2016), based on the Cox proportional hazards regression model (semiparametric) or on the Kaplan-Meier estimator (non-parametric), and the one proposed by Li et al. (2016), based on the kernel-weighted Kaplan-Meier method. See References below.

Usage

cdROC(stime, status, marker, predict.time, ...)
## Default S3 method:
cdROC(stime, status, marker, predict.time, method=c('Cox', 'KM', 'wKM'),
      kernel=c('normal', 'Epanechnikov', 'other'), h=1,
      kernel.fun = function(x,xi,h){u <- (x-xi)/h; 1/(2*h)*(abs(u) <= 1)},
      ci=FALSE, boot.n=100, conf.level=0.95, seed=2032, ...)

Arguments

`stime`	vector of observed times.
`status`	vector of status (takes the value 0 if the subject is censored and 1 otherwise).
`marker`	vector of (bio)marker values.
`predict.time`	considered time point (scalar).
`method`	procedure used to estimate the probability. One of "Cox" (method based on Cox regression), "KM" (method based on Kaplan-Meier estimator) or "wKM" (method based on kernel-weighted Kaplan-Meier estimator).
`kernel`	procedure used to calculate the kernel function. One of "normal", "Epanechnikov" or "other". Only considered if `method='wKM'`.
`h`	bandwidth used to calculate the kernel function. Only considered if `method='wKM'`.
`kernel.fun`	if `method='wKM'` and `kernel='other'`, function used to calculate the kernel function. It has three input parameters: `x`=vector, `xi`=value around which the kernel weight should be computed, `h`=bandwidth. Default: Uniform kernel.
`ci`	if TRUE, a confidence interval for the area under the curve is computed.
`boot.n`	number of bootstrap replicates considered to build the confidence interval. Default: 100.
`conf.level`	confidence level of the interval for the area under the curve, as a number in (0,1). Default: 0.95, resulting in a 95% confidence interval.
`seed`	seed considered to generate bootstrap replicates (for reproducibility).
`...`	additional arguments for `cdROC`. Ignored.

Details

Assuming that larger values of the marker are associated with higher probabilities of occurrence of the event, the cumulative sensitivity and the dynamic specificity are defined by:

Se^C(x,t) = P(marker > x | stime \le t) and Sp^D(x,t) = P(marker \le x | stime > t).

The resulting ROC curve is known as the cumulative/dynamic ROC curve, R_t^{C/D}, where t = predict.time.
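
As an illustration of these two definitions, the following minimal sketch computes their empirical versions for a single threshold x when no observation is censored. The helper `emp_cd_rates` and the toy data are hypothetical and are not part of nsROC; the package additionally handles censoring as described below.

```r
# Empirical cumulative sensitivity and dynamic specificity, ignoring censoring.
# emp_cd_rates is a hypothetical helper, not a function of nsROC.
emp_cd_rates <- function(stime, marker, x, t) {
  cases    <- stime <= t   # subjects with the event by time t
  controls <- stime > t    # subjects still event-free at time t
  c(Se = mean(marker[cases] > x),       # P(marker > x  | stime <= t)
    Sp = mean(marker[controls] <= x))   # P(marker <= x | stime >  t)
}

# Toy data (made up for illustration only)
stime  <- c(1, 2, 3, 4, 5, 6)
marker <- c(9, 7, 8, 3, 4, 1)
rates  <- emp_cd_rates(stime, marker, x = 7.5, t = 3.5)
print(rates)
```

Evaluating the helper over a grid of thresholds x would trace the whole curve for the chosen t.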

Subjects censored before t are the main difficulty in estimating the time-dependent ROC curve, because their status at t is unknown. To estimate the probability of surviving beyond t for the i-th subject, \hat{P}_i, three different methods are considered:

A semiparametric one, using a Cox proportional hazards regression model:

The hazard function is modeled as \lambda(t) = \lambda_0(t) \cdot \exp(\beta \cdot X) where X denotes the marker.

The probability is estimated by \hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)} where z_i stands for the observed time of the i-th subject and \hat{S} is the survival function estimated from the Cox regression model.
A non-parametric one, using the Kaplan-Meier estimator directly:

The probability is estimated by \hat{P}_i = \frac{\hat{S}(t)}{\hat{S}(z_i)} where z_i stands for the observed time of the i-th subject and \hat{S} is the Kaplan-Meier estimate of the survival function computed from those subjects satisfying X \le x_i.
A non-parametric one, using the kernel-weighted Kaplan-Meier estimator:

The survival function is estimated by \hat{S}(t | X = x_i) = \prod_{s \leq t} \left[ 1- \frac{\sum_{j=1}^n K_h(x_j,x_i) I(z_j = s) status_j}{\sum_{j=1}^n K_h(x_j,x_i) I(z_j \geq s)} \right] where z_j stands for the observed time of the j-th subject, I is the indicator function and status_j takes the value 0 if the j-th subject is censored and 1 otherwise. The denominator is the kernel-weighted number of subjects still at risk at time s.

Two different methods can be considered in order to define the kernel function, K_h(x_j,x_i):
- kernel='normal':
  
  K_h(x_j,x_i) = \frac{1}{h \sqrt{2 \pi}} exp\{ - \frac{(x_j - x_i)^2}{2 h^2} \}
- kernel='Epanechnikov':
  
  K_h(x_j,x_i) = \frac{3}{4h} \left( 1 - \frac{(x_j - x_i)^2}{h^2} \right) I(|x_j - x_i| \le h)
where h is the bandwidth considered for kernel weights.

If the user decides to use another kernel function (kernel='other'), it must be supplied through the kernel.fun argument, a function taking three arguments in this order: x is a vector, xi is the value around which the kernel weight should be computed and h is the bandwidth.

The probability is estimated by \hat{P}_i = \frac{\hat{S}(t | X = x_i)}{\hat{S}(z_i | X = x_i)} where z_i stands for the observed time of the i-th subject and \hat{S} is the survival function estimated by the kernel-weighted Kaplan-Meier method considered above.
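
The kernel-weighted construction can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by nsROC: `gauss_kernel` and `wkm_surv` are hypothetical helpers, the toy data are made up, and the sketch assumes that the denominator of each factor sums the kernel weights of the subjects still at risk at s (observed time greater than or equal to s).

```r
# Gaussian kernel weight with the (x, xi, h) argument order described above.
# gauss_kernel and wkm_surv are hypothetical helpers, not functions of nsROC.
gauss_kernel <- function(x, xi, h) dnorm((x - xi) / h) / h

# Kernel-weighted Kaplan-Meier estimate of S(t | X = xi).
wkm_surv <- function(t, xi, stime, status, marker, h = 1,
                     kernel.fun = gauss_kernel) {
  w <- kernel.fun(marker, xi, h)                  # one weight per subject
  event_times <- sort(unique(stime[status == 1 & stime <= t]))
  surv <- 1
  for (s in event_times) {
    events  <- sum(w * (stime == s) * status)     # weighted events at s
    at_risk <- sum(w * (stime >= s))              # weighted risk set at s (assumption)
    if (at_risk > 0) surv <- surv * (1 - events / at_risk)
  }
  surv
}

# Toy data (made up): subject 2 is censored at time 2, before t = 3
stime  <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 1)
marker <- c(0.5, 1, 1.5, 2)

# Conditional probability that subject 2 survives beyond t = 3
P_2 <- wkm_surv(3, marker[2], stime, status, marker, h = 1) /
       wkm_surv(stime[2], marker[2], stime, status, marker, h = 1)

# With a very large bandwidth all weights are nearly equal, so the estimate
# approaches the ordinary Kaplan-Meier estimate
S_flat <- wkm_surv(3, marker[2], stime, status, marker, h = 1e6)
```

Small bandwidths make the estimate depend mostly on subjects whose marker is close to xi, which is the purpose of the weighting.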

Value

A list of class 'cdroc' with the following content:

`TP`	vector of sensitivities (true positive rates).
`TN`	vector of specificities (true negative rates).
`cutPoints`	vector of thresholds considered for the (bio)marker. It consists of the values in `marker` together with `min(marker)-1` and `max(marker)+1`.
`auc`	estimate of the area under the curve, computed by the trapezoidal rule.
`ci`	if TRUE, a confidence interval for the area under the curve has been computed.
`boot.n`	number of bootstrap replicates considered to build the confidence interval. Default: 100.
`conf.level`	confidence level of the interval for the area under the curve, as a number in (0,1). Default: 0.95, resulting in a 95% confidence interval.
`seed`	seed considered to generate bootstrap replicates (for reproducibility).
`meanAuc`	bootstrap area under the curve estimate (mean along bootstrap replicates).
`ciAuc`	bootstrap confidence interval for the area under the curve.
`aucs`	vector of bootstrap area under the curve estimates.
`stime`	vector of observed times.
`status`	vector of status (takes the value 0 if the subject is censored and 1 otherwise).
`marker`	vector of (bio)marker values.
`predict.time`	considered time point (scalar).
`method`	procedure used in order to estimate the probability.
`kernel`	procedure used to calculate the kernel function. Only considered if `method='wKM'`.
`h`	bandwidth used to calculate the kernel function. Only considered if `method='wKM'`.
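
The trapezoidal rule mentioned for `auc` can be illustrated with the following minimal sketch, which takes vectors shaped like `TP` and `TN`. The helper `trapz_auc` is hypothetical and is not the package's internal code; it assumes both vectors are aligned by cut point.

```r
# Trapezoidal area under a ROC curve from sensitivities and specificities.
# trapz_auc is a hypothetical helper, not a function of nsROC.
trapz_auc <- function(TP, TN) {
  fpr <- 1 - TN                    # false positive rate at each cut point
  ord <- order(fpr, TP)            # walk the curve from (0,0) towards (1,1)
  fpr <- fpr[ord]
  tpr <- TP[ord]
  # Sum of trapezoid areas between consecutive points of the curve
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Toy curve through (0,0), (0,0.5), (0.5,1) and (1,1)
auc_toy <- trapz_auc(TP = c(0, 0.5, 1, 1), TN = c(1, 1, 0.5, 0))
```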

Note

The survfit and Surv functions from the survival package are used to estimate the survival functions in both methodologies. Additionally, coxph from the same package is used to fit the Cox proportional hazards regression model in the semiparametric approach.
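
As a rough illustration of how these survival functions can be combined, the sketch below computes the Cox-based conditional probability \hat{S}(t | X = x_i) / \hat{S}(z_i | X = x_i) for one subject censored before t. The helper `cox_cond_surv` and the toy data are hypothetical; this is not the code used inside cdROC, only one way to obtain the same kind of quantity with coxph, survfit and Surv.

```r
library(survival)

# Conditional probability of surviving beyond t for subject i, given survival
# up to its observed time, under a Cox model with the marker as covariate.
# cox_cond_surv is a hypothetical helper, not a function of nsROC.
cox_cond_surv <- function(i, t, stime, status, marker) {
  d   <- data.frame(stime = stime, status = status, marker = marker)
  fit <- coxph(Surv(stime, status) ~ marker, data = d)
  # Survival curve predicted for a subject with marker value x_i
  sf  <- survfit(fit, newdata = data.frame(marker = marker[i]))
  # Survival at z_i and at t (assumes z_i < t, so the times are increasing)
  s   <- summary(sf, times = c(stime[i], t), extend = TRUE)$surv
  s[2] / s[1]
}

# Toy data (made up): subject 2 is censored at time 2, before t = 3.5
stime  <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 0, 1, 1, 0, 1, 1, 1)
marker <- c(8, 5, 7, 6, 2, 4, 1, 3)
P_2 <- cox_cond_surv(2, t = 3.5, stime, status, marker)
```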

References

Martinez-Camblor P., F-Bayon G., Perez-Fernandez S., 2016, Cumulative/dynamic ROC curve estimation, Journal of Statistical Computation and Simulation, 86(17), 3582-3594.

Li L., Greene T., Hu B., 2016, A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data, Statistical Methods in Medical Research, DOI: 10.1177/0962280216680239.

Examples

# Basic example. Data
set.seed(123)
stime <- rchisq(50,3)
status <- sample(c(rep(1,40), rep(0,10)))
marker <- max(stime) - stime + rnorm(50,0,2)

# Cumulative/dynamic ROC curve estimate at time 2.8 (Cox method is used) with 0.95 confidence
# interval for the area under the curve
cdROC(stime, status, marker, 2.8, ci=TRUE)

# Cumulative/dynamic ROC curve estimate at time 3.1 (Kaplan-Meier method is used)
cdROC(stime, status, marker, 3.1, method="KM")

# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# Gaussian kernel and bandwidth 1 is used)
cdROC(stime, status, marker, 3, method="wKM")

# Cumulative/dynamic ROC curve estimate at time 3 (kernel-weighted Kaplan-Meier method with
# biweight kernel and bandwidth equal to 2 is used)
cdROC(stime, status, marker, 3, method="wKM", kernel="other", h=2,
      kernel.fun = function(x,xi,h){u <- (x-xi)/h; 15/(16*h)*(1-u^2)^2*(abs(u)<=1)})

[Package nsROC version 1.1 Index]