R: Comparison of k paired ROC curves

compareROCdep {nsROC}

R Documentation

Comparison of k paired ROC curves

Description

This function compares k ROC curves from dependent data. Several statistics can be used for the comparison: those included in Martinez-Camblor et al. (2013), based on general distances between functions; the Venkatraman et al. (1996) methodology for comparing the diagnostic accuracy of the k markers based on data from a paired design; and the DeLong et al. (1988) approach, based on comparing the AUC (area under the curve). Two methods can be used to approximate the distribution function of the statistic: the procedure proposed by Venkatraman et al. (1996), based on permuted samples, or the one introduced by Martinez-Camblor et al. (2012), based on bootstrap samples. See References below.

Usage

compareROCdep(X, D, ...)
## Default S3 method:
compareROCdep(X, D, method=c("general.bootstrap","permutation","auc"),
              statistic=c("KS","L1","L2","CR","VK","other"),
              FUN.dist=function(g){max(abs(g))}, side=c("right","left"),
              Ni=1000, B=500, perm=500, seed=123, h.fun=function(H,x){
              H*sd(x)*length(x)^{-1/3}}, H=1, plot.roc=TRUE, type='s', lwd=3,
              lwd.curves=rep(2,ncol(X)), lty=1, lty.curves=rep(1,ncol(X)),
              col='black',col.curves=rainbow(ncol(X)), cex.lab=1.2,
              legend=c(sapply(1:ncol(X), function(i){eval(bquote(expression(
              hat(R)[.(i)](t))))}), expression(hat(R)(t))),
              legend.position='bottomright', legend.inset=0.03,
              cex.legend=1, ...)

Arguments

`X`	a matrix of k columns in which each column is the vector of (bio)marker values corresponding to each sample.
`D`	the vector of response values.
`method`	the method used to approximate the statistic distribution. One of "general.bootstrap" (Martinez-Camblor et al. (2012)), "permutation" (Venkatraman et al. (1996)) or "auc" (DeLong et al. (1988)).
`statistic`	the statistic used to compare the curves. One of "KS" (Kolmogorov-Smirnov criterion), "L1" (`L_1`-measure), "L2" (`L_2`-measure), "CR" (Cramer-von Mises), "other" (another statistic defined by the `FUN.dist` input parameter), "VK" (Venkatraman) or "AUC" (area under the curve).
`FUN.dist`	the distance considered as a function of one variable. If `statistic="other"` the statistic considered is `\sum_{i=1}^k FUN.dist(\sqrt{n_1} \cdot (\hat{R}_i(t) - \hat{R}(t)))` where `n_1` is the number of cases, `\hat{R}_i(t)` is the ROC curve estimate from the i-th sample and `\hat{R}(t) := k^{-1} \sum_{i=1}^k \hat{R}_i(t)`.
`side`	type of ROC curve. One of "right" or "left". If `statistic="VK"` only right-sided curves can be considered.
`Ni`	number of subintervals of the unit interval (FPR values) considered to calculate the curve. Default: 1000.
`B`	number of bootstrap samples if `method="general.bootstrap"`. Default: 500.
`perm`	number of permutations if `method="permutation"`. Default: 500.
`seed`	seed considered to generate the permutations (for reproducibility). Default: 123.
`h.fun`	a function defining the bandwidth calculation used to generate the bootstrap samples if `method="general.bootstrap"`. It has two arguments: the first one, `H`, is the bandwidth constant and the second one, `x`, is the sample. Default: `function(H,x){H*sd(x)*length(x)^{-1/3}}`.
`H`	the value used to compute `h.fun`, that is, the bandwidth. Default: 1.
`plot.roc`	if TRUE, a plot including ROC curve estimates for the k samples and the mean of all of them is displayed.
`type`	what type of plot should be drawn.
`lwd`	the line width to be used for mean ROC curve estimate.
`lwd.curves`	a vector with the line widths to be used for ROC curve estimates of each sample.
`lty`	the line type to be used for mean ROC curve estimate.
`lty.curves`	a vector with the line types to be used for ROC curve estimates of each sample.
`col`	the color to be used for mean ROC curve estimate.
`col.curves`	a vector with the colors to be used for ROC curve estimates of each sample.
`cex.lab`	the magnification to be used for x and y labels relative to the current setting of `cex`.
`legend`	a character or expression vector to appear in the legend.
`legend.position`, `legend.inset`, `cex.legend`	the position of the legend, the inset distance from the margins as a fraction of the plot region when legend is placed and the character expansion factor relative to current `par("cex")`, respectively.
`...`	other graphical parameters to be passed.

Details

First, the input data are checked and subjects with any missing information (marker or response values) are removed. Data from a paired design should have the same length across samples. If this does not hold, the code will not run and an error will be shown.
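The kind of check described above can be sketched as follows. This is a minimal illustration on hypothetical toy data, assuming that a subject is dropped whenever any of its marker values or its response is missing; it is not the package's internal code.

```r
# Hypothetical toy data: k = 3 markers measured on 6 subjects, with some NAs
X <- matrix(c(1.2, NA,  0.4, 2.1, 1.7, 0.9,
              0.8, 1.1, 0.3, NA,  1.5, 1.0,
              1.0, 0.9, 0.5, 2.0, 1.6, 0.7), ncol = 3)
D <- c(0, 0, NA, 1, 1, 1)

# In a paired design every marker column must match the response in length
stopifnot(nrow(X) == length(D))

# Keep only subjects with complete marker and response information
keep <- complete.cases(X, D)
X.clean <- X[keep, , drop = FALSE]
D.clean <- D[keep]
```

Here subjects 2 and 4 are dropped because of a missing marker value and subject 3 because of a missing response.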

If the Venkatraman statistic is chosen to compare left-sided ROC curves, an error is displayed and the function stops, since the Venkatraman methodology is only implemented for right-sided ROC curves. Furthermore, for this statistic, method="permutation" is assigned automatically.

The statistic is defined by \sum_{i=1}^k FUN.dist(\sqrt{n_1} \cdot (\hat{R}_i(t) - \hat{R}(t))) where FUN.dist stands for the distance function, n_1 is the number of cases, \hat{R}_i(t) is the ROC curve estimate from the i-th sample and \hat{R}(t) := k^{-1} \sum_{i=1}^k \hat{R}_i(t).

The statistics implemented are defined by the following FUN.dist functions:

statistic="KS":

FUN.dist(g) = max(abs(g))

statistic="L1":

FUN.dist(g) = mean(abs(g))

statistic="L2":

FUN.dist(g) = mean(g^2)

statistic="CR":

FUN.dist.CR(g,h) = sum(g[-length(g)]^2*(h[-1]-h[-length(h)]))

The Cramer-von Mises statistic is defined by \sum_{i=1}^k FUN.dist.CR(\sqrt{n_1} \cdot (\hat{R}_i(t) - \hat{R}(t)), \hat{R}(t))
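As a rough illustration of these definitions, the sketch below evaluates the KS, L1, L2 and CR statistics on simulated data. The helper roc.hat, the simulated samples and the grid are assumptions made for this example only: roc.hat is a simple empirical right-sided ROC estimate, and the estimator used internally by the package may differ.

```r
# Hypothetical helper: empirical right-sided ROC estimate at the FPR values t
roc.hat <- function(controls, cases, t) {
  # Threshold exceeded by (roughly) a fraction t of the controls
  thr <- quantile(controls, probs = 1 - t, type = 1, names = FALSE)
  sapply(thr, function(cutoff) mean(cases > cutoff))
}

set.seed(1)
n0 <- 40; n1 <- 50; k <- 3
controls <- matrix(rnorm(n0 * k), n0, k)            # simulated controls
cases    <- matrix(rnorm(n1 * k, mean = 1), n1, k)  # simulated cases
t.grid   <- (0:1000) / 1000                         # 1000 subintervals of [0, 1]

# Column i holds the ROC estimate of the i-th marker; R.bar is their mean
R.i   <- sapply(1:k, function(i) roc.hat(controls[, i], cases[, i], t.grid))
R.bar <- rowMeans(R.i)

# Column i holds sqrt(n1) * (R_i(t) - R.bar(t)) evaluated on the grid
G <- sqrt(n1) * (R.i - R.bar)

stat.KS <- sum(apply(G, 2, function(g) max(abs(g))))
stat.L1 <- sum(apply(G, 2, function(g) mean(abs(g))))
stat.L2 <- sum(apply(G, 2, function(g) mean(g^2)))

FUN.dist.CR <- function(g, h) sum(g[-length(g)]^2 * (h[-1] - h[-length(h)]))
stat.CR <- sum(apply(G, 2, FUN.dist.CR, h = R.bar))
```

A user-defined distance passed as FUN.dist with statistic="other" would take the place of the anonymous functions in the apply calls.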

If statistic="VK", the Venkatraman methodology (see References below) is used to compute the statistic. If k>2, the statistic value is the sum of the statistic values over all pairs (i, j) such that i < j.

If method="general.bootstrap", a bandwidth is needed to draw the bootstrap samples from the smoothed multivariate empirical distribution functions of controls and cases (a Gaussian kernel is used). This bandwidth is defined by the h.fun function, whose parameters are a bandwidth constant set by the user, H, and the sample considered (the marker values of cases or controls), x.
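The following sketch shows how the default h.fun turns H and a sample into a bandwidth, and how one smoothed bootstrap sample could then be drawn. The resampling step is an assumption for illustration (rows are resampled jointly to keep the paired structure, then independent Gaussian noise with standard deviation equal to each column's bandwidth is added); the package's actual algorithm may differ in its details.

```r
# Default bandwidth function from the Usage section
h.fun <- function(H, x) { H * sd(x) * length(x)^{-1/3} }

set.seed(1)
x <- matrix(rnorm(60 * 3), 60, 3)   # hypothetical controls for k = 3 markers

# One bandwidth per marker, with bandwidth constant H = 1
h <- apply(x, 2, h.fun, H = 1)

# Smoothed bootstrap sketch: resample subjects (rows) with replacement,
# then perturb each column with Gaussian noise scaled by its bandwidth
idx    <- sample(nrow(x), replace = TRUE)
noise  <- sweep(matrix(rnorm(length(x)), nrow(x)), 2, h, FUN = "*")
x.boot <- x[idx, ] + noise
```

Larger values of H give smoother (more perturbed) bootstrap samples; the n^{-1/3} rate makes the bandwidth shrink as the sample grows.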

If method="auc", the methodology proposed by DeLong et al. (1988) is implemented. This option is slower because the underlying Mann-Whitney statistic requires (number of cases) \cdot (number of controls) comparisons. In this case, statistic returns the value of the Mann-Whitney statistic estimate and test.statistic the final test statistic estimate (formula (5) in the paper), which follows a chi-square distribution.
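To see where the cost comes from, the sketch below computes the Mann-Whitney estimate of the AUC for a single marker by comparing every case with every control. The simulated data are hypothetical, and this only illustrates the pairwise comparison; it does not reproduce the DeLong covariance estimate or the test statistic of formula (5).

```r
set.seed(1)
controls <- rnorm(45)            # hypothetical control values of one marker
cases    <- rnorm(60, mean = 1)  # hypothetical case values of the same marker

# One comparison per (case, control) pair: 1 if case > control, 0.5 for ties
cmp <- outer(cases, controls, function(y, x) (y > x) + 0.5 * (y == x))

# Mann-Whitney estimate of the AUC: mean over all 60 * 45 comparisons
auc <- mean(cmp)
```

With k markers the same 60 * 45 comparison matrix is needed for each marker, which explains why this method scales less favourably than the grid-based statistics.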

Value

`n.controls`	the number of controls.
`n.cases`	the number of cases.
`controls.k`	a matrix whose columns are the controls along the k samples.
`cases.k`	a matrix whose columns are the cases along the k samples.
`statistic`	the value of the test statistic.
`stat.boot`	a vector of statistic values for bootstrap replicates if `method="general.bootstrap"`.
`stat.perm`	a vector of statistic values for permutations if `method="permutation"`.
`test.statistic`	statistic estimate given in formula (5) of DeLong et al. (1988) (See References below) if `method="auc"`.
`p.value`	the p-value for the test.
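As a hedged illustration of how a resampling p-value relates to the returned vectors, one common convention is to take the proportion of resampled statistic values that are at least as large as the observed one. The values below are made up for the example, and the exact rule used by the package is not stated on this page.

```r
set.seed(1)
statistic <- 2.1                  # hypothetical observed statistic
stat.boot <- rchisq(500, df = 2)  # hypothetical resampled values (B = 500)

# Proportion of resampled values at least as extreme as the observed one
p.value <- mean(stat.boot >= statistic)
```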

References

Venkatraman E.S., Begg C.B., 1996, A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment, Biometrika, 83(4), 835-848.

Martinez-Camblor P., Corral N., 2012, A general bootstrap algorithm for hypothesis testing, Journal of Statistical Planning and Inference, 142, 589-600.

Martinez-Camblor P., Carleos C., Corral N., 2013, General nonparametric ROC curve comparison, Journal of the Korean Statistical Society, 42(1), 71-81.

DeLong E.R., DeLong D.M., Clarke-Pearson D.L., 1988, Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics, 44, 837-845.

Examples

n0 <- 45; n1 <- 60
set.seed(123)
D <- c(rep(0,n0), rep(1,n1))

library(mvtnorm)
rho.12 <- 1/4; rho.13 <- 1/4; rho.23 <- 0.5
sd.controls <- c(1,1,1)
sd.cases <- c(1,1,1)
var.controls <- sd.controls%*%t(sd.controls)
var.cases <- sd.cases%*%t(sd.cases)
sigma.controls <- var.controls*matrix(c(1,rho.12,rho.13,rho.12,1,rho.23,rho.13,rho.23,1),3,3)
sigma.cases <- var.cases*matrix(c(1,rho.12,rho.13,rho.12,1,rho.23,rho.13,rho.23,1),3,3)
controls <- rmvnorm(n0, mean=rep(0,3), sigma=sigma.controls)
cases <- rmvnorm(n1, mean=rep(1.19,3), sigma=sigma.cases)
marker.samples <- rbind(controls,cases)

# Default method: KS statistic from Martinez-Camblor et al. (2013), general bootstrap
output <- compareROCdep(marker.samples, D)

# L1 statistic from Martinez-Camblor et al. (2013), general bootstrap
output1 <- compareROCdep(marker.samples, D, statistic="L1")

# CR statistic from Martinez-Camblor et al. (2013), permutation method
output2 <- compareROCdep(marker.samples, D, method="permutation", statistic="CR")

# Venkatraman statistic
output3 <- compareROCdep(marker.samples, D, statistic="VK")

# DeLong AUC comparison methodology
output4 <- compareROCdep(marker.samples, D, method="auc")

[Package nsROC version 1.1 Index]