cROC.sp {ROCnReg}

R Documentation

Parametric and semiparametric frequentist inference of the covariate-specific ROC curve (cROC).

Description

This function estimates the covariate-specific ROC curve (cROC) using the parametric approach proposed by Faraggi (2003) and the semiparametric approach proposed by Pepe (1998).

Usage

cROC.sp(formula.h, formula.d, group, tag.h, data, 
  newdata, est.cdf = c("normal", "empirical"),
  pauc = pauccontrol(), p = seq(0, 1, l = 101), B = 1000, ci.level = 0.95,
  parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)

Arguments

`formula.h`	A `formula` object specifying the location regression model to be fitted in the healthy population (see Details).
`formula.d`	A `formula` object specifying the location regression model to be fitted in the diseased population (see Details).
`group`	A character string with the name of the variable that distinguishes healthy from diseased individuals.
`tag.h`	The value codifying the healthy individuals in the variable `group`.
`data`	Data frame representing the data and containing all needed variables.
`newdata`	Optional data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) will be computed. If not supplied, the function `cROCData` is used to build a default dataset.
`est.cdf`	A character string. It indicates how the conditional distribution functions of the diagnostic test in healthy and diseased populations are estimated. Options are "normal" and "empirical" (see Details). The default is "normal".
`pauc`	A list of control values to replace the default values returned by the function `pauccontrol`. It indicates whether the partial area under the covariate-specific ROC curve should be computed and, if so, whether the focus is on restricted false positive fractions (FPFs) or on restricted true positive fractions (TPFs), together with the upper bound for the FPF (if the focus is FPF) or the lower bound for the TPF (if the focus is TPF).
`p`	Set of false positive fractions (FPF) at which to estimate the covariate-specific ROC curve. This set is also used to compute the area under the covariate-specific ROC curve using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation.
`B`	An integer value specifying the number of bootstrap resamples for the construction of the confidence intervals. By default 1000.
`ci.level`	A numeric value (between 0 and 1) specifying the confidence level. The default is 0.95.
`parallel`	A character string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow".
`ncpus`	An integer with the number of processes to be used in parallel operation. Defaults to 1.
`cl`	An object inheriting from class `cluster` (from the `parallel` package), specifying an optional parallel or snow cluster if `parallel = "snow"`. If not supplied, a cluster on the local machine is created for the duration of the call.

Details

Estimates the covariate-specific ROC curve (cROC) defined as

ROC(p|\mathbf{x}) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p|\mathbf{x})|\mathbf{x}\},

where

F_{D}(y|\mathbf{x}) = Pr(Y_{D} \leq y | \mathbf{X}_{D} = \mathbf{x}),

F_{\bar{D}}(y|\mathbf{x}) = Pr(Y_{\bar{D}} \leq y | \mathbf{X}_{\bar{D}} = \mathbf{x}).

Note that, for the sake of clarity, we assume that the covariates of interest are the same in both healthy and diseased populations. In particular, the method implemented in this function estimates F_{D}(\cdot|\mathbf{x}) and F_{\bar{D}}(\cdot|\mathbf{x}) assuming a (semiparametric) location regression model for Y in each population separately, i.e.,

Y_{D} = \mathbf{X}_{D}^{T}\mathbf{\beta}_{D} + \sigma_{D}\varepsilon_{D},

Y_{\bar{D}} = \mathbf{X}_{\bar{D}}^{T}\mathbf{\beta}_{\bar{D}} + \sigma_{\bar{D}}\varepsilon_{\bar{D}},

such that the covariate-specific ROC curve can be expressed as

ROC(p|\mathbf{x}) = 1 - G_{D}\{a(\mathbf{x}) + b G_{\bar{D}}^{-1}(1-p)\},

where a(\mathbf{x}) = \mathbf{x}^{T}\frac{\mathbf{\beta}_{\bar{D}} - \mathbf{\beta}_{D}}{\sigma_{D}}, b = \frac{\sigma_{\bar{D}}}{\sigma_{D}}, and G_{D} and G_{\bar{D}} are the distribution functions of \varepsilon_{D} and \varepsilon_{\bar{D}}, respectively. Depending on the assumptions made about the distributions of \varepsilon_{D} and \varepsilon_{\bar{D}}, the estimators are referred to as: (a) "normal", where Gaussian errors are assumed, i.e., G_{D}(y) = G_{\bar{D}}(y) = \Phi(y) (Faraggi, 2003); and (b) "empirical", where no assumptions are made about the distributions, in which case G_{D} and G_{\bar{D}} are estimated empirically on the basis of the standardised residuals (Pepe, 1998).
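
As a rough illustration of the two estimators, the following base R sketch fits the location models with `lm` on simulated data and evaluates the expression for ROC(p|x) at a single covariate value. The simulated data, the helper `croc_sketch`, the use of `summary(fit)$sigma` as the scale estimate, and the `type = 1` empirical quantile are all assumptions made for illustration; they are not taken from the internals of `cROC.sp` and may differ from what the package does. The suffixes `_h` and `_d` denote the healthy (\bar{D}) and diseased (D) populations.

```r
# Simulated data (hypothetical; not the psa data shipped with the package)
set.seed(1)
n <- 200
healthy <- data.frame(x = runif(n))
healthy$y <- 1 + 0.5 * healthy$x + rnorm(n, sd = 1)
diseased <- data.frame(x = runif(n))
diseased$y <- 2 + 1.5 * diseased$x + rnorm(n, sd = 1.2)

# Location regression model fitted separately in each population
fit_h <- lm(y ~ x, data = healthy)
fit_d <- lm(y ~ x, data = diseased)
sigma_h <- summary(fit_h)$sigma  # assumed scale estimate
sigma_d <- summary(fit_d)$sigma

# Standardised residuals, used only by the "empirical" variant
res_h <- residuals(fit_h) / sigma_h
res_d <- residuals(fit_d) / sigma_d

# cROC at covariate value x0 over a grid p of false positive fractions
croc_sketch <- function(x0, p, est.cdf = c("normal", "empirical")) {
  est.cdf <- match.arg(est.cdf)
  nd <- data.frame(x = x0)
  mu_h <- predict(fit_h, newdata = nd)
  mu_d <- predict(fit_d, newdata = nd)
  a <- unname(mu_h - mu_d) / sigma_d  # a(x) = x'(beta_h - beta_d) / sigma_d
  b <- sigma_h / sigma_d              # b = sigma_h / sigma_d
  if (est.cdf == "normal") {
    # Gaussian errors: G_d = G_h = Phi
    1 - pnorm(a + b * qnorm(1 - p))
  } else {
    # Empirical quantile of healthy residuals, empirical CDF of diseased residuals
    q_h <- quantile(res_h, probs = 1 - p, type = 1, names = FALSE)
    1 - ecdf(res_d)(a + b * q_h)
  }
}

p <- seq(0, 1, length.out = 101)
roc_normal <- croc_sketch(0.5, p, "normal")
roc_empirical <- croc_sketch(0.5, p, "empirical")
```

Both vectors are non-decreasing in `p`; the "normal" curve runs exactly from 0 to 1, whereas the "empirical" curve is a step function whose end points depend on the observed residuals.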

The covariate-specific area under the curve is

AUC(\mathbf{x})=\int_{0}^{1}ROC(p|\mathbf{x})dp.
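
To make this integral concrete, the sketch below compares two ways of evaluating it for Gaussian errors at a fixed covariate value: the standard binormal closed form, \Phi(-a(\mathbf{x})/\sqrt{1+b^2}), and composite Simpson's rule on a grid with an odd number of points. The values of `a` and `b` and the helper `simpson` are hypothetical, and it is an assumption (not confirmed here) that the package's closed-form expression coincides with the one shown.

```r
# Hypothetical values of a(x) and b at some fixed covariate value x
a <- -1.2  # a(x) = x'(beta_h - beta_d) / sigma_d
b <- 0.8   # b = sigma_h / sigma_d

# ROC curve under Gaussian errors
roc <- function(p) 1 - pnorm(a + b * qnorm(1 - p))

# Composite Simpson's rule for values y on an equally spaced grid with step h;
# the number of grid points must be odd
simpson <- function(y, h) {
  n <- length(y)
  stopifnot(n >= 3, n %% 2 == 1)
  i <- seq_len(n)
  w <- ifelse(i == 1 | i == n, 1, ifelse(i %% 2 == 0, 4, 2))
  h / 3 * sum(w * y)
}

p <- seq(0, 1, length.out = 101)
auc_simpson <- simpson(roc(p), h = p[2] - p[1])

# Standard binormal closed form for the AUC
auc_closed <- pnorm(-a / sqrt(1 + b^2))

c(simpson = auc_simpson, closed_form = auc_closed)
```

The two values should agree closely but not exactly, because the ROC curve is steep near p = 0 and the grid is finite; a finer grid reduces the gap.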

When Gaussian errors are assumed, there is a closed-form expression for the covariate-specific AUC, which is used in the package. In contrast, when no assumptions are made about the distributions of the errors, the integral is computed numerically using Simpson's rule. With regard to the partial area under the curve, when focus = "FPF" and assuming an upper bound u_1 for the FPF, what is computed is

pAUC_{FPF}(u_1|\mathbf{x})=\int_0^{u_1} ROC(p|\mathbf{x})dp.

Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{FPF}(u_1|\mathbf{x})/u_1, so that it ranges from u_1/2 (useless test) to 1 (perfect test). Conversely, when focus = "TPF", and assuming a lower bound for the TPF of u_2, the partial area corresponding to TPFs lying in the interval (u_2,1) is computed as

pAUC_{TPF}(u_2|\mathbf{x})=\int_{u_2}^{1}ROC_{TNF}(p|\mathbf{x})dp,

where ROC_{TNF}(p|\mathbf{x}) is a 270^\circ rotation of the ROC curve, which can be expressed as

ROC_{TNF}(p|\mathbf{x}) = F_{\bar{D}}\{F_{D}^{-1}(1-p|\mathbf{x})|\mathbf{x}\} = G_{\bar{D}}\left\{\mathbf{x}^{T}\frac{\mathbf{\beta}_{D} - \mathbf{\beta}_{\bar{D}}}{\sigma_{\bar{D}}} + \frac{\sigma_{D}}{\sigma_{\bar{D}}}G_{D}^{-1}(1-p)\right\}.

Again, when Gaussian errors are assumed, there is a closed-form expression (Hillis and Metz, 2012). Otherwise, the integral is approximated numerically (Simpson's rule). The returned value is the normalised pAUC, pAUC_{TPF}(u_2|\mathbf{x})/(1-u_2), so that it ranges from (1-u_2)/2 (useless test) to 1 (perfect test).
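
The normalised partial areas can be sketched numerically as follows, assuming Gaussian errors and hypothetical values of a(\mathbf{x}) and b. With G_{D} = G_{\bar{D}} = \Phi, the rotated curve reduces to ROC_{TNF}(p|\mathbf{x}) = \Phi\{-a(\mathbf{x})/b + \Phi^{-1}(1-p)/b\}. The helpers `simpson`, `pauc_fpf` and `pauc_tpf` are illustrative names, not functions of the package, and the integrals are approximated by Simpson's rule rather than by the closed form of Hillis and Metz (2012).

```r
# Composite Simpson's rule for a function f on [lower, upper], n odd
simpson <- function(f, lower, upper, n = 101) {
  stopifnot(n >= 3, n %% 2 == 1)
  x <- seq(lower, upper, length.out = n)
  i <- seq_len(n)
  w <- ifelse(i == 1 | i == n, 1, ifelse(i %% 2 == 0, 4, 2))
  (upper - lower) / (n - 1) / 3 * sum(w * f(x))
}

# Normalised pAUC for FPFs in (0, u1)
pauc_fpf <- function(u1, a, b) {
  roc <- function(p) 1 - pnorm(a + b * qnorm(1 - p))
  simpson(roc, 0, u1) / u1
}

# Normalised pAUC for TPFs in (u2, 1), based on the rotated curve ROC_TNF
pauc_tpf <- function(u2, a, b) {
  roc_tnf <- function(p) pnorm(-a / b + qnorm(1 - p) / b)
  simpson(roc_tnf, u2, 1) / (1 - u2)
}

# Hypothetical informative test
fpf_good <- pauc_fpf(0.5, a = -1.2, b = 0.8)
tpf_good <- pauc_tpf(0.8, a = -1.2, b = 0.8)

# Useless test (a = 0, b = 1): the ROC curve is the diagonal
fpf_useless <- pauc_fpf(0.5, a = 0, b = 1)  # approximately u1 / 2 = 0.25
tpf_useless <- pauc_tpf(0.8, a = 0, b = 1)  # approximately (1 - u2) / 2 = 0.1
```

The useless-test values reproduce the lower limits of the two normalised ranges, which is a convenient sanity check on the normalisation.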

Value

As a result, the function provides a list with the following components:

`call`	The matched call.
`newdata`	A data frame containing the values of the covariates at which the covariate-specific ROC curve (AUC and pAUC, if required) was computed.
`data`	The original supplied data argument.
`missing.ind`	A logical indicator of whether, for each observation (test outcome and covariates), missing values occur.
`marker`	The name of the diagnostic test variable in the data frame.
`group`	The value of the argument `group` used in the call.
`tag.h`	The value of the argument `tag.h` used in the call.
`formula`	Named list of length two with the value of the arguments `formula.h` and `formula.d` used in the call.
`est.cdf`	The value of the argument `est.cdf` used in the call.
`p`	Set of false positive fractions (FPF) at which the covariate-specific ROC curves have been estimated.
`ci.level`	The value of the argument `ci.level` used in the call.
`ROC`	Estimated covariate-specific ROC curve, and `ci.level`*100% pointwise confidence intervals (if computed).
`AUC`	Estimated area under the covariate-specific ROC curve, and `ci.level`*100% confidence interval (if computed).
`pAUC`	If computed, estimated partial area under the covariate-specific ROC curve and, if available, the `ci.level`*100% confidence interval. Note that the returned values are normalised, so that the maximum value is one.
`fit`	Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component contains an object of class `lm` with the fitted regression model.
`coeff`	Estimated regression coefficients (and `ci.level`*100% confidence intervals if `B` is greater than zero) from the fit of the linear model in the healthy and diseased populations, as specified in `formula.h` and `formula.d`, respectively.

References

Faraggi, D. (2003). Adjusting receiver operating characteristic curves and related indices for covariates. The Statistician 52, 179–192.

Hillis, S.L. and Metz, C.E. (2012). An analytic expression for the binormal partial area under the ROC curve. Academic Radiology 19, 1491–1498.

Pepe, M.S. (1998). Three approaches to regression analysis of receiver operating characteristic curves for continuous test results. Biometrics 54, 124–135.

Examples

library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

# Covariate for prediction
agep <- seq(min(newpsa$age), max(newpsa$age), length.out = 50)
df.pred <- data.frame(age = agep)


cROC_sp_normal <- cROC.sp(formula.h = l_marker1 ~ age,
                          formula.d = l_marker1 ~ age,
                          group = "status", 
                          tag.h = 0,
                          data = newpsa,
                          newdata = df.pred,
                          est.cdf = "normal",
                          pauc = list(compute = TRUE, value = 0.5, focus = "FPF"),
                          p = seq(0, 1, length.out = 101), 
                          B = 500)
summary(cROC_sp_normal)

plot(cROC_sp_normal)

[Package ROCnReg version 1.0-9 Index]