pooledROC.dpm {ROCnReg} | R Documentation |
Nonparametric Bayesian inference of the pooled ROC curve
Description
This function estimates the pooled ROC curve using a Dirichlet process mixture of normals model as proposed by Erkanli et al. (2006).
Usage
pooledROC.dpm(marker, group, tag.h, data,
standardise = TRUE, p = seq(0, 1, l = 101), ci.level = 0.95,
compute.lpml = FALSE, compute.WAIC = FALSE, compute.DIC = FALSE,
pauc = pauccontrol(), density = densitycontrol(),
prior.h = priorcontrol.dpm(), prior.d = priorcontrol.dpm(),
mcmc = mcmccontrol(),
parallel = c("no", "multicore", "snow"), ncpus = 1, cl = NULL)
Arguments
marker |
A character string with the name of the diagnostic test variable. |
group |
A character string with the name of the variable that distinguishes healthy from diseased individuals. |
tag.h |
The value codifying healthy individuals in the variable |
data |
Data frame representing the data and containing all needed variables. |
standardise |
A logical value. If TRUE the test outcomes are standardised (so that the resulting test outcomes have mean zero and standard deviation of one). The default is TRUE. |
p |
Set of false positive fractions (FPF) at which to estimate the pooled ROC curve. This set is also used to compute the area under the ROC curve (AUC) using Simpson's rule. Thus, the length of the set should be an odd number, and it should be rich enough for an accurate estimation. |
ci.level |
An integer value (between 0 and 1) specifying the level for the credible interval. The default is 0.95. |
compute.lpml |
A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed. |
compute.WAIC |
A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed. |
compute.DIC |
A logical value. If TRUE, the deviance information criterion is computed. |
pauc |
A list of control values to replace the default values returned by the function |
density |
A list of control values to replace the default values returned by the function |
prior.h |
Hyparameter specification for the healthy population. A list of control values to replace the default values returned by the function |
prior.d |
Hyparameter specification for the diseased population. A list of control values to replace the default values returned by the function |
mcmc |
A list of control values to replace the default values returned by the function |
parallel |
A characters string with the type of parallel operation: either "no" (default), "multicore" (not available on Windows) or "snow". |
ncpus |
An integer with the number of processes to be used in parallel operation. Defaults to 1. |
cl |
An object inheriting from class |
Details
Estimates the pooled ROC curve (ROC) defined as
ROC(p) = 1 - F_{D}\{F_{\bar{D}}^{-1}(1-p)\},
where
F_{D}(y) = Pr(Y_{D} \leq y),
F_{\bar{D}}(y) = Pr(Y_{\bar{D}} \leq y).
The method implemented in this function estimates F_{D}(\cdot)
and F_{\bar{D}}(\cdot)
by means of a Dirichlet process mixture of normals model. More precisely, and letting \{y_{\bar{D}i}\}_{i=1}^{n_{\bar{D}}}
and \{y_{Dj}\}_{j=1}^{n_{D}}
be two independent random samples from the nondiseased and diseased populations, respectively, the model postulated for the distribution function is as follows
F_{\bar{D}}(y_{\bar{D}i}) = \sum_{l=1}^{L_{\bar{D}}}\omega_{l\bar{D}}\Phi(y_{\bar{D}i}\mid\mu_{l\bar{D}},\sigma_{l\bar{D}}^2),
F_{D}(y_{Dj}) = \sum_{l=1}^{L_{D}}\omega_{lD}\Phi(y_{Dj}\mid\mu_{lD},\sigma_{lD}^2),
where L_{d}
is pre-specified is a pre-specified upper bound on the number of mixture components (d \in \{D, \bar{D}\}
). The \omega_{ld}
's result from a truncated version of the stick-breaking construction (\omega_{1d} = v_{1d}
; \omega_{ld} = v_{ld}\prod_{r<l}(1-v_{dr})
, l=2,\ldots,L_{d}
; v_{d1},\ldots,v_{L_{d}-1}\sim
Beta (1,\alpha_{d})
; v_{Ld} = 1
, \alpha_d \sim \Gamma(a_{\alpha_d},b_{\alpha_d})
), \beta_{ld}\sim N(m_{0d},S_{0d})
, and \sigma_{ld}^{-2}\sim\Gamma(a_{d},b_{d})
.
The area under the curve is
AUC=\int_{0}^{1}ROC(p)dp.
When the upper bound on the number of mixture components is 1, i.e., L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the AUC (binormal model), which is used in the package. In contrast, when L_D > 1
or L_{\bar{D}} > 1
, the AUC is computed using results presented in Erkanli et al. (2006). With regard to the partial area under the curve, when focus = "FPF"
and assuming an upper bound u_1
for the FPF, what it is computed is
pAUC_{FPF}(u_1)=\int_0^{u_1} ROC(p)dp.
As for the AUC, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{FPF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{FPF}(u_1)/u_1
so that it ranges from u_1/2
(useless test) to 1 (perfect marker). Conversely, when focus = "TPF"
, and assuming a lower bound for the TPF of u_2
, the partial area corresponding to TPFs lying in the interval (u_2,1)
is computed as
pAUC_{TPF}(u_2)=\int_{u_2}^{1}ROC_{TNF}(p)dp,
where ROC_{TNF}(p)
is a 270^\circ
rotation of the ROC curve, and it can be expressed as ROC_{TNF}(p) = F_{\bar{D}}\{F_{D}^{-1}(1-p)\}.
Again, when L_d = 1
(d \in \{D, \bar{D}\}
), there is a closed-form expression for the pAUC_{TNF}
(Hillis and Metz, 2012), and when L_D > 1
or L_{\bar{D}} > 1
the integral is approximated numerically using Simpson's rule. The returned value is the normalised pAUC, pAUC_{TPF}(u_2)/(1-u_2)
, so that it ranges from (1-u_2)/2
(useless test) to 1 (perfect test).
It is worth referring that with respect to the computation of the DIC, when L=1
, it is computed as in Spiegelhalter et al. (2002), and when L>1
, DIC3 as described in Celeux et al. (2006) is computed. Also, for the computation of the conditional predictive ordinates (CPO) we follow the stable version proposed by Gelman et al. (2014).
Value
As a result, the function provides a list with the following components:
call |
The matched call. |
marker |
A list with the diagnostic test outcomes in the healthy (h) and diseased (d) groups. |
missing.ind |
A logical value indicating whether missing values occur. |
p |
Set of false positive fractions (FPF) at which the pooled ROC curve has been estimated. |
ci.level |
The value of the argument |
prior |
A list returning the hyperparameter values in the healthy and diseased populations. |
ROC |
Estimated pooled ROC curve, and corresponding |
AUC |
Estimated pooled AUC, and corresponding |
pAUC |
If computed, estimated partial area under the pooled ROC curve (posterior mean) and |
dens |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: |
lpml |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO). |
WAIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: widely applicable information criterion (WAIC) and associated complexity penalty (pW). |
DIC |
If computed, named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with two components: deviance information criterion (DIC) and associated complexity penalty (pD). |
fit |
Named list of length two, with components 'h' (healthy) and 'd' (diseased). Each component is a list with the following information: (1) |
References
Erkanli, A., Sung M., Jane Costello, E., and Angold, A. (2006). Bayesian semi-parametric ROC analysis. Statistics in Medicine, 25, 3905–3928.
Geisser, S. and Eddy, W.F. (1979) A Predictive Approach to Model Selection, Journal of the American Statistical Association, 74, 153–160.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2014). Bayesian Data Analysis, 3rd ed. CRC Press: Boca Raton, FL.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1010.
Hillis, S. L. and Metz, C.E. (2012). An Analytic Expression for the Binormal Partial Area under the ROC Curve. Academic Radiology, 19, 1491–1498.
Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.
See Also
AROC.bnp
, AROC.sp
, AROC.kernel
, pooledROC.BB
, pooledROC.emp
, pooledROC.kernel
, pooledROC.dpm
, cROC.bnp
, cROC.sp
or AROC.kernel
.
Examples
library(ROCnReg)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]
# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)
m0_dpm <- pooledROC.dpm(marker = "l_marker1", group = "status",
tag.h = 0, data = newpsa, standardise = TRUE,
p = seq(0,1,l=101), compute.WAIC = TRUE, compute.lpml = TRUE,
compute.DIC = TRUE,
prior.h = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
prior.d = priorcontrol.dpm(m0 = 0, S0 = 10, a = 2, b = 0.5, alpha = 1,
L =10),
mcmc = mcmccontrol(nsave = 400, nburn = 100, nskip = 1))
summary(m0_dpm)
plot(m0_dpm)