R: Semiparametric Bayesian inference of the covariate-adjusted...

AROC.bsp {AROC}

R Documentation

Semiparametric Bayesian inference of the covariate-adjusted ROC curve (AROC).

Description

Estimates the covariate-adjusted ROC curve (AROC) using the semiparametric Bayesian normal linear regression model discussed in Inacio de Carvalho and Rodriguez-Alvarez (2018).

Usage

AROC.bsp(formula.healthy, group, tag.healthy, data, scale = TRUE, 
  p = seq(0, 1, l = 101), paauc = paauccontrol(),
  compute.lpml = FALSE, compute.WAIC = FALSE, 
  m0, S0, nu, Psi, a = 2, b = 0.5, nsim = 5000, nburn = 1500)

Arguments

`formula.healthy`	A `formula` object specifying the Bayesian normal linear regression model for the estimation of the conditional distribution function for the diagnostic test outcome in the healthy population (see Details).
`group`	A character string with the name of the variable that distinguishes healthy from diseased individuals.
`tag.healthy`	The value codifying the healthy individuals in the variable `group`.
`data`	Data frame representing the data and containing all needed variables.
`scale`	A logical value. If TRUE, the test outcomes are scaled, i.e., divided by their standard deviation. The default is TRUE.
`p`	Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.
`compute.lpml`	A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.
`paauc`	A list of control values to replace the default values returned by the function `paauccontrol`. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.
`compute.WAIC`	A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.
`m0`	A numeric vector. Hyperparameter; mean vector of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a vector of zeros of length `p+1` (see Details).
`S0`	A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal distribution for the mean of the regression coefficients. If missing, it is set to a diagonal matrix of dimension `(p+1)`x`(p+1)` with 100 in the diagonal (see Details).
`nu`	A numeric value. Hyperparameter; degrees of freedom of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to `p + 3` (see Details).
`Psi`	A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the regression coefficients. If missing, it is set to an identity matrix of dimension `(p+1)`x`(p+1)` (see Details).
`a`	A numeric value. Hyperparameter; shape parameter of the gamma distribution for the precision (inverse variance). The default is 2 (scaled data) (see Details).
`b`	A numeric value. Hyperparameter; rate parameter of the gamma distribution for the precision (inverse variance). The default is 0.5 (scaled data) (see Details).
`nsim`	A numeric value. Total number of Gibbs sampler iterates (including the burn-in). The default is 5000.
`nburn`	A numeric value. Number of burn-in iterations. The default is 1500.
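
The documented defaults for `m0`, `S0`, `nu` and `Psi` can be written out explicitly, which may help when only one of them needs changing. The sketch below is illustrative only: the data frame `toy` and the helper name `k` are hypothetical, and `k` stands for the `p+1` columns of the healthy-group design matrix.

```r
# Hypothetical toy data standing in for the `data` argument.
set.seed(1)
toy <- data.frame(
  status = rep(c(0, 1), each = 50),
  age    = runif(100, 40, 80)
)
toy$marker <- 1 + 0.02 * toy$age + 0.8 * toy$status + rnorm(100)

# Design matrix for the healthy group; its number of columns is p + 1.
X0 <- model.matrix(~ age, data = toy[toy$status == 0, ])
k  <- ncol(X0)          # k = p + 1

# Documented defaults, written out explicitly.
m0  <- rep(0, k)        # mean vector: zeros of length p + 1
S0  <- 100 * diag(k)    # diagonal matrix with 100 on the diagonal
nu  <- k + 2            # p + 3 degrees of freedom
Psi <- diag(k)          # identity scale matrix
```

These objects can then be supplied through the `m0`, `S0`, `nu` and `Psi` arguments of `AROC.bsp`.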

Details

Estimates the covariate-adjusted ROC curve (AROC) defined as

AROC\left(t\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq t\},

where F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}}) denotes the distribution function of Y_{\bar{D}} conditional on the vector of covariates \mathbf{X}_{\bar{D}}. In particular, the method implemented in this function combines a Bayesian normal linear regression model to estimate F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}}) and the Bayesian bootstrap (Rubin, 1981) to estimate the outside probability. More precisely, and letting \{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}} be a random sample from the nondiseased population,

F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \Phi(y_{\bar{D}i}\mid \mathbf{x}_{\bar{D}i}^{*T}\mathbf{\beta}^{*},\sigma^2),

where \mathbf{x}_{\bar{D}i}^{*T} = (1, \mathbf{x}_{\bar{D}i}^{T}), \mathbf{\beta}^{*}\sim N_{p+1} (\mathbf{m},\mathbf{S}) and \sigma^{-2}\sim\Gamma(a,b). It is assumed that \mathbf{m} \sim N_{p+1}(\mathbf{m}_0,\mathbf{S}_0) and \mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1}), where p+1 denotes the number of columns of the design matrix \mathbf{X}_{\bar{D}}^{*}. Here W(\nu,(\nu\Psi)^{-1}) denotes a Wishart distribution with \nu degrees of freedom and expectation \Psi^{-1}. For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
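
The two ingredients described above (a normal model for the nondiseased outcomes and the Bayesian bootstrap for the outer probability) can be sketched in base R for a single set of parameter values. This is a simplified illustration, not the package's implementation: the simulated data are hypothetical, and least-squares estimates stand in for one posterior draw of the regression coefficients and standard deviation, whereas `AROC.bsp` uses Gibbs sampler draws.

```r
set.seed(123)

# Hypothetical data: nondiseased (h) and diseased (d) samples, one covariate.
n_h <- 200; n_d <- 150
x_h <- runif(n_h); x_d <- runif(n_d)
y_h <- 1 + 2 * x_h + rnorm(n_h)
y_d <- 2.5 + 2 * x_d + rnorm(n_d)

# Stand-in for ONE posterior draw of (beta*, sigma): least-squares estimates
# from the nondiseased group (assumption; not the Gibbs sampler).
fit_h <- lm(y_h ~ x_h)
beta  <- coef(fit_h)
sigma <- summary(fit_h)$sigma

# Placement values 1 - F(y_D | x_D) of the diseased outcomes under the
# nondiseased model.
mu_d <- beta[1] + beta[2] * x_d
u    <- 1 - pnorm(y_d, mean = mu_d, sd = sigma)

# Bayesian bootstrap: Dirichlet(1, ..., 1) weights via normalised exponentials.
w <- rexp(n_d)
w <- w / sum(w)

# AROC(t) = Pr(U <= t), estimated as a weighted empirical distribution function.
p    <- seq(0, 1, length.out = 101)
aroc <- vapply(p, function(t) sum(w * (u <= t)), numeric(1))

# Area under this draw of the AROC curve (trapezoidal rule).
aauc <- sum(diff(p) * (head(aroc, -1) + tail(aroc, -1)) / 2)
```

Repeating the last three steps for every posterior draw, each with fresh weights, would give an ensemble of curves from which a pointwise posterior mean and credible band can be taken.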

Value

As a result, the function provides a list with the following components:

`call`	The matched call.
`p`	Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.
`ROC`	Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.
`AUC`	Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.
`pAUC`	If required in the call to the function, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.
`lpml`	If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).
`WAIC`	If required, widely applicable information criterion (WAIC).
`fit`	Results of the fitting process. It is a list with the following components: (1) `mm`: information needed to construct the model matrix associated with the Bayesian normal linear regression model. (2) `beta`: matrix of dimension `N`x`p+1` with the sampled regression coefficients. Here, `N` is the number of Gibbs sampler iterates after burn-in, and `p+1` the number of columns of the design matrix (see also Details). (3) `sd`: vector of length `N` with the sampled variances (see also Details).
`data_model`	List with the data used in the fit: observed diagnostic test outcome and design matrices, separately for the healthy and diseased groups.
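
For readers who want to see what the `lpml` and `WAIC` components summarise, the sketch below computes CPO, LPML and WAIC from posterior draws under a normal linear model, using the standard definitions (CPO as the harmonic mean of the likelihood over draws; WAIC with the variance-based penalty). It is not taken from the package source, whose internals may differ. The objects `beta_draws` and `sigma_draws` are hypothetical stand-ins for posterior samples, generated here by perturbing least-squares estimates, and `sigma_draws` is assumed to hold standard deviations.

```r
set.seed(42)

# Hypothetical healthy-group data and design matrix (intercept + one covariate).
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
X <- cbind(1, x)

# Stand-ins for posterior draws: N x (p+1) coefficients, N standard deviations.
# (Assumption: crude normal perturbations around least-squares estimates.)
N   <- 1000
ols <- lm(y ~ x)
beta_draws  <- sweep(matrix(rnorm(N * 2, sd = 0.05), N, 2), 2, coef(ols), "+")
sigma_draws <- abs(summary(ols)$sigma + rnorm(N, sd = 0.03))

# N x n matrix of log-likelihood terms log p(y_i | beta^(s), sigma^(s)).
mu     <- beta_draws %*% t(X)   # row s holds the fitted means under draw s
loglik <- dnorm(matrix(y, N, n, byrow = TRUE), mean = mu,
                sd = sigma_draws, log = TRUE)

# CPO_i is the harmonic mean of the likelihood over draws; LPML sums log CPO_i.
cpo  <- 1 / colMeans(exp(-loglik))
lpml <- sum(log(cpo))

# WAIC = -2 * (lppd - p_waic), with p_waic the summed posterior variance
# of the log-likelihood terms.
lppd   <- sum(log(colMeans(exp(loglik))))
p_waic <- sum(apply(loglik, 2, var))
waic   <- -2 * (lppd - p_waic)
```

Larger LPML and smaller WAIC indicate better predictive fit when comparing models fitted to the same data.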

References

Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.

Examples

library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

m1 <- AROC.bsp(formula.healthy = l_marker1 ~ age,
  group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
  p = seq(0, 1, l = 101), compute.lpml = TRUE, compute.WAIC = TRUE,
  a = 2, b = 0.5, nsim = 5000, nburn = 1500)

summary(m1)

plot(m1)

[Package AROC version 1.0-4 Index]