AROC.bnp {AROC}

R Documentation

Nonparametric Bayesian inference of the covariate-adjusted ROC curve (AROC).

Description

Estimates the covariate-adjusted ROC curve (AROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho and Rodriguez-Alvarez (2018).

Usage

AROC.bnp(formula.healthy, group, tag.healthy, data, scale = TRUE, 
  p = seq(0, 1, l = 101), paauc = paauccontrol(), 
  compute.lpml = FALSE, compute.WAIC = FALSE, 
  m0, S0, nu, Psi, alpha = 1, a = 2, b = 0.5, L = 10, nsim = 10000, nburn = 2000)

Arguments

`formula.healthy`	A `formula` object specifying the B-splines dependent Dirichlet process mixture model for the estimation of the conditional distribution function for the diagnostic test outcome in the healthy population (see Note).
`group`	A character string with the name of the variable that distinguishes healthy from diseased individuals.
`tag.healthy`	The value codifying the healthy individuals in the variable `group`.
`data`	Data frame representing the data and containing all needed variables.
`scale`	A logical value. If TRUE, the test outcomes are scaled, i.e., divided by their standard deviation. The default is TRUE.
`p`	Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.
`paauc`	A list of control values to replace the default values returned by the function `paauccontrol`. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.
`compute.lpml`	A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.
`compute.WAIC`	A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.
`m0`	A numeric vector. Hyperparameter; mean vector of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a vector of zeros of length `Q` (see Details).
`S0`	A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a diagonal matrix of dimension `Q`x`Q` with 100 in the diagonal (see Details).
`nu`	A numeric value. Hyperparameter; degrees of freedom of the Wishart prior distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to `Q + 2` (see Details).
`Psi`	A numeric matrix. Hyperparameter; scale matrix of the Wishart distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to an identity matrix of dimension `Q`x`Q` (see Details).
`alpha`	A numeric value. Precision parameter of the Dirichlet Process. The default is 1 (see Details).
`a`	A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precision (inverse variance). The default is 2 (scaled data) (see Details).
`b`	A numeric value. Hyperparameter; rate parameter of the gamma prior distribution for the precision (inverse variance). The default is 0.5 (scaled data) (see Details).
`L`	A numeric value. Maximum number of mixture components for the B-splines dependent Dirichlet process mixture model. The default is 10 (see Details).
`nsim`	A numeric value. Total number of Gibbs sampler iterates (including the burn-in). The default is 10000.
`nburn`	A numeric value. Number of burn-in iterations. The default is 2000.
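
As an illustration of the defaults described above, the following minimal sketch builds the hyperparameter values that the argument descriptions say are used when `m0`, `S0`, `nu` and `Psi` are missing. The value of `Q` and the list name `hyper` are hypothetical; in practice `Q` is determined by the model specified in `formula.healthy`.

```r
# Hypothetical dimension of the B-spline design vector (Q in the Details section)
Q <- 5

# Values described above for hyperparameters that are left missing
hyper <- list(
  m0  = rep(0, Q),     # prior mean vector: zeros of length Q
  S0  = diag(100, Q),  # prior covariance matrix: 100 on the diagonal
  nu  = Q + 2,         # Wishart degrees of freedom
  Psi = diag(1, Q)     # Wishart scale matrix: identity
)

# Documented defaults of the gamma prior on the precision (scaled data)
a <- 2
b <- 0.5
prior_mean_precision <- a / b  # mean of a gamma with shape a and rate b
```

Objects of this shape could be supplied through the `m0`, `S0`, `nu` and `Psi` arguments when different prior settings are wanted.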

Details

Estimates the covariate-adjusted ROC curve (AROC) defined as

AROC\left(t\right) = Pr\{1 - F_{\bar{D}}(Y_D | \mathbf{X}_{D}) \leq t\},

where F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}}) denotes the distribution function of Y_{\bar{D}} conditional on the vector of covariates \mathbf{X}_{\bar{D}}. In particular, the method implemented in this function combines a B-splines dependent Dirichlet process mixture model to estimate F_{\bar{D}}(\cdot|\mathbf{X}_{\bar{D}}) and the Bayesian bootstrap (Rubin, 1981) to estimate the outer probability. More precisely, letting \{(\mathbf{x}_{\bar{D}i},y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}} be a random sample from the nondiseased population,

F_{\bar{D}}(y_{\bar{D}i}|\mathbf{X}_{\bar{D}}=\mathbf{x}_{\bar{D}i}) = \sum_{l=1}^{L}\omega_l\Phi(y_{\bar{D}i}\mid\mu_{l}(\mathbf{x}_{\bar{D}i}),\sigma_l^2),

where \mu_{l}(\mathbf{x}_{\bar{D}i}) = \mathbf{z}_{\bar{D}i}^{T}\mathbf{\beta}_l and L is pre-specified (maximum number of mixture components). The weights \omega_l result from a truncated version of the stick-breaking construction (\omega_1=v_1; \omega_l=v_l\prod_{r<l}(1-v_r), l=2,\ldots,L; v_1,\ldots,v_{L-1}\sim Beta(1,\alpha); v_L=1), \mathbf{\beta}_l\sim N_{Q}(\mathbf{m},\mathbf{S}), and \sigma_l^{-2}\sim\Gamma(a,b). It is assumed that \mathbf{m} \sim N_{Q}(\mathbf{m}_0,\mathbf{S}_0) and \mathbf{S}^{-1}\sim W(\nu,(\nu\Psi)^{-1}). Here W(\nu,(\nu\Psi)^{-1}) denotes a Wishart distribution with \nu degrees of freedom and expectation \Psi^{-1}, and Q denotes the dimension of vector \mathbf{z}_{\bar{D}i}. For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).
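
The building blocks above can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions, not the package's implementation: the component means and standard deviations are simulated rather than sampled from a posterior, all diseased outcomes are evaluated at a single covariate value, and the names `mix_cdf`, `y_d`, `u` and `q` are hypothetical. It shows the truncated stick-breaking weights, the mixture distribution function, and one Bayesian bootstrap draw of the AROC.

```r
set.seed(1)

L     <- 10  # maximum number of mixture components
alpha <- 1   # precision parameter of the Dirichlet process

# Truncated stick-breaking: v_1, ..., v_{L-1} ~ Beta(1, alpha), v_L = 1
v     <- c(rbeta(L - 1, 1, alpha), 1)
omega <- v * c(1, cumprod(1 - v[-L]))  # omega_l = v_l * prod_{r<l} (1 - v_r)

# Hypothetical component means and standard deviations at one covariate value
mu    <- rnorm(L)                                    # stands in for z' beta_l
sigma <- 1 / sqrt(rgamma(L, shape = 2, rate = 0.5))  # sigma_l^{-2} ~ Gamma(a, b)

# Mixture distribution function: sum_l omega_l * Phi(y | mu_l, sigma_l^2)
mix_cdf <- function(y) sum(omega * pnorm(y, mean = mu, sd = sigma))

# Placement values 1 - F(y_Di | x_Di) for hypothetical diseased outcomes
y_d <- rnorm(50, mean = 1.5)
u   <- 1 - vapply(y_d, mix_cdf, numeric(1))

# Bayesian bootstrap weights: Dirichlet(1, ..., 1) via normalised exponentials
q <- rexp(length(u))
q <- q / sum(q)

# One draw of AROC(t) = sum_i q_i * I(u_i <= t) on a grid of FPFs
p    <- seq(0, 1, length.out = 101)
aroc <- vapply(p, function(t) sum(q * (u <= t)), numeric(1))
```

Because `v_L = 1`, the weights `omega` sum to one, so `mix_cdf` is a proper distribution function. In the full method this computation would be repeated for every retained Gibbs iterate, giving a posterior sample of AROC curves.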

Value

As a result, the function provides a list with the following components:

`call`	The matched call.
`p`	Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.
`ROC`	Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.
`AUC`	Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.
`pAUC`	If required, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.
`lpml`	If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).
`WAIC`	If required, widely applicable information criterion (WAIC).
`fit`	Results of the fitting process. It is a list with the following components: (1) `mm`: information needed to construct the model matrix associated with the B-splines dependent Dirichlet process mixture model. (2) `beta`: array of dimension `N`x`L`x`Q` with the sampled regression coefficients. (3) `sd`: matrix of dimension `N`x`L` with the sampled variances. (4) `probs`: matrix of dimension `N`x`L` with the sampled components' weights. Here, `N` is the number of Gibbs sampler iterates after burn-in, `L` is the maximum number of mixture components, and `Q` is the dimension of the vector \mathbf{z}_{\bar{D}i} (see also Details).
`data_model`	List with the data used in the fit: observed diagnostic test outcome and B-spline design matrices, separately for the healthy and diseased groups.
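
The posterior summaries listed above (posterior mean plus 95% limits) can be illustrated with a small sketch. The matrix `draws` is simulated purely for illustration and stands in for a posterior sample of AROC curves; the column names `est`, `ql` and `qh`, the helper `trapz`, and the use of the trapezoidal rule for the area are assumptions, not a description of the package internals.

```r
set.seed(1)

p <- seq(0, 1, length.out = 101)  # FPF grid
N <- 200                          # hypothetical number of retained iterations

# Hypothetical posterior draws of the AROC (rows: iterations, columns: FPFs),
# simulated here as t^theta curves purely for illustration
theta <- runif(N, 0.2, 0.5)
draws <- outer(theta, p, function(th, t) t^th)

# Pointwise posterior mean and 95% credible band
roc_summary <- cbind(
  est = colMeans(draws),
  ql  = apply(draws, 2, quantile, probs = 0.025),
  qh  = apply(draws, 2, quantile, probs = 0.975)
)

# Area under each sampled curve by the trapezoidal rule
trapz <- function(y, x) sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
auc_draws <- apply(draws, 1, trapz, x = p)

# Posterior mean and 95% credible interval for the area
auc_summary <- c(est = mean(auc_draws),
                 quantile(auc_draws, probs = c(0.025, 0.975)))
```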

Note

The input argument `formula.healthy` is similar to that used for the `glm` function, except that flexible specifications can be added by means of the function `f()`. For instance, the specification `y ~ x1 + f(x2, K = 3)` assumes a linear effect of `x1` and models the effect of `x2` using B-spline basis functions. The argument `K = 3` indicates that 3 internal knots will be used, located at quantiles of `x2`. Categorical variables (factors) can also be incorporated, as well as factor-by-curve interaction terms. For example, to include the interaction between age and gender, specify `y ~ gender + f(age, by = gender, K = 3)`.
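
To make the role of `K` concrete, the sketch below writes the two specifications as formula objects and builds a cubic B-spline basis with `K = 3` internal knots using the `splines` package that ships with R. Placing the knots at equally spaced quantiles, the cubic degree, and the simulated covariate `x2` are assumptions made for illustration; the Note only states that quantiles of the covariate are used.

```r
library(splines)  # part of the standard R distribution

# The two specifications from the Note, written as formula objects
# (creating a formula does not evaluate f(), so AROC need not be loaded)
fml_smooth <- y ~ x1 + f(x2, K = 3)
fml_by     <- y ~ gender + f(age, by = gender, K = 3)
all.vars(fml_smooth)  # variables that must be present in `data`

set.seed(1)
x2 <- runif(200, 20, 80)  # hypothetical continuous covariate
K  <- 3                   # number of internal knots

# Internal knots at equally spaced quantiles of x2 (assumed placement rule)
knots <- quantile(x2, probs = seq(0, 1, length.out = K + 2))[-c(1, K + 2)]

# Cubic B-spline basis with those knots: K + 3 columns without an intercept
B <- bs(x2, knots = knots, degree = 3)
dim(B)
```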

References

Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.

Examples

library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

m0 <- AROC.bnp(formula.healthy = l_marker1 ~ f(age, K = 0),
               group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
               p = seq(0, 1, length.out = 101),
               compute.lpml = TRUE, compute.WAIC = TRUE,
               a = 2, b = 0.5, L = 10, nsim = 5000, nburn = 1000)

summary(m0)

plot(m0)

[Package AROC version 1.0-4 Index]