AROC.bnp {AROC}    R Documentation

Nonparametric Bayesian inference of the covariate-adjusted ROC curve (AROC).

Description

Estimates the covariate-adjusted ROC curve (AROC) using the nonparametric Bayesian approach proposed by Inacio de Carvalho and Rodriguez-Alvarez (2018).

Usage

AROC.bnp(formula.healthy, group, tag.healthy, data, scale = TRUE, 
  p = seq(0, 1, l = 101), paauc = paauccontrol(), 
  compute.lpml = FALSE, compute.WAIC = FALSE, 
  m0, S0, nu, Psi, alpha = 1, a = 2, b = 0.5, L = 10, nsim = 10000, nburn = 2000)

Arguments

formula.healthy

A formula object specifying the B-splines dependent Dirichlet process mixture model used to estimate the conditional distribution function of the diagnostic test outcome in the healthy population (see Note).

group

A character string with the name of the variable that distinguishes healthy from diseased individuals.

tag.healthy

The value that codes the healthy individuals in the variable group.

data

A data frame containing all the variables needed for the analysis.

scale

A logical value. If TRUE, the test outcomes are scaled, i.e., divided by their standard deviation. The default is TRUE.

p

Set of false positive fractions (FPF) at which to estimate the covariate-adjusted ROC curve.

paauc

A list of control values to replace the default values returned by the function paauccontrol. This argument is used to indicate whether the partial area under the covariate-adjusted ROC curve (pAAUC) should be computed and at which FPF.

compute.lpml

A logical value. If TRUE, the log pseudo marginal likelihood (LPML, Geisser and Eddy, 1979) and the conditional predictive ordinates (CPO) are computed.

compute.WAIC

A logical value. If TRUE, the widely applicable information criterion (WAIC, Gelman et al., 2014; Watanabe, 2010) is computed.

m0

A numeric vector. Hyperparameter; mean vector of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a vector of zeros of length Q (see Details).

S0

A numeric matrix. Hyperparameter; covariance matrix of the (multivariate) normal prior distribution for the mean of the normal component of the centering distribution. If missing, it is set to a diagonal matrix of dimension QxQ with 100 in the diagonal (see Details).

nu

A numeric value. Hyperparameter; degrees of freedom of the Wishart prior distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to Q + 2 (see Details).

Psi

A numeric matrix. Hyperparameter; scale matrix of the Wishart prior distribution for the precision matrix of the normal component of the centering distribution. If missing, it is set to an identity matrix of dimension QxQ (see Details).

alpha

A numeric value. Precision parameter of the Dirichlet process. The default is 1 (see Details).

a

A numeric value. Hyperparameter; shape parameter of the gamma prior distribution for the precision (inverse variance) of each mixture component. The default, 2, is intended for scaled data (see Details).

b

A numeric value. Hyperparameter; rate parameter of the gamma prior distribution for the precision (inverse variance) of each mixture component. The default, 0.5, is intended for scaled data (see Details).

L

A numeric value. Maximum number of mixture components (truncation level) for the B-splines dependent Dirichlet process mixture model. The default is 10 (see Details).

nsim

A numeric value. Total number of Gibbs sampler iterations (including the burn-in). The default is 10000.

nburn

A numeric value. Number of burn-in iterations. The default is 2000.
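As an illustration of the defaults documented above for m0, S0, nu and Psi, the following base-R sketch builds them for a given Q. The helper name default_hyperparameters and the value Q = 5 are hypothetical; in AROC.bnp, Q is determined by the design matrix implied by formula.healthy, and the package constructs these defaults internally.

```r
# Hypothetical helper mirroring the documented defaults used when
# m0, S0, nu and Psi are missing. Q is assumed known here; in AROC.bnp it is
# the number of columns of the design matrix implied by formula.healthy.
default_hyperparameters <- function(Q) {
  list(
    m0  = rep(0, Q),      # vector of zeros of length Q
    S0  = 100 * diag(Q),  # QxQ diagonal matrix with 100 on the diagonal
    nu  = Q + 2,          # Wishart degrees of freedom
    Psi = diag(Q)         # QxQ identity matrix
  )
}

hp <- default_hyperparameters(Q = 5)
str(hp)
```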

Details

Estimates the covariate-adjusted ROC curve (AROC) defined as

\mathrm{AROC}(t) = \Pr\{1 - F_{\bar{D}}(Y_D \mid \mathbf{X}_{D}) \leq t\},

where F_{\bar{D}}(\cdot \mid \mathbf{X}_{\bar{D}}) denotes the conditional distribution function of Y_{\bar{D}} given the vector of covariates \mathbf{X}_{\bar{D}}. The method implemented in this function combines a B-splines dependent Dirichlet process mixture model, used to estimate F_{\bar{D}}(\cdot \mid \mathbf{X}_{\bar{D}}), with the Bayesian bootstrap (Rubin, 1981), used to estimate the outer probability. More precisely, letting \{(\mathbf{x}_{\bar{D}i}, y_{\bar{D}i})\}_{i=1}^{n_{\bar{D}}} be a random sample from the nondiseased population, the conditional distribution function is modeled as

F_{\bar{D}}(y_{\bar{D}i} \mid \mathbf{X}_{\bar{D}} = \mathbf{x}_{\bar{D}i}) = \sum_{l=1}^{L} \omega_l \Phi(y_{\bar{D}i} \mid \mu_l(\mathbf{x}_{\bar{D}i}), \sigma_l^2),

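where \Phi(\cdot \mid \mu, \sigma^2) denotes the normal distribution function with mean \mu and variance \sigma^2, \mu_l(\mathbf{x}_{\bar{D}i}) = \mathbf{z}_{\bar{D}i}^{T}\boldsymbol{\beta}_l, \mathbf{z}_{\bar{D}i} is the row of the design matrix (built from formula.healthy, including any B-spline basis functions) associated with \mathbf{x}_{\bar{D}i}, and L is the pre-specified maximum number of mixture components. The weights \omega_l result from a truncated version of the stick-breaking construction: \omega_1 = v_1, \omega_l = v_l\prod_{r<l}(1 - v_r) for l = 2, \ldots, L, with v_1, \ldots, v_{L-1} \sim \mathrm{Beta}(1, \alpha) and v_L = 1. Further, \boldsymbol{\beta}_l \sim N_{Q}(\mathbf{m}, \mathbf{S}) and \sigma_l^{-2} \sim \Gamma(a, b). It is assumed that \mathbf{m} \sim N_{Q}(\mathbf{m}_0, \mathbf{S}_0) and \mathbf{S}^{-1} \sim W(\nu, (\nu\Psi)^{-1}), where W(\nu, (\nu\Psi)^{-1}) denotes a Wishart distribution with \nu degrees of freedom and expectation \Psi^{-1}, and Q denotes the dimension of the vector \mathbf{z}_{\bar{D}i}. The hyperparameters \mathbf{m}_0, \mathbf{S}_0, \nu, \Psi, \alpha, a and b correspond to the arguments m0, S0, nu, Psi, alpha, a and b. For a detailed description, we refer to Inacio de Carvalho and Rodriguez-Alvarez (2018).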
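The following base-R sketch traces the three ingredients described above for a single, fixed set of parameter values: truncated stick-breaking weights, the conditional mixture distribution function of the healthy population, and a Bayesian bootstrap estimate of the AROC from the diseased placement values 1 - F_{\bar{D}}(y_{Dj} \mid \mathbf{x}_{Dj}). It is not the package's implementation: the helper names (stick_breaking, mixture_cdf, bb_aroc), the linear design with Q = 2, the parameter values standing in for one posterior draw and the simulated diseased sample are all hypothetical, and no Gibbs sampler is run. In the full procedure one would expect the placement values and bootstrap weights to be recomputed for each posterior draw of the mixture parameters (see Inacio de Carvalho and Rodriguez-Alvarez, 2018).

```r
set.seed(1)

# Truncated stick-breaking weights:
# v_1, ..., v_{L-1} ~ Beta(1, alpha), v_L = 1, omega_l = v_l * prod_{r < l} (1 - v_r)
stick_breaking <- function(L, alpha) {
  v <- c(rbeta(L - 1, 1, alpha), 1)
  v * c(1, cumprod(1 - v[-L]))
}

# Mixture CDF F(y | x) = sum_l omega_l * Phi((y - z' beta_l) / sigma_l)
# y: length-n vector, z: n x Q design matrix, beta: L x Q matrix,
# sigma: length-L vector of component standard deviations
mixture_cdf <- function(y, z, omega, beta, sigma) {
  mu <- z %*% t(beta)                         # n x L component means
  pn <- pnorm(sweep(y - mu, 2, sigma, "/"))   # n x L component CDFs
  drop(pn %*% omega)                          # weighted sum over components
}

# One Bayesian bootstrap draw of AROC(t) = Pr{U_D <= t}, where
# U_D = 1 - F(Y_D | X_D) are the diseased placement values
bb_aroc <- function(u, p) {
  g <- rexp(length(u))
  w <- g / sum(g)                             # Dirichlet(1, ..., 1) weights
  vapply(p, function(fpf) sum(w[u <= fpf]), numeric(1))
}

# Hypothetical values standing in for ONE posterior draw of the healthy model
# (intercept and linear covariate effect, so Q = 2); no MCMC is run here.
L <- 3
omega <- stick_breaking(L, alpha = 1)
beta <- rbind(c(0, 1), c(0.5, 1), c(-0.5, 1))
sigma <- c(1, 1, 1)

# Simulated (hypothetical) diseased sample
n_d <- 100
x_d <- runif(n_d)
y_d <- 2 + x_d + rnorm(n_d)
z_d <- cbind(1, x_d)

# Placement values and repeated Bayesian bootstrap draws (F held fixed here)
u_d <- 1 - mixture_cdf(y_d, z_d, omega, beta, sigma)
p <- seq(0, 1, length.out = 101)
aroc_draws <- replicate(200, bb_aroc(u_d, p))  # length(p) x 200 matrix
aroc_mean <- rowMeans(aroc_draws)
```

Normalising independent exponential variables is a standard way to draw Dirichlet(1, ..., 1) weights; each column of aroc_draws is a step function in t that reaches 1 at t = 1.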

Value

The function returns a list with the following components:

call

The matched call.

p

Set of false positive fractions (FPF) at which the covariate-adjusted ROC curve has been estimated.

ROC

Estimated covariate-adjusted ROC curve (AROC) (posterior mean), and 95% pointwise posterior credible band.

AUC

Estimated area under the covariate-adjusted ROC curve (AAUC) (posterior mean), and 95% posterior credible interval.

pAUC

If required, estimated partial area under the covariate-adjusted ROC curve (pAAUC) (posterior mean), and 95% posterior credible interval.

lpml

If required, list with two components: the log pseudo marginal likelihood (LPML) and the conditional predictive ordinates (CPO).

WAIC

If required, widely applicable information criterion (WAIC).

fit

Results of the fitting process. A list with the following components: (1) mm: information needed to construct the model matrix associated with the B-splines dependent Dirichlet process mixture model; (2) beta: array of dimension NxLxQ with the sampled regression coefficients; (3) sd: matrix of dimension NxL with the sampled variances; (4) probs: matrix of dimension NxL with the sampled mixture weights. Here, N is the number of Gibbs sampler iterations after burn-in, L is the maximum number of mixture components, and Q is the dimension of the vector \mathbf{z}_{\bar{D}i} (see Details).

data_model

List with the data used in the fit: observed diagnostic test outcome and B-spline design matrices, separately for the healthy and diseased groups.
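The ROC, AUC and pAUC components summarise posterior draws. A minimal base-R sketch of such summaries, assuming a matrix of posterior AROC draws evaluated on the FPF grid p: the posterior mean curve, a 95% pointwise band, and areas obtained with the trapezoidal rule. The draws are hypothetical binormal curves, the helper trapz and the FPF threshold 0.2 for the partial area are illustrative choices, and the package's own numerical integration and pAAUC definition (for example, any normalisation) may differ.

```r
set.seed(2)
p <- seq(0, 1, length.out = 101)

# Hypothetical posterior draws of an ROC curve (binormal shape with a random
# separation parameter); in practice these would come from the MCMC output.
n_draws <- 500
delta <- rnorm(n_draws, mean = 1, sd = 0.1)
roc_draws <- t(sapply(delta, function(d) 1 - pnorm(qnorm(1 - p) - d)))  # n_draws x length(p)

# Trapezoidal rule on the FPF grid
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Posterior mean curve and 95% pointwise credible band
roc_mean <- colMeans(roc_draws)
roc_band <- apply(roc_draws, 2, quantile, probs = c(0.025, 0.975))  # 2 x length(p)

# Area under each draw, then posterior mean and 95% credible interval
auc_draws <- apply(roc_draws, 1, trapz, x = p)
auc_summary <- c(mean = mean(auc_draws), quantile(auc_draws, c(0.025, 0.975)))

# Unnormalised partial area over FPF in [0, 0.2] (threshold chosen arbitrarily)
keep <- p <= 0.2 + 1e-12
pauc_draws <- apply(roc_draws[, keep], 1, trapz, x = p[keep])
auc_summary
```

Computing the area for every draw before summarising, rather than integrating the posterior mean curve, is what yields a credible interval for the AAUC.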

Note

The input argument formula.healthy is similar to that used for the glm function, except that flexible specifications can be added by means of the function f(). For instance, the specification y \sim x1 + f(x2, K = 3) assumes a linear effect of x1, while the effect of x2 is modeled using B-spline basis functions. The argument K = 3 indicates that 3 internal knots are used, placed at quantiles of x2. Categorical variables (factors) can also be incorporated, as can factor-by-curve interaction terms. For example, to include the interaction between age and gender, one would specify y \sim gender + f(age, by = gender, K = 3).
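To see roughly what a term such as f(x2, K = 3) corresponds to, the sketch below builds a cubic B-spline basis with K = 3 internal knots at quantiles of a simulated covariate, using bs() from the splines package that ships with R. The spline degree, the absence of an intercept column, the equally spaced quantile levels and the simulated data are assumptions; the design matrix that AROC builds internally may differ.

```r
library(splines)  # recommended package distributed with R

# Hypothetical covariate
set.seed(3)
x2 <- runif(200, 20, 80)

K <- 3
# K internal knots at equally spaced quantile levels (here the quartiles)
probs <- seq(0, 1, length.out = K + 2)[-c(1, K + 2)]
knots <- quantile(x2, probs = probs)

# Cubic B-spline basis (degree = 3 by default) without an intercept column:
# K + 3 columns
basis <- bs(x2, knots = knots)
dim(basis)
```

Under the same assumptions, K = 0 would place no internal knots, leaving a cubic polynomial basis in the covariate.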

References

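Geisser, S., and Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74(365), 153-160.

Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24(6), 997-1016.

Inacio de Carvalho, V., and Rodriguez-Alvarez, M. X. (2018). Bayesian nonparametric inference for the covariate-adjusted ROC curve. arXiv preprint arXiv:1806.00473.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130-134.

Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571-3594.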

See Also

AROC.bsp, AROC.sp, AROC.kernel, pooledROC.BB and pooledROC.emp.

Examples

library(AROC)
data(psa)
# Select the last measurement
newpsa <- psa[!duplicated(psa$id, fromLast = TRUE),]

# Log-transform the biomarker
newpsa$l_marker1 <- log(newpsa$marker1)

m0 <- AROC.bnp(formula.healthy = l_marker1 ~ f(age, K = 0),
               group = "status", tag.healthy = 0, data = newpsa, scale = TRUE,
               p = seq(0, 1, length.out = 101),
               compute.lpml = TRUE, compute.WAIC = TRUE,
               a = 2, b = 0.5, L = 10, nsim = 5000, nburn = 1000)

summary(m0)

plot(m0)



[Package AROC version 1.0-3 Index]