cv_auc {nlpred}    R Documentation
Estimates of CVAUC
Description
This function computes K-fold cross-validated estimates of the area under the receiver operating characteristic (ROC) curve (hereafter, AUC). This quantity can be interpreted as the probability that a randomly selected case has a higher predicted risk than a randomly selected control.
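For intuition, this pairwise interpretation can be checked directly in base R. The sketch below uses hypothetical predicted risks (not package output); ties between a case and a control conventionally count as one half:

risk <- c(0.9, 0.4, 0.7, 0.2)  # hypothetical predicted risks
y    <- c(1, 0, 1, 0)          # 1 = case, 0 = control
# proportion of case/control pairs in which the case has the higher risk
mean(outer(risk[y == 1], risk[y == 0], ">") +
     0.5 * outer(risk[y == 1], risk[y == 0], "=="))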
Usage
cv_auc(
  Y,
  X,
  K = 10,
  learner = "glm_wrapper",
  nested_cv = TRUE,
  nested_K = K - 1,
  parallel = FALSE,
  max_cvtmle_iter = 10,
  cvtmle_ictol = 1/length(Y),
  prediction_list = NULL,
  ...
)
Arguments
Y
A numeric vector of outcomes, assumed to equal 0 or 1.
X
A data.frame of variables for use in prediction.
K
The number of cross-validation folds (default is 10).
learner
A wrapper that implements the desired method for building a prediction algorithm. See, e.g., glm_wrapper or randomforest_wrapper.
nested_cv
A boolean indicating whether nested cross-validation should be used to estimate the distribution of the prediction function. The default (TRUE) is appropriate for aggressive learners, while nested_cv = FALSE is reasonable for smooth learners such as logistic regression (see the sketch following these arguments).
nested_K
If nested cross-validation is used, how many inner folds should there be? The default is K - 1.
parallel
A boolean indicating whether prediction algorithms should be trained in parallel. Defaults to FALSE.
max_cvtmle_iter
Maximum number of iterations for the bias correction step of the CV-TMLE estimator (default 10).
cvtmle_ictol
Convergence tolerance for the CV-TMLE: the targeting step iterates until the empirical mean of the cross-validated influence function falls below this value or max_cvtmle_iter is reached. Default is 1/length(Y).
prediction_list
For power users: a list of predictions made by learner in a format compatible with cv_auc (see the prediction_list entry of the return value), allowing estimates to be recomputed without retraining.
...
Other arguments, not currently used.
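As a sketch of the nested_cv trade-off noted above (simulated data; assumes the glm_wrapper shipped with nlpred), the nested layer can reasonably be skipped for a smooth learner:

set.seed(42)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1))
# logistic regression is a smooth learner, so nested_cv = FALSE is reasonable
fit_glm <- cv_auc(Y = Y, X = X, K = 10, learner = "glm_wrapper",
                  nested_cv = FALSE)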
Details
To estimate the AUC of a particular prediction algorithm, K-fold cross-validation is commonly used: data are partitioned into K distinct groups and the prediction algorithm is developed using K-1 of these groups. In standard K-fold cross-validation, the AUC of this prediction algorithm is estimated using the remaining fold. This can be problematic when the number of observations is small or the number of cross-validation folds is large.
Here, we estimate relevant nuisance parameters in the training sample and use
the validation sample to perform some form of bias correction, either through
cross-validated targeted minimum loss-based estimation, estimating equations,
or one-step estimation. When aggressive learning algorithms are applied, it is
necessary to use an additional layer of cross-validation in the training sample
to estimate the nuisance parameters. This is controlled via the nested_cv
option described above.
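For concreteness, here is a minimal base-R sketch of the standard K-fold estimator described above. This is an illustration only, not nlpred's internal implementation; the auc helper and the simulated data are hypothetical:

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1))
K <- 5
fold <- sample(rep(seq_len(K), length.out = n))
# pairwise AUC; ties between a case and a control count as one half
auc <- function(risk, y) {
  mean(outer(risk[y == 1], risk[y == 0], ">") +
       0.5 * outer(risk[y == 1], risk[y == 0], "=="))
}
# fit on K-1 folds, evaluate AUC on the held-out fold, then average
cv_aucs <- sapply(seq_len(K), function(k) {
  train <- fold != k
  mod <- glm(Y[train] ~ ., data = X[train, , drop = FALSE],
             family = binomial())
  pred <- predict(mod, newdata = X[!train, , drop = FALSE],
                  type = "response")
  auc(pred, Y[!train])
})
mean(cv_aucs)  # the standard K-fold CV AUC estimate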
Value
An object of class "cvauc" with the following entries.
est_cvtmle
cross-validated targeted minimum loss-based estimator of K-fold CV AUC
iter_cvtmle
iterations needed to achieve convergence of the CV-TMLE algorithm
cvtmle_trace
the value of the CV-TMLE estimate at each iteration of the targeting algorithm
se_cvtmle
estimated standard error based on targeted nuisance parameters
est_init
plug-in estimate of CV AUC where nuisance parameters are estimated in the training sample
est_empirical
the standard K-fold CV AUC estimator
se_empirical
estimated standard error for the standard estimator
est_onestep
cross-validated one-step estimate of K-fold CV AUC
se_onestep
estimated standard error for the one-step estimator
est_esteq
cross-validated estimating equations estimate of K-fold CV AUC
se_esteq
estimated standard error for the estimating equations estimator (same as for one-step)
folds
list of observation indices in each validation fold
ic_cvtmle
influence function evaluated at the targeted nuisance parameter estimates
ic_onestep
influence function evaluated at the training-fold-estimated nuisance parameters
ic_esteq
influence function evaluated at the training-fold-estimated nuisance parameters
ic_empirical
influence function evaluated at the validation-fold-estimated nuisance parameters
prediction_list
a list of output from the cross-validated model training; see the individual wrapper function documentation for further details
Examples
# simulate data
set.seed(123)
n <- 200
p <- 10
X <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
Y <- rbinom(n, 1, plogis(X[, 1] + X[, 10]))
# get cv auc estimates for logistic regression
cv_auc_ests <- cv_auc(Y = Y, X = X, K = 5, learner = "glm_wrapper")
# get cv auc estimates for random forest
# using nested cross-validation for nuisance parameter estimation
# (randomforest_wrapper is assumed to require the randomForest package)
fit <- cv_auc(Y = Y, X = X, K = 5,
              learner = "randomforest_wrapper",
              nested_cv = TRUE)
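# assuming the returned "cvauc" object is a list with the entries
# documented in the Value section, the estimators can be compared directly
fit$est_cvtmle     # CV-TMLE estimate
fit$se_cvtmle      # its standard error
fit$est_onestep    # one-step estimate
fit$est_esteq      # estimating-equations estimate
fit$est_empirical  # standard K-fold CV AUC estimate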