cv_auc {nlpred}    R Documentation
Estimates of CVAUC
Description
This function computes K-fold cross-validated estimates of the area under the receiver operating characteristic (ROC) curve (hereafter, AUC). This quantity can be interpreted as the probability that a randomly selected case has a higher predicted risk than a randomly selected control.
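For intuition, this pairwise interpretation can be checked directly in base R. The sketch below uses hypothetical predicted risks (not package output); ties between a case and a control conventionally count as one half:

risk <- c(0.9, 0.4, 0.7, 0.2)  # hypothetical predicted risks
y    <- c(1, 0, 1, 0)          # 1 = case, 0 = control
# proportion of case/control pairs in which the case has the higher risk
mean(outer(risk[y == 1], risk[y == 0], ">") +
     0.5 * outer(risk[y == 1], risk[y == 0], "=="))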
Usage
cv_auc(
  Y,
  X,
  K = 10,
  learner = "glm_wrapper",
  nested_cv = TRUE,
  nested_K = K - 1,
  parallel = FALSE,
  max_cvtmle_iter = 10,
  cvtmle_ictol = 1/length(Y),
  prediction_list = NULL,
  ...
)
Arguments
Y
A numeric vector of outcomes, assumed to equal 0 or 1.
X
A data.frame of variables for use in prediction.
K
The number of cross-validation folds (default is 10).
learner
A wrapper that implements the desired method for building a prediction algorithm. See, e.g., glm_wrapper or randomforest_wrapper.
nested_cv
A boolean indicating whether nested cross-validation should be used to estimate the distribution of the prediction function. The default (TRUE) is appropriate for aggressive learners, while nested_cv = FALSE is reasonable for smooth learners such as logistic regression (see the sketch following these arguments).
nested_K
If nested cross-validation is used, how many inner folds should there be? The default is K - 1.
parallel
A boolean indicating whether prediction algorithms should be trained in parallel. Defaults to FALSE.
max_cvtmle_iter
Maximum number of iterations for the bias correction step of the CV-TMLE estimator (default 10).
cvtmle_ictol
Convergence tolerance for the CV-TMLE: the targeting step iterates until the empirical mean of the cross-validated influence function falls below this value or max_cvtmle_iter is reached. Default is 1/length(Y).
prediction_list
For power users: a list of predictions made by learner in a format compatible with cv_auc (see the prediction_list entry of the return value), allowing estimates to be recomputed without retraining.
...
Other arguments, not currently used.
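As a sketch of the nested_cv trade-off noted above (simulated data; assumes the glm_wrapper shipped with nlpred), the nested layer can reasonably be skipped for a smooth learner:

set.seed(42)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1))
# logistic regression is a smooth learner, so nested_cv = FALSE is reasonable
fit_glm <- cv_auc(Y = Y, X = X, K = 10, learner = "glm_wrapper",
                  nested_cv = FALSE)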
Details
To estimate the AUC of a particular prediction algorithm, K-fold cross-validation is commonly used: data are partitioned into K distinct groups and the prediction algorithm is developed using K-1 of these groups. In standard K-fold cross-validation, the AUC of this prediction algorithm is estimated using the remaining fold. This can be problematic when the number of observations is small or the number of cross-validation folds is large.
Here, we estimate relevant nuisance parameters in the training sample and use
the validation sample to perform some form of bias correction, either through
cross-validated targeted minimum loss-based estimation, estimating equations,
or one-step estimation. When aggressive learning algorithms are applied, it is
necessary to use an additional layer of cross-validation in the training sample
to estimate the nuisance parameters. This is controlled via the nested_cv
option described above.
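For concreteness, here is a minimal base-R sketch of the standard K-fold estimator described above. This is an illustration only, not nlpred's internal implementation; the auc helper and the simulated data are hypothetical:

set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1))
K <- 5
fold <- sample(rep(seq_len(K), length.out = n))
# pairwise AUC; ties between a case and a control count as one half
auc <- function(risk, y) {
  mean(outer(risk[y == 1], risk[y == 0], ">") +
       0.5 * outer(risk[y == 1], risk[y == 0], "=="))
}
# fit on K-1 folds, evaluate AUC on the held-out fold, then average
cv_aucs <- sapply(seq_len(K), function(k) {
  train <- fold != k
  mod <- glm(Y[train] ~ ., data = X[train, , drop = FALSE],
             family = binomial())
  pred <- predict(mod, newdata = X[!train, , drop = FALSE],
                  type = "response")
  auc(pred, Y[!train])
})
mean(cv_aucs)  # the standard K-fold CV AUC estimate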
Value
An object of class "cvauc" with the following entries.
est_cvtmle
cross-validated targeted minimum loss-based estimator of K-fold CV AUC
iter_cvtmle
iterations needed to achieve convergence of the CV-TMLE algorithm
cvtmle_trace
the value of the CV-TMLE estimate at each iteration of the targeting algorithm
se_cvtmle
estimated standard error based on targeted nuisance parameters
est_init
plug-in estimate of CV AUC where nuisance parameters are estimated in the training sample
est_empirical
the standard K-fold CV AUC estimator
se_empirical
estimated standard error for the standard estimator
est_onestep
cross-validated one-step estimate of K-fold CV AUC
se_onestep
estimated standard error for the one-step estimator
est_esteq
cross-validated estimating equations estimate of K-fold CV AUC
se_esteq
estimated standard error for the estimating equations estimator (same as for one-step)
folds
list of observation indices in each validation fold
ic_cvtmle
influence function evaluated at the targeted nuisance parameter estimates
ic_onestep
influence function evaluated at the training-fold-estimated nuisance parameters
ic_esteq
influence function evaluated at the training-fold-estimated nuisance parameters
ic_empirical
influence function evaluated at the validation-fold-estimated nuisance parameters
prediction_list
a list of output from the cross-validated model training; see the individual wrapper function documentation for further details
Examples
# simulate data
set.seed(123)
n <- 200
p <- 10
X <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
Y <- rbinom(n, 1, plogis(X[, 1] + X[, 10]))
# get cv auc estimates for logistic regression
cv_auc_ests <- cv_auc(Y = Y, X = X, K = 5, learner = "glm_wrapper")
# get cv auc estimates for random forest
# using nested cross-validation for nuisance parameter estimation
# (randomforest_wrapper is assumed to require the randomForest package)
fit <- cv_auc(Y = Y, X = X, K = 5,
              learner = "randomforest_wrapper",
              nested_cv = TRUE)
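# assuming the returned "cvauc" object is a list with the entries
# documented in the Value section, the estimators can be compared directly
fit$est_cvtmle     # CV-TMLE estimate
fit$se_cvtmle      # its standard error
fit$est_onestep    # one-step estimate
fit$est_esteq      # estimating-equations estimate
fit$est_empirical  # standard K-fold CV AUC estimate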