ci.pooled.cvAUC {cvAUC} | R Documentation |
Confidence Intervals for Cross-validated Area Under the ROC Curve (AUC) Estimates for Pooled Repeated Measures Data
Description
This function calculates influence-curve-based confidence intervals for cross-validated area under the ROC curve (AUC) estimates for a pooled repeated measures data set.
Usage
ci.pooled.cvAUC(predictions, labels, label.ordering = NULL,
folds = NULL, ids, confidence = 0.95)
Arguments
predictions |
A vector, matrix, list, or data frame containing the predictions. |
labels |
A vector, matrix, list, or data frame containing the true class labels. Must have the same dimensions as 'predictions'. |
label.ordering |
The default ordering of the classes can be changed by supplying a vector containing the negative and the positive class label (negative label first, positive label second). |
folds |
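If specified, this must be a vector of fold ids equal in length to 'predictions', or a list of length V (for V-fold cross-validation) of vectors of indexes for the observations contained in each fold, as in the Examples section. |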
ids |
A vector, matrix, list, or data frame containing cluster or entity ids. All observations from the same entity (i.e. patient) that have been pooled must have the same id. Must have the same dimensions as 'predictions'. |
confidence |
A number between 0 and 1 that represents the confidence level. |
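The two shapes that fold information can take (a vector of fold ids, one per observation, or a list of index vectors, one per fold, as built in the Examples section) can be converted into each other. The following is a minimal base R sketch; `fold_id`, `fold_list` and `fold_vec` are hypothetical names used only for illustration and are not part of the package.

```r
# Hypothetical fold ids for 10 pooled observations and V = 3 folds.
fold_id <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1)

# Vector form -> list form: one vector of row indexes per fold.
fold_list <- split(seq_along(fold_id), fold_id)

# List form -> vector form: write the fold number back at each index.
fold_vec <- integer(length(fold_id))
for (v in seq_along(fold_list)) {
  fold_vec[fold_list[[v]]] <- v
}

# Both representations describe the same assignment.
stopifnot(all(fold_vec == fold_id))
```

The list form is the one used in the Examples section, where each element holds the row indexes of one validation fold.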
Details
See the documentation for the prediction function in the ROCR package for details on the predictions, labels and label.ordering arguments.
In pooled repeated measures data, the clusters (not the individual observations) are the independent units. Each observation has a corresponding binary outcome. This data structure arises often in clinical studies where each patient is measured, and an outcome is recorded, at various time points. Then the observations from all patients are pooled together. See the Examples section below for more information.
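Because the clusters are the independent units, cross-validation folds are typically assigned per id rather than per row. The sketch below illustrates that idea on made-up data; the data frame `toy`, its columns, and the simple random assignment are assumptions for illustration only (the Examples section additionally stratifies ids by outcome).

```r
set.seed(42)

# Hypothetical pooled data: 6 patients, each observed at 3 time points,
# with a binary outcome recorded at every time point.
toy <- data.frame(
  id   = rep(1:6, each = 3),
  time = rep(1:3, times = 6),
  Y    = rbinom(18, size = 1, prob = 0.3)
)

V <- 3  # number of cross-validation folds

# Assign a fold to each patient id, not to each row.
unique_ids <- unique(toy$id)
id_fold <- sample(rep(seq_len(V), length.out = length(unique_ids)))
names(id_fold) <- unique_ids

# Propagate the patient-level fold to every observation of that patient.
toy$fold <- unname(id_fold[as.character(toy$id)])

# Check: every id appears in exactly one fold.
folds_per_id <- tapply(toy$fold, toy$id, function(f) length(unique(f)))
stopifnot(all(folds_per_id == 1))
```

Splitting at the row level instead would let observations from one patient land in both the training and validation sets, which breaks the independence between them.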
Value
A list containing the following named elements:
cvAUC |
Cross-validated area under the curve estimate. |
se |
Standard error. |
ci |
A vector of length two containing the lower and upper bounds of the confidence interval. |
confidence |
A number between 0 and 1 representing the confidence level. |
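To see how the returned elements relate to each other, the sketch below rebuilds an interval from a point estimate and a standard error. It assumes a normal-approximation (Wald-type) interval, which this page does not state explicitly; the input values are copied from the output shown in the Examples section, and the variable names are hypothetical.

```r
# Values copied from the Examples output on this page.
cvAUC_hat  <- 0.8648046
se         <- 0.01551888
confidence <- 0.95

# Two-sided normal quantile for the requested confidence level.
z <- qnorm(1 - (1 - confidence) / 2)

# Assumed form of the interval: estimate +/- z * standard error,
# with the lower bound first and the upper bound second.
ci <- c(cvAUC_hat - z * se, cvAUC_hat + z * se)
print(ci)
```

Under that assumption the result agrees with the `$ci` values in the Examples output to about six decimal places.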
Author(s)
Erin LeDell oss@ledell.org
Maya Petersen mayaliv@berkeley.edu
Mark van der Laan laan@berkeley.edu
References
LeDell, Erin; Petersen, Maya; van der Laan, Mark. Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron. J. Statist. 9 (2015), no. 1, 1583–1607. doi:10.1214/15-EJS1035. http://projecteuclid.org/euclid.ejs/1437742107.
M. J. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer, first edition, 2011.
Tobias Sing, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. ROCR: Visualizing classifier performance in R. Bioinformatics, 21(20):3940-3941, 2005.
See Also
prediction, performance, cvAUC, ci.cvAUC
Examples
# This example is similar to the ci.cvAUC example, with the exception that
# this is a pooled repeated measures data set. The example uses simulated
# data that contains multiple time point observations for 500 patients,
# each observation having a binary outcome.
#
# The cross-validation folds are stratified by ids that have at least one
# positive outcome. All observations belonging to one patient are
# contained within the same CV fold.
pooled_example <- function(data, ids, V = 10){
  .cvFolds <- function(Y, V, ids){
    # Stratify by outcome & id
    classes <- tapply(1:length(Y), INDEX = Y, FUN = split, 1)
    ids.Y1 <- unique(ids[classes$`1`])  # ids that contain an observation with Y==1
    ids.noY1 <- setdiff(unique(ids), ids.Y1)  # ids that have no Y==1 observations
    ids.Y1.split <- split(sample(length(ids.Y1)), rep(1:V, length = length(ids.Y1)))
    ids.noY1.split <- split(sample(length(ids.noY1)), rep(1:V, length = length(ids.noY1)))
    folds <- vector("list", V)
    for (v in seq(V)){
      # Row indexes of all observations whose id was assigned to fold v
      idx.Y1 <- which(ids %in% ids.Y1[ids.Y1.split[[v]]])
      idx.noY1 <- which(ids %in% ids.noY1[ids.noY1.split[[v]]])
      folds[[v]] <- c(idx.Y1, idx.noY1)
    }
    return(folds)
  }
  .doFit <- function(v, folds, data){  # Train/test glm for each fold
    fit <- glm(Y~., data = data[-folds[[v]],], family = binomial)
    pred <- predict(fit, newdata = data[folds[[v]],], type = "response")
    return(pred)
  }
  folds <- .cvFolds(Y = data$Y, ids = ids, V = V)  # Create folds
  # CV train/predict; lapply always returns a list, whereas sapply would
  # collapse the result into a matrix if all folds had equal length
  predictions <- unlist(lapply(seq(V), .doFit, folds = folds, data = data))
  predictions[unlist(folds)] <- predictions  # Restore original row order
  out <- ci.pooled.cvAUC(predictions = predictions, labels = data$Y,
                         folds = folds, ids = ids, confidence = 0.95)
  return(out)
}
# Load data
library(cvAUC)
data(adherence)
# Get performance
set.seed(1)
out <- pooled_example(data = subset(adherence, select=-c(id)),
ids = adherence$id, V = 10)
# The output is given as follows:
# > out
# $cvAUC
# [1] 0.8648046
#
# $se
# [1] 0.01551888
#
# $ci
# [1] 0.8343882 0.8952211
#
# $confidence
# [1] 0.95