Score {riskRegression} | R Documentation |
Score risk predictions
Description
Methods to score the predictive performance of risk markers and risk prediction models
Usage
Score(object, ...)
## S3 method for class 'list'
Score(
object,
formula,
data,
metrics = c("auc", "brier"),
summary = NULL,
plots = NULL,
cause,
times,
landmarks,
use.event.times = FALSE,
null.model = TRUE,
se.fit = TRUE,
conservative = FALSE,
multi.split.test = FALSE,
conf.int = 0.95,
contrasts = TRUE,
probs = c(0, 0.25, 0.5, 0.75, 1),
cens.method = "ipcw",
cens.model = "cox",
split.method,
B,
M,
seed,
trainseeds,
parallel = c("no", "multicore", "snow", "as.registered"),
ncpus = 1,
cl = NULL,
progress.bar = 3,
errorhandling = "pass",
keep,
predictRisk.args,
debug = 0L,
censoring.save.memory = FALSE,
breaks = seq(0, 1, 0.01),
roc.method = "vertical",
roc.grid = switch(roc.method, vertical = seq(0, 1, 0.01), horizontal = seq(1, 0,
-0.01)),
cutpoints = NULL,
...
)
Arguments
object |
List of risk predictions (see details and examples). |
... |
Named list containing additional arguments that are passed on to the |
formula |
A formula which identifies the outcome (left hand
side). E.g., |
data |
|
metrics |
Character vector specifying which metrics to
apply. Case does not matter. Choices are |
summary |
Character vector specifying which summary
statistics to apply to the predicted risks. Choices are
Set to |
plots |
Character vector specifying for which plots to put data into the result.
Currently implemented are |
cause |
Event of interest. Used for binary outcome |
times |
For survival and competing risks outcome: list of prediction horizons. All times which are greater than the maximal observed time in the data set are automatically removed. Note that the object returned by the function may become huge when the prediction performance is estimated at many prediction horizons. |
landmarks |
Not yet implemented. |
use.event.times |
If |
null.model |
If |
se.fit |
Logical or |
conservative |
Logical, only relevant in right censored data. If |
multi.split.test |
Logical or |
conf.int |
Either logical or a numeric value between 0 and 1. In right censored data,
confidence intervals are based on Blanche et al (see references). Setting |
contrasts |
Either logical or a list of contrasts. A list of contrasts defines which risk prediction models (markers)
should be contrasted with respect to their prediction performance.
If |
probs |
Quantiles for retrospective summary statistics of the
predicted risks. This affects the result of the function |
cens.method |
Method for dealing with right censored
data. Either |
cens.model |
Model for estimating inverse probability of
censored weights (IPCW). Implemented are the Kaplan-Meier method ( |
split.method |
Method for cross-validation. Right now the only choices are |
B |
Number of bootstrap sets for cross-validation. |
M |
Size of subsamples for bootstrap cross-validation. If specified it
has to be an integer smaller than the size of |
seed |
Super seed for setting training data seeds when randomly splitting (bootstrapping) the data during cross-validation. |
trainseeds |
Seeds for training models during cross-validation. |
parallel |
The type of parallel operation to be used (if any). If missing, the default is |
ncpus |
integer: number of processes to be used in parallel operation. |
cl |
An optional |
progress.bar |
Style for |
errorhandling |
Argument passed as |
keep |
list of characters (not case sensitive) which determines additional output.
|
predictRisk.args |
A list of argument-lists to control how risks are predicted.
The names of the lists should be the S3-classes of the A more flexible approach is to write a new predictRisk S3-method. See Details. |
debug |
Logical. If |
censoring.save.memory |
Only relevant in censored data where censoring weigths are obtained with
Cox regression and argument |
breaks |
Break points for computing the Roc curve. Defaults to
|
roc.method |
Method for averaging ROC curves across data splits.
If |
roc.grid |
Grid points for the averaging of ROC curves. A sequence of values at which to compute averages across the ROC curves obtained for different data splits during crossvalidation. |
cutpoints |
If not |
Details
The function implements a toolbox for the risk prediction modeller: all tools work for the three outcomes: (1) binary (uncensored), (2) right censored time to event without competing risks, (3) right censored time to event with competing risks
Computed are the (time-dependent) Brier score and the (time-dependent) area under the ROC curve for a list of risk prediction models either in external validation data or in the learning data using bootstrap cross-validation. The function optionally provides results for plotting (time-point specific) ROC curves, for (time-point specific) calibration curves and for (time-point specific) retrospective boxplots.
For uncensored binary outcome the Delong-Delong test is used to contrast AUC of rival models. In right censored survival data (with and without competing risks) the p-values correspond to Wald tests based on standard errors obtained with an estimate of the influence function as described in detail in the appendix of Blanche et al. (2015).
This function works with one or multiple models that predict the risk of an event R(t|X) for a subject characterized by predictors X at time t. With binary endpoints (outcome 0/1 without time component) the risk is simply R(X). In case of a survival object without competing risks the function still works with predicted event probabilities, i.e., R(t|X)=1-S(t|X) where S(t|X) is the predicted survival chance for subject X at time t.
The already existing predictRisk methods (see methods(predictRisk)) may not cover all models and methods
for predicting risks. But users can quickly extend the package as explained in detail in Mogensen et al. (2012) for
the predecessors pec::predictSurvProb
and pec::predictEventProb
which have been unified as
riskRegression::predictRisk
.
Bootstrap Crossvalidation (see also Gerds & Schumacher 2007 and Mogensen et al. 2012)
B=10, M (not specified or M=NROW(data))
Training of each of the models in each of 10 bootstrap data sets (learning data sets).
Learning data sets are obtained by sampling NROW(data)
subjects of the data set
with replacement. There are roughly .632*NROW(data)
subjects in the learning data (inbag)
and .368*NROW(data)
subjects not in the validation data sets (out-of-bag).
These are used to estimate the scores: AUC, Brier, etc. Reported are averages across the 10 splits.
## Bootstrap with replacement
set.seed(13)
N=17
data = data.frame(id=1:N, y=rbinom(N,1,.3),x=rnorm(N))
boot.index = sample(1:N,size=N,replace=TRUE)
boot.index
inbag = 1:N
outofbag = !inbag
learn.data = data[inbag]
val.data = data[outofbag]
riskRegression:::getSplitMethod("bootcv",B=10,N=17)$index
NOTE: the number .632 is the expected probability to draw one subject (for example subject 1) with
replacement from the data, which does not depend on the sample size:
B=10000
N=137
mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)}))
N=30
mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)}))
N=300
mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)}))
## Bootstrap without replacement (training size set to be 70 percent of data) B=10, M=.7
Training of each of the models in each of 10 bootstrap data sets (learning data sets).
Learning data sets are obtained by sampling round(.8*NROW(data))
subjects of the data set
without replacement. There are NROW(data)-round(.8*NROW(data))
subjects not in the learning data sets.
These are used to estimate the scores: AUC, Brier, etc. Reported are averages across the 10 splits.
set.seed(13)
N=17
data = data.frame(id=1:N, y=rbinom(N,1,.3),x=rnorm(N))
boot.index = sample(1:N,size=M,replace=FALSE)
boot.index
inbag = 1:N
outofbag = !inbag
learn.data = data[inbag]
val.data = data[outofbag]
riskRegression:::getSplitMethod("bootcv",B=10,N=17,M=.7)$index
Value
List with scores and assessments of contrasts, i.e.,
tests and confidence limits for performance and difference in performance (AUC and Brier),
summaries and plots. Most elements are indata.table
format.
Author(s)
Thomas A Gerds tag@biostat.ku.dk and Paul Blanche paul.blanche@univ-ubs.fr
References
Thomas A. Gerds and Michael W. Kattan (2021). Medical Risk Prediction Models: With Ties to Machine Learning (1st ed.) Chapman and Hall/CRC https://doi.org/10.1201/9781138384484
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. URL http://www.jstatsoft.org/v50/i11/.
Paul Blanche, Cecile Proust-Lima, Lucie Loubere, Claudine Berr, Jean- Francois Dartigues, and Helene Jacqmin-Gadda. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics, 71 (1):102–113, 2015.
P. Blanche, J-F Dartigues, and H. Jacqmin-Gadda. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 32(30):5381–5397, 2013.
E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529–2545.
Efron, Tibshirani (1997) Journal of the American Statistical Association 92, 548–560 Improvement On Cross-Validation: The .632+ Bootstrap Method.
Gerds, Schumacher (2006), Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, vol 48, 1029–1040.
Thomas A. Gerds, Martin Schumacher (2007) Efron-Type Measures of Prediction Error for Survival Analysis Biometrics, 63(4), 1283–1287 doi:10.1111/j.1541-0420.2007.00832.x
Martin Schumacher, Harald Binder, and Thomas Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, 23(14):1768-74, 2007.
Mark A. van de Wiel, Johannes Berkhof, and Wessel N. van Wieringen Testing the prediction error difference between 2 predictors Biostatistics (2009) 10(3): 550-560 doi:10.1093/biostatistics/kxp011
Michael W Kattan and Thomas A Gerds. The index of prediction accuracy: an intuitive measure useful for evaluating risk prediction models. Diagnostic and Prognostic Research, 2(1):7, 2018.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861-874.
Examples
# binary outcome
library(lava)
set.seed(18)
learndat <- sampleData(48,outcome="binary")
testdat <- sampleData(40,outcome="binary")
## score logistic regression models
lr1 = glm(Y~X1+X2+X7+X9,data=learndat,family=binomial)
lr2 = glm(Y~X3+X5,data=learndat,family=binomial)
Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5)"=lr2),formula=Y~1,data=testdat)
## ROC curve and calibration plot
xb=Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5+X6)"=lr2),formula=Y~1,
data=testdat,plots=c("calibration","ROC"))
## Not run: plotROC(xb)
plotCalibration(xb)
## End(Not run)
## compute AUC for a list of continuous markers
markers = as.list(testdat[,.(X6,X7,X8,X9,X10)])
Score(markers,formula=Y~1,data=testdat,metrics=c("auc"))
# cross-validation
## Not run:
set.seed(10)
learndat=sampleData(400,outcome="binary")
lr1a = glm(Y~X6,data=learndat,family=binomial)
lr2a = glm(Y~X7+X8+X9,data=learndat,family=binomial)
## bootstrap cross-validation
x1=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,split.method="bootcv",B=100)
x1
## leave-one-out and leave-pair-out bootstrap
x2=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,
split.method="loob",
B=100,plots="calibration")
x2
## 5-fold cross-validation
x3=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,
split.method="cv5",
B=1,plots="calibration")
x3
## End(Not run)
# survival outcome
# Score Cox regression models
## Not run: library(survival)
library(rms)
library(prodlim)
set.seed(18)
trainSurv <- sampleData(100,outcome="survival")
testSurv <- sampleData(40,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
x=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,times=c(5,8))
## Use Cox to estimate censoring weights
y=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~X1+X8,data=testSurv,conf.int=FALSE,times=c(5,8))
## Use GLMnet to estimate censoring weights
z=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~X1+X8,cens.model = "GLMnet",data=testSurv,
conf.int=FALSE,times=c(5,8))
## Use hal9001 to estimate censoring weights
w=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~X1+X8,cens.model = "Hal9001",data=testSurv,
conf.int=FALSE,times=c(5,8))
x
y
z
w
## End(Not run)
## Not run: library(survival)
library(rms)
library(prodlim)
set.seed(18)
trainSurv <- sampleData(100,outcome="survival")
testSurv <- sampleData(40,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
xs=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,times=c(5,8))
xs
## End(Not run)
# Integrated Brier score
## Not run:
xs=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,
summary="ibs",
times=sort(unique(testSurv$time)))
## End(Not run)
# time-dependent AUC for list of markers
## Not run: survmarkers = as.list(testSurv[,.(X6,X7,X8,X9,X10)])
Score(survmarkers,
formula=Surv(time,event)~1,metrics="auc",data=testSurv,
conf.int=TRUE,times=c(5,8))
# compare models on test data
Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=testSurv,conf.int=TRUE,times=c(5,8))
## End(Not run)
# crossvalidation models in traindata
## Not run:
library(survival)
set.seed(18)
trainSurv <- sampleData(400,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
x1 = Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=trainSurv,conf.int=TRUE,times=c(5,8),
split.method="loob",B=100,plots="calibration")
x2= Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=trainSurv,conf.int=TRUE,times=c(5,8),
split.method="bootcv",B=100)
## End(Not run)
# restrict number of comparisons
## Not run:
Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
formula=Surv(time,event)~1,data=trainSurv,contrasts=TRUE,
null.model=FALSE,conf.int=TRUE,times=c(5,8),split.method="bootcv",B=3)
# competing risks outcome
set.seed(18)
trainCR <- sampleData(400,outcome="competing.risks")
testCR <- sampleData(400,outcome="competing.risks")
library(riskRegression)
library(cmprsk)
# Cause-specific Cox regression
csc1 = CSC(Hist(time,event)~X1+X2+X7+X9,data=trainCR)
csc2 = CSC(Hist(time,event)~X3+X5+X6,data=trainCR)
# Fine-Gray regression
fgr1 = FGR(Hist(time,event)~X1+X2+X7+X9,data=trainCR,cause=1)
fgr2 = FGR(Hist(time,event)~X3+X5+X6,data=trainCR,cause=1)
Score(list("CSC(X1+X2+X7+X9)"=csc1,"CSC(X3+X5+X6)"=csc2,
"FGR(X1+X2+X7+X9)"=fgr1,"FGR(X3+X5+X6)"=fgr2),
formula=Hist(time,event)~1,data=testCR,se.fit=1L,times=c(5,8))
## End(Not run)
## Not run:
# reproduce some results of Table IV of Blanche et al. Stat Med 2013
data(Paquid)
ResPaquid <- Score(list("DSST"=-Paquid$DSST,"MMSE"=-Paquid$MMSE),
formula=Hist(time,status)~1,
data=Paquid,
null.model = FALSE,
conf.int=TRUE,
metrics=c("auc"),
times=c(3,5,10),
plots="ROC")
ResPaquid
plotROC(ResPaquid,time=5)
## End(Not run)
## Not run:
# parallel options
# by erikvona: Here is a generic example of using future
# and doFuture, works great with the current version:
library(riskRegression)
library(future)
library(foreach)
library(doFuture)
library(survival)
# Register all available cores for parallel operation
plan(multiprocess, workers = availableCores())
registerDoFuture()
set.seed(10)
trainSurv <- sampleData(400,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv,
y=TRUE, x = TRUE)
# Bootstrapping on multiple cores
x1 = Score(list("Cox(X1+X2+X7+X9)"=cox1),
formula=Surv(time,event)~1,data=trainSurv, times=c(5,8),
parallel = "as.registered", split.method="bootcv",B=100)
## End(Not run)