R: Score risk predictions

Score {riskRegression}

R Documentation

Score risk predictions

Description

Methods to score the predictive performance of risk markers and risk prediction models

Usage

Score(object, ...)

## S3 method for class 'list'
Score(
  object,
  formula,
  data,
  metrics = c("auc", "brier"),
  summary = NULL,
  plots = NULL,
  cause,
  times,
  landmarks,
  use.event.times = FALSE,
  null.model = TRUE,
  se.fit = TRUE,
  conservative = FALSE,
  multi.split.test = FALSE,
  conf.int = 0.95,
  contrasts = TRUE,
  probs = c(0, 0.25, 0.5, 0.75, 1),
  cens.method = "ipcw",
  cens.model = "cox",
  split.method,
  B,
  M,
  seed,
  trainseeds,
  parallel = c("no", "multicore", "snow", "as.registered"),
  ncpus = 1,
  cl = NULL,
  progress.bar = 3,
  errorhandling = "pass",
  keep,
  predictRisk.args,
  debug = 0L,
  censoring.save.memory = FALSE,
  breaks = seq(0, 1, 0.01),
  roc.method = "vertical",
  roc.grid = switch(roc.method, vertical = seq(0, 1, 0.01), horizontal = seq(1, 0,
    -0.01)),
  cutpoints = NULL,
  ...
)

Arguments

`object`	List of risk predictions (see details and examples).
`...`	Named list containing additional arguments that are passed on to the `predictRisk` methods corresponding to object. See examples.
`formula`	A formula which identifies the outcome (left hand side). E.g., `Y ~ 1` for binary and `Hist(time,status) ~ 1` for time-to-event outcome. In right censored data, the right hand side of the formula is used to estimate the inverse probability of censoring weights (IPCW) model.
`data`	`data.frame` or `data.table` in which the formula can be interpreted.
`metrics`	Character vector specifying which metrics to apply. Case does not matter. Choices are `"AUC"` and `"Brier"`.
`summary`	Character vector specifying which summary statistics to apply to the predicted risks. Choices are `"risks"`, `"IPA"`, `"riskQuantile"` and `"ibs"`. Can be all `c("risks","IPA","riskQuantile","ibs")` or a subset thereof. `"risks"` adds the predicted risks to the output. `"ipa"` computes the index of prediction accuracy (AKA R-squared) based on Brier scores for model vs null model `"riskQuantile"` calculates time-point specific boxplots for the predicted risks (or biomarker values) conditional on the outcome at the time-point. `"ibs"` calculates integrated Brier scores across the time points at which the Brier score is computed. This works only with time-to-event outcome and the results depend on the argument `times`. Set to `NULL` to avoid estimation of summary statistics.
`plots`	Character vector specifying for which plots to put data into the result. Currently implemented are `"ROC"`, `"Calibration"` and `"boxplot"`. In addition, one can plot AUC and Brier score as function of time as soon as `times` has at least two different values.
`cause`	Event of interest. Used for binary outcome `Y` to specify that risks are risks of the event `Y=event` and for competing risks outcome to specify the cause of interest.
`times`	For survival and competing risks outcome: list of prediction horizons. All times which are greater than the maximal observed time in the data set are automatically removed. Note that the object returned by the function may become huge when the prediction performance is estimated at many prediction horizons.
`landmarks`	Not yet implemented.
`use.event.times`	If `TRUE` merge all unique event times with the vector given by argument `times`.
`null.model`	If `TRUE` fit a risk prediction model which ignores the covariates and predicts the same value for all subjects. The model is fitted using `data` and the left hand side of `formula`. For binary outcome this is just the empirical prevalence. For (right censored) time to event outcome, the null models are equal to the Kaplan-Meier estimator (no competing risks) and the Aalen-Johansen estimator (with competing risks).
`se.fit`	Logical or `0` or `1`. If `FALSE` or `0` do not calculate standard errors.
`conservative`	Logical, only relevant in right censored data. If `TRUE` ignore variability of the estimate of the inverse probability of censoring weights when calculating standard errors for prediction performance parameters. This can potentially reduce computation time and memory usage at a usually very small expense of a slightly higher standard error.
`multi.split.test`	Logical or `0` or `1`. If `FALSE` or `0` do not calculate multi-split tests. This argument is ignored when `split.method` is "none".
`conf.int`	Either logical or a numeric value between 0 and 1. In right censored data, confidence intervals are based on Blanche et al (see references). Setting `FALSE` prevents the computation of confidence intervals. `TRUE` computes 95 percent confidence intervals and corresponding p-values for AUC and Brier score. If set to 0.87, the level of significance is 13 percent. So, do not set it to 0.87.
`contrasts`	Either logical or a list of contrasts. A list of contrasts defines which risk prediction models (markers) should be contrasted with respect to their prediction performance. If `TRUE` do all possible comparisons. For example, when `object` is a list with two risk prediction models and `null.model=TRUE` setting `TRUE` is equivalent to `list(c(0,1,2),c(1,2))` where `c(0,1,2)` codes for the two comparisons: 1 vs 0 and 2 vs 0 (positive integers refer to elements of `object`, 0 refers to the benchmark null model which ignores the covariates). This again is equivalent to explicitly setting `list(c(0,1),c(0,2),c(1,2))`. A more complex example: Suppose `object` has 7 elements and you want to do the following 3 comparisons: 6 vs 3, 2 vs 5 and 2 vs 3, you should set `contrasts=c(6,3),c(2,5,3)`.
`probs`	Quantiles for retrospective summary statistics of the predicted risks. This affects the result of the function `boxplot.Score`.
`cens.method`	Method for dealing with right censored data. Either `"ipcw"` or `"pseudo"`. Here IPCW refers to inverse probability of censoring weights and `pseudo` for jackknife pseudo values. Right now pseudo values are only used for calibration curves.
`cens.model`	Model for estimating inverse probability of censored weights (IPCW). Implemented are the Kaplan-Meier method (`"km"`) and Cox regression (`"cox"`) both applied to the censored times. If the right hand side of `formula` does not specify covariates, the Kaplan-Meier method is used even if this argument is set to `"cox"`. Also implemented is a template for users specifying other models to estimate the IPCW. Here the user should be supply a function, taking as input a `"formula"` and `"data"`. This does come at the cost of only being able to calculate conservative confidence intervals.
`split.method`	Method for cross-validation. Right now the only choices are `bootcv`, `cvk` and `loob`. In the first case, bootstrap learning sets are drawn with our without replacement (argument `M`) from `data`. The data not included in the current bootstrap learning set are used as validation set to compute the prediction performance. In the second case, k-fold cross-validation is performed. Note that k has to be an explicit number, e.g. 5 or 10, when passing this as an argument. In the third case, leave-one-out bootstrap cross-validation is performed for the Brier score and leave-pair-out bootstrap cross-validation is performed for the AUC.
`B`	Number of bootstrap sets for cross-validation. `B` should be set to 1, when k-fold cross-validation is used.
`M`	Size of subsamples for bootstrap cross-validation. If specified it has to be an integer smaller than the size of `data`.
`seed`	Super seed for setting training data seeds when randomly splitting (bootstrapping) the data during cross-validation.
`trainseeds`	Seeds for training models during cross-validation.
`parallel`	The type of parallel operation to be used (if any). If missing, the default is `"no"`.
`ncpus`	integer: number of processes to be used in parallel operation.
`cl`	An optional `parallel` or `snow` cluster for use if `parallel = "snow"`. If not supplied, a cluster on the local machine is created for the duration of the `Score` call.
`progress.bar`	Style for `txtProgressBar`. Can be 1,2,3 see `help(txtProgressBar)` or NULL to avoid the progress bar.
`errorhandling`	Argument passed as `.errorhandling` to foreach. Default is `"pass"`.
`keep`	list of characters (not case sensitive) which determines additional output. `"residuals"` provides Brier score residuals and `"splitindex"` provides sampling index used to split the data into training and validation sets. It is a function, whose argument is the bootstrap sample, which one wishes to look at. `"vcov"` provides the variance-covariance matrix for the estimates. `"iid"` provides the estimated influence function of the estimates.
`predictRisk.args`	A list of argument-lists to control how risks are predicted. The names of the lists should be the S3-classes of the `object`. The argument-lists are then passed on to the S3-class specific predictRisk method. For example, if your object contains one or several random forest model fitted with the function randomForestSRC::rfsrc then you can specify additional arguments for the function riskRegression::predictRisk.rfsrc which will pass these on to the function randomForestSRC::predict.rfsrc. A specific example in this case would be `list(rfsrc=list(na.action="na.impute"))`. A more flexible approach is to write a new predictRisk S3-method. See Details.
`debug`	Logical. If `TRUE` indicate landmarks in progress of the program.
`censoring.save.memory`	Only relevant in censored data where censoring weigths are obtained with Cox regression and argument `conservative` is set to `FALSE`. If `TRUE`, save memory by not storing the influence function of the cumulative hazard of the censoring as a matrix when calculating standard errors with Cox censoring. This can allow one to use `Score` on larger test data sets, but may be slower.
`breaks`	Break points for computing the Roc curve. Defaults to `seq(0,1,.01)` when some form of crossvalidation is applied, otherwise to all unique values of the predictive marker.
`roc.method`	Method for averaging ROC curves across data splits. If `'horizontal'` average crossvalidated specificities for fixed sensitivity values, specified in `roc.grid`, otherwise, if `'vertical'`, average crossvalidated specificities for fixed sensitivity values. See Fawcett, T. (2006) for details.
`roc.grid`	Grid points for the averaging of ROC curves. A sequence of values at which to compute averages across the ROC curves obtained for different data splits during crossvalidation.
`cutpoints`	If not `NULL`, estimates and standard errors of the TPR (True Positive Rate), FPR (False Positive Rate), PPV (Positive Predictive Value), and NPV (Negative Predictive Value) are given at the `cutpoints`. These values are saved in `object$AUC$res.cut`.

Details

The function implements a toolbox for the risk prediction modeller: all tools work for the three outcomes: (1) binary (uncensored), (2) right censored time to event without competing risks, (3) right censored time to event with competing risks

Computed are the (time-dependent) Brier score and the (time-dependent) area under the ROC curve for a list of risk prediction models either in external validation data or in the learning data using bootstrap cross-validation. The function optionally provides results for plotting (time-point specific) ROC curves, for (time-point specific) calibration curves and for (time-point specific) retrospective boxplots.

For uncensored binary outcome the Delong-Delong test is used to contrast AUC of rival models. In right censored survival data (with and without competing risks) the p-values correspond to Wald tests based on standard errors obtained with an estimate of the influence function as described in detail in the appendix of Blanche et al. (2015).

This function works with one or multiple models that predict the risk of an event R(t|X) for a subject characterized by predictors X at time t. With binary endpoints (outcome 0/1 without time component) the risk is simply R(X). In case of a survival object without competing risks the function still works with predicted event probabilities, i.e., R(t|X)=1-S(t|X) where S(t|X) is the predicted survival chance for subject X at time t.

The already existing predictRisk methods (see methods(predictRisk)) may not cover all models and methods for predicting risks. But users can quickly extend the package as explained in detail in Mogensen et al. (2012) for the predecessors pec::predictSurvProb and pec::predictEventProb which have been unified as riskRegression::predictRisk.

Bootstrap Crossvalidation (see also Gerds & Schumacher 2007 and Mogensen et al. 2012)

B=10, M (not specified or M=NROW(data)) Training of each of the models in each of 10 bootstrap data sets (learning data sets). Learning data sets are obtained by sampling NROW(data) subjects of the data set with replacement. There are roughly .632*NROW(data) subjects in the learning data (inbag) and .368*NROW(data) subjects not in the validation data sets (out-of-bag).

These are used to estimate the scores: AUC, Brier, etc. Reported are averages across the 10 splits.

## Bootstrap with replacement set.seed(13) N=17 data = data.frame(id=1:N, y=rbinom(N,1,.3),x=rnorm(N)) boot.index = sample(1:N,size=N,replace=TRUE) boot.index inbag = 1:N outofbag = !inbag learn.data = data[inbag] val.data = data[outofbag] riskRegression:::getSplitMethod("bootcv",B=10,N=17)$index NOTE: the number .632 is the expected probability to draw one subject (for example subject 1) with replacement from the data, which does not depend on the sample size: B=10000 N=137 mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)})) N=30 mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)})) N=300 mean(sapply(1:B, function(b){match(1,sample(1:N,size=N,replace=TRUE),nomatch=0)}))

## Bootstrap without replacement (training size set to be 70 percent of data) B=10, M=.7

Training of each of the models in each of 10 bootstrap data sets (learning data sets). Learning data sets are obtained by sampling round(.8*NROW(data)) subjects of the data set without replacement. There are NROW(data)-round(.8*NROW(data)) subjects not in the learning data sets. These are used to estimate the scores: AUC, Brier, etc. Reported are averages across the 10 splits. set.seed(13) N=17 data = data.frame(id=1:N, y=rbinom(N,1,.3),x=rnorm(N)) boot.index = sample(1:N,size=M,replace=FALSE) boot.index inbag = 1:N outofbag = !inbag learn.data = data[inbag] val.data = data[outofbag] riskRegression:::getSplitMethod("bootcv",B=10,N=17,M=.7)$index

Value

List with scores and assessments of contrasts, i.e., tests and confidence limits for performance and difference in performance (AUC and Brier), summaries and plots. Most elements are indata.table format.

Author(s)

Thomas A Gerds tag@biostat.ku.dk and Paul Blanche paul.blanche@univ-ubs.fr

References

Thomas A. Gerds and Michael W. Kattan (2021). Medical Risk Prediction Models: With Ties to Machine Learning (1st ed.) Chapman and Hall/CRC https://doi.org/10.1201/9781138384484

Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. URL http://www.jstatsoft.org/v50/i11/.

Paul Blanche, Cecile Proust-Lima, Lucie Loubere, Claudine Berr, Jean- Francois Dartigues, and Helene Jacqmin-Gadda. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics, 71 (1):102–113, 2015.

P. Blanche, J-F Dartigues, and H. Jacqmin-Gadda. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 32(30):5381–5397, 2013.

E. Graf et al. (1999), Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, vol 18, pp= 2529–2545.

Efron, Tibshirani (1997) Journal of the American Statistical Association 92, 548–560 Improvement On Cross-Validation: The .632+ Bootstrap Method.

Gerds, Schumacher (2006), Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal, vol 48, 1029–1040.

Thomas A. Gerds, Martin Schumacher (2007) Efron-Type Measures of Prediction Error for Survival Analysis Biometrics, 63(4), 1283–1287 doi:10.1111/j.1541-0420.2007.00832.x

Martin Schumacher, Harald Binder, and Thomas Gerds. Assessment of survival prediction models based on microarray data. Bioinformatics, 23(14):1768-74, 2007.

Mark A. van de Wiel, Johannes Berkhof, and Wessel N. van Wieringen Testing the prediction error difference between 2 predictors Biostatistics (2009) 10(3): 550-560 doi:10.1093/biostatistics/kxp011

Michael W Kattan and Thomas A Gerds. The index of prediction accuracy: an intuitive measure useful for evaluating risk prediction models. Diagnostic and Prognostic Research, 2(1):7, 2018.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861-874.

Examples

# binary outcome
library(lava)
set.seed(18)
learndat <- sampleData(48,outcome="binary")
testdat <- sampleData(40,outcome="binary")

## score logistic regression models
lr1 = glm(Y~X1+X2+X7+X9,data=learndat,family=binomial)
lr2 = glm(Y~X3+X5,data=learndat,family=binomial)
Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5)"=lr2),formula=Y~1,data=testdat)

## ROC curve and calibration plot
xb=Score(list("LR(X1+X2+X7+X9)"=lr1,"LR(X3+X5+X6)"=lr2),formula=Y~1,
         data=testdat,plots=c("calibration","ROC"))
## Not run: plotROC(xb)
plotCalibration(xb)

## End(Not run)

## compute AUC for a list of continuous markers
markers = as.list(testdat[,.(X6,X7,X8,X9,X10)])
Score(markers,formula=Y~1,data=testdat,metrics=c("auc"))

# cross-validation
## Not run: 
    set.seed(10)
    learndat=sampleData(400,outcome="binary")
    lr1a = glm(Y~X6,data=learndat,family=binomial)
    lr2a = glm(Y~X7+X8+X9,data=learndat,family=binomial)
    ## bootstrap cross-validation
    x1=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,split.method="bootcv",B=100)
    x1
    ## leave-one-out and leave-pair-out bootstrap
    x2=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,
             split.method="loob",
             B=100,plots="calibration")
    x2
    ## 5-fold cross-validation
    x3=Score(list("LR1"=lr1a,"LR2"=lr2a),formula=Y~1,data=learndat,
             split.method="cv5",
             B=1,plots="calibration")
    x3

## End(Not run)
# survival outcome

# Score Cox regression models
## Not run: library(survival)
library(rms)
library(prodlim)
set.seed(18)
trainSurv <- sampleData(100,outcome="survival")
testSurv <- sampleData(40,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
x=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,times=c(5,8))
## Use Cox to estimate censoring weights
y=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~X1+X8,data=testSurv,conf.int=FALSE,times=c(5,8)) 
## Use GLMnet to estimate censoring weights
z=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~X1+X8,cens.model = "GLMnet",data=testSurv,
         conf.int=FALSE,times=c(5,8)) 
## Use hal9001 to estimate censoring weights
w=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~X1+X8,cens.model = "Hal9001",data=testSurv,
         conf.int=FALSE,times=c(5,8)) 
x
y
z
w

## End(Not run)

## Not run: library(survival)
library(rms)
library(prodlim)
set.seed(18)
trainSurv <- sampleData(100,outcome="survival")
testSurv <- sampleData(40,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
xs=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,times=c(5,8))
xs

## End(Not run)
# Integrated Brier score
## Not run: 
xs=Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
         formula=Surv(time,event)~1,data=testSurv,conf.int=FALSE,
         summary="ibs",
         times=sort(unique(testSurv$time)))

## End(Not run)

# time-dependent AUC for list of markers
## Not run: survmarkers = as.list(testSurv[,.(X6,X7,X8,X9,X10)])
Score(survmarkers,
      formula=Surv(time,event)~1,metrics="auc",data=testSurv,
      conf.int=TRUE,times=c(5,8))

# compare models on test data
Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
      formula=Surv(time,event)~1,data=testSurv,conf.int=TRUE,times=c(5,8))

## End(Not run)
# crossvalidation models in traindata
## Not run: 
    library(survival)
    set.seed(18)
    trainSurv <- sampleData(400,outcome="survival")
    cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv, y=TRUE, x = TRUE)
    cox2 = coxph(Surv(time,event)~X3+X5+X6,data=trainSurv, y=TRUE, x = TRUE)
    x1 = Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
               formula=Surv(time,event)~1,data=trainSurv,conf.int=TRUE,times=c(5,8),
               split.method="loob",B=100,plots="calibration")

    x2= Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
              formula=Surv(time,event)~1,data=trainSurv,conf.int=TRUE,times=c(5,8),
              split.method="bootcv",B=100)

## End(Not run)

# restrict number of comparisons
## Not run: 
    Score(list("Cox(X1+X2+X7+X9)"=cox1,"Cox(X3+X5+X6)"=cox2),
          formula=Surv(time,event)~1,data=trainSurv,contrasts=TRUE,
          null.model=FALSE,conf.int=TRUE,times=c(5,8),split.method="bootcv",B=3)

    # competing risks outcome
    set.seed(18)
    trainCR <- sampleData(400,outcome="competing.risks")
    testCR <- sampleData(400,outcome="competing.risks")
    library(riskRegression)
    library(cmprsk)
    # Cause-specific Cox regression
    csc1 = CSC(Hist(time,event)~X1+X2+X7+X9,data=trainCR)
    csc2 = CSC(Hist(time,event)~X3+X5+X6,data=trainCR)
    # Fine-Gray regression
    fgr1 = FGR(Hist(time,event)~X1+X2+X7+X9,data=trainCR,cause=1)
    fgr2 = FGR(Hist(time,event)~X3+X5+X6,data=trainCR,cause=1)
    Score(list("CSC(X1+X2+X7+X9)"=csc1,"CSC(X3+X5+X6)"=csc2,
               "FGR(X1+X2+X7+X9)"=fgr1,"FGR(X3+X5+X6)"=fgr2),
          formula=Hist(time,event)~1,data=testCR,se.fit=1L,times=c(5,8))

## End(Not run)



## Not run: 
    # reproduce some results of Table IV of Blanche et al. Stat Med 2013
    data(Paquid)
    ResPaquid <- Score(list("DSST"=-Paquid$DSST,"MMSE"=-Paquid$MMSE),
                       formula=Hist(time,status)~1,
                       data=Paquid,
                       null.model = FALSE,
                       conf.int=TRUE,
                       metrics=c("auc"),
                       times=c(3,5,10),
                       plots="ROC")
    ResPaquid
    plotROC(ResPaquid,time=5)

## End(Not run)
## Not run: 
# parallel options
# by erikvona: Here is a generic example of using future
# and doFuture, works great with the current version:
library(riskRegression)
library(future)
library(foreach)
library(doFuture)
library(survival)
# Register all available cores for parallel operation
plan(multiprocess, workers = availableCores())
registerDoFuture()
set.seed(10)
trainSurv <- sampleData(400,outcome="survival")
cox1 = coxph(Surv(time,event)~X1+X2+X7+X9,data=trainSurv,
             y=TRUE, x = TRUE)
# Bootstrapping on multiple cores
x1 = Score(list("Cox(X1+X2+X7+X9)"=cox1),
     formula=Surv(time,event)~1,data=trainSurv, times=c(5,8),
     parallel = "as.registered", split.method="bootcv",B=100)

## End(Not run)

[Package riskRegression version 2023.12.21 Index]