R: Evaluation of the prediction accuracy of a prediction model

predRes {biospear}

R Documentation

Evaluation of the prediction accuracy of a prediction model

Description

This function computes several criteria to assess the prediction accuracy of a prediction model.

Usage

predRes(res, method, traindata, newdata, int.cv, int.cv.nfold = 5, time,
  trace = TRUE, ncores = 1)

## S3 method for class 'predRes'
plot(x, method, crit = c("C", "PE", "dC"),
  xlim, ylim, xlab, ylab, col, ...)

Arguments

`res`	an object of class '`resBMsel`' generated by `BMsel`.
`method`	methods for which prediction criteria are computed. If missing, all methods contained in `res` are computed.
`traindata`	input `data.frame` used to compute the `res` object. This object is mandatory.
`newdata`	input `data.frame` not used to compute the `res` object. This object is not mandatory (see Details section).
`int.cv`	logical parameter indicating if a double cross-validation process (2CV) should be performed to mimic an external validation set.
`int.cv.nfold`	number of folds for the double cross-validation. A large value of `int.cv.nfold` can lead to very long computation times. `int.cv.nfold` is not used when `int.cv = FALSE`.
`time`	time points to compute the prediction criteria.
`trace`	logical parameter indicating if messages should be printed.
`ncores`	number of CPUs used (for the double cross-validation).
`x`	an object of class '`predRes`' generated from `predRes`.
`crit`	parameter indicating the criterion for which the results will be plotted (`C`: concordance via Uno's C-statistic, `PE`: prediction error via the integrated Brier score, and `dC`: delta Uno's C-statistic, for the interaction setting only).
`xlim`, `ylim`, `xlab`, `ylab`, `col`	usual parameters for plot.
`...`	other parameters for plot.
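
The idea behind `int.cv` and `int.cv.nfold` can be illustrated with a generic sketch of the outer loop of a double cross-validation. This is not the internal code of `predRes`: the simulated data, the `lm` fit used as a stand-in for biomarker selection, and the squared-error criterion are all assumptions chosen to keep the example self-contained in base R.

```r
# Generic outer loop of a double cross-validation (illustration only).
set.seed(42)
n <- 100
dat <- data.frame(x = rnorm(n))          # hypothetical predictor
dat$y <- 2 * dat$x + rnorm(n)            # hypothetical outcome

int.cv.nfold <- 5
# Randomly assign each observation to one of the outer folds
outer.fold <- sample(rep(seq_len(int.cv.nfold), length.out = n))

cv.error <- sapply(seq_len(int.cv.nfold), function(k) {
  train <- dat[outer.fold != k, ]
  test  <- dat[outer.fold == k, ]
  # In a real double cross-validation, the whole selection procedure
  # (itself tuned by an inner cross-validation) is rerun here on `train`.
  fit <- lm(y ~ x, data = train)         # stand-in for the selection step
  # The held-out fold plays the role of an external validation set
  mean((test$y - predict(fit, newdata = test))^2)
})

mean(cv.error)                           # averaged out-of-fold error
```

Because the selection step is repeated once per outer fold, the computation time grows with `int.cv.nfold`, which is why large values are costly.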

Details

To evaluate the accuracy of the selected models, three predictive accuracy measures are implemented:
- the integrated Brier score (PE) to measure the overall prediction error of the prediction model. The time-dependent Brier score is a quadratic score based on the predicted time-dependent survival probability.
- Uno's C-statistic (C) to evaluate the discrimination of the prediction model. It is one of the least biased estimators of the concordance statistic in the presence of censoring (Uno et al., 2011).
- the absolute difference of the treatment-specific Uno's C-statistics (dC) to evaluate the interaction strength of the prediction model (Ternes et al., 2016).
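
As a rough illustration of the first measure, the sketch below computes a time-dependent Brier score and integrates it over a time grid. It assumes fully observed event times (no censoring), so it omits the censoring weights needed for real survival data, and it is not the code used by `predRes`. The simulated biomarker, the exponential model and all object names are hypothetical.

```r
# Simplified Brier score WITHOUT censoring (an assumption made for brevity).
set.seed(1)
n <- 200
x <- rnorm(n)                          # one hypothetical biomarker
rate <- exp(-0.8 * x) / 5              # subject-specific exponential hazard
event.time <- rexp(n, rate = rate)     # fully observed event times

# Predicted survival probability S(t | x) under the exponential model
pred.surv <- function(t) exp(-rate * t)

# Brier score at time t: mean squared difference between the observed
# survival status I(T > t) and the predicted survival probability
brier <- function(t) mean((as.numeric(event.time > t) - pred.surv(t))^2)

times <- 1:5
bs <- sapply(times, brier)

# Integrated Brier score over the grid: trapezoidal rule divided by
# the length of the time interval
ibs <- sum(diff(times) * (head(bs, -1) + tail(bs, -1)) / 2) / diff(range(times))
```

Lower values indicate better predictions; a Brier score of 0.25 corresponds to predicting a survival probability of 0.5 for everyone.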
For simulated datasets, the predictive accuracy metrics are also computed for the "oracle model", that is, the unpenalized Cox proportional hazards model fitted to the active biomarkers only.
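
The concordance-based measures (C and dC) can be sketched in the same spirit. The helper `cstat` below is a hypothetical pairwise concordance for uncensored data, used here as a simplified stand-in for Uno's C-statistic, which additionally weights pairs to handle censoring. The simulated treatment arms and the risk score are assumptions for illustration only.

```r
# Pairwise concordance for UNCENSORED data (simplified stand-in for Uno's C).
cstat <- function(time, risk) {
  pairs <- combn(length(time), 2)             # all pairs of subjects
  dt <- time[pairs[1, ]] - time[pairs[2, ]]   # difference in event times
  dr <- risk[pairs[1, ]] - risk[pairs[2, ]]   # difference in risk scores
  usable <- dt != 0                           # pairs with distinct times
  concordant <- dt * dr < 0                   # higher risk, shorter time
  tied <- dr == 0                             # tied scores count one half
  (sum(concordant & usable) + 0.5 * sum(tied & usable)) / sum(usable)
}

set.seed(7)
n <- 120
treat <- rep(c(0, 1), each = n / 2)           # two treatment arms
bm <- rnorm(n)                                # hypothetical biomarker
# The biomarker affects survival in the treated arm only (interaction)
event.time <- rexp(n, rate = exp(-0.9 * bm * treat))
risk <- -0.9 * bm                             # hypothetical risk score

c.control <- cstat(event.time[treat == 0], risk[treat == 0])
c.treated <- cstat(event.time[treat == 1], risk[treat == 1])

# dC: absolute difference of the treatment-specific C-statistics
dC <- abs(c.treated - c.control)
```

A value of dC close to 0 suggests that the score discriminates similarly in both arms, while a larger value points to a stronger biomarker-by-treatment interaction.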

Value

A list with one element per time point in `time`. Each element contains between 1 and 3 sublists, depending on the chosen validation: training set (always computed), internal validation through double cross-validation (2CV, if `int.cv = TRUE`) and/or external validation (if `newdata` is provided). Each sublist is a matrix containing the predictive accuracy metrics of the implemented methods.
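
The nesting described above can be explored on a mock object. The sketch below builds a placeholder list by hand; the element names (`time1`, `training`, `external`), the matrix orientation and the `NA` contents are assumptions made for illustration and may differ from what `predRes` actually returns, so `str()` on a real result is the reliable way to check.

```r
# Mock object mimicking the described structure (names are hypothetical).
times <- 1:2
methods <- c("lasso", "lasso-pcvl")

# Placeholder matrix: criteria in rows, methods in columns (assumed layout)
empty.metrics <- function() {
  matrix(NA_real_, nrow = 2, ncol = length(methods),
         dimnames = list(c("C", "PE"), methods))
}

# One element per time point, each holding one matrix per validation type
mock <- setNames(
  lapply(times, function(t) {
    list(training = empty.metrics(), external = empty.metrics())
  }),
  paste0("time", times))

str(mock, max.level = 2)                  # overview of the nesting
mock[["time1"]]$training["C", "lasso"]    # one metric for one method
```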

Author(s)

Nils Ternes, Federico Rotolo, and Stefan Michiels
Maintainer: Nils Ternes nils.ternes@yahoo.com

References

Ternes N, Rotolo F and Michiels S. Empirical extensions of the lasso penalty to reduce the false discovery rate in high-dimensional Cox regression models. Statistics in Medicine 2016;35(15):2561-2573. doi:10.1002/sim.6927
Ternes N, Rotolo F, Heinze G and Michiels S. Identification of biomarker-by-treatment interactions in randomized clinical trials with survival outcomes and high-dimensional spaces. Biometrical Journal. In press. doi:10.1002/bimj.201500234
Uno H, Cai T, Pencina MJ, D'Agostino RB and Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine 2011;30:1105-1117. doi:10.1002/sim.4154

Examples

########################################
# Simulated data set
########################################

## Low calculation time
  set.seed(654321)
  sdata <- simdata(
    n = 500, p = 20, q.main = 3, q.inter = 0,
    prob.tt = 0.5, alpha.tt = 0,
    beta.main = -0.8,
    b.corr = 0.6, b.corr.by = 4,
    m0 = 5, wei.shape = 1, recr = 4, fu = 2,
    timefactor = 1)
  
  newdata <- simdataV(
    traindata = sdata,
    Nvalid = 500
  )
   
  resBM <- BMsel(
    data = sdata, 
    method = c("lasso", "lasso-pcvl"), 
    inter = FALSE, 
    folds = 5)
  
  predAcc <- predRes(
    res = resBM,
    traindata = sdata,
    newdata = newdata,
    time = 1:5)
    
  plot(predAcc, crit = "C")

## Not run: 
## Moderate calculation time
  set.seed(123456)
  sdata <- simdata(
    n = 500, p = 100, q.main = 5, q.inter = 5,
    prob.tt = 0.5, alpha.tt = -0.5,
    beta.main = c(-0.5, -0.2), beta.inter = c(-0.7, -0.4),
    b.corr = 0.6, b.corr.by = 10,
    m0 = 5, wei.shape = 1, recr = 4, fu = 2,
    timefactor = 1,
    active.inter = c("bm003", "bm021", "bm044", "bm049", "bm097"))

  resBM <- BMsel(
    data = sdata, 
    method = c("lasso", "lasso-pcvl"), 
    inter = TRUE, 
    folds = 5)
  
  predAcc <- predRes(
    res = resBM,
    traindata = sdata, 
    int.cv = TRUE, 
    time = 1:5, 
    ncores = 5)
  plot(predAcc, crit = "dC")

## End(Not run)

########################################
# Breast cancer data set
########################################

## Not run: 
  data(Breast)
  dim(Breast)
  
  set.seed(123456)
  resBM <-  BMsel(
    data = Breast,
    x = 4:ncol(Breast),
    y = 2:1,
    tt = 3,
    inter = FALSE,
    std.x = TRUE,
    folds = 5,
    method = c("lasso", "lasso-pcvl"))

  summary(resBM)

  predAcc <- predRes(
    res = resBM,
    traindata = Breast,
    time = 1:4,
    trace = TRUE)
  plot(predAcc, crit = "C")

## End(Not run)

########################################
########################################

[Package biospear version 1.0.2 Index]