survcompare {survcompare}

R Documentation

Cross-validates and compares Cox Proportional Hazards and Survival Random Forest models

Description

The function performs repeated nested cross-validation for:

Cox-PH (survival package, survival::coxph) or Cox-Lasso (glmnet package, glmnet::cox.fit)
Ensemble of the Cox model and Survival Random Forest (randomForestSRC::rfsrc)
Survival Random Forest on its own, if train_srf = TRUE

The same random seed is used for the train/test splits of all models to aid fair comparison. The performance metrics computed for all models include Harrell's c-index, time-dependent AUC-ROC, time-dependent Brier score, and calibration slope. The statistical significance of the performance differences between Cox-PH and the Cox-SRF ensemble is tested and reported.
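The idea of sharing train/test splits across models can be sketched in base R as follows. This is only an illustration, not the package's internal code: the helper `make_folds()`, the seed offset per repetition, and the variable names are assumptions made for the example.

```r
# Sketch: fix the fold assignment once per repetition so that every model
# is evaluated on exactly the same train/test splits (base R only).
make_folds <- function(n, k, seed) {  # hypothetical helper, not part of survcompare
  set.seed(seed)
  # Balanced fold labels 1..k, shuffled across the n rows
  sample(rep(seq_len(k), length.out = n))
}

n <- 100
outer_cv <- 3
repeat_cv <- 2
randomseed <- 42

# One fold vector per repetition; offsetting the seed by the repetition
# index is an assumption made for this sketch.
folds_by_rep <- lapply(seq_len(repeat_cv), function(r) {
  make_folds(n, outer_cv, seed = randomseed + r)
})

for (r in seq_len(repeat_cv)) {
  for (fold in seq_len(outer_cv)) {
    test_idx  <- which(folds_by_rep[[r]] == fold)
    train_idx <- setdiff(seq_len(n), test_idx)
    # Fit Cox-PH, the ensemble and (optionally) SRF on train_idx here,
    # then score all of them on the same test_idx.
  }
}
```

Because the fold vectors depend only on the seed, rerunning with the same `randomseed` reproduces the same splits for every model.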

The function is designed to help with model selection by quantifying the loss of predictive performance (if any) when Cox-PH is used instead of a more complex model such as SRF, which can capture non-linear and interaction terms as well as non-proportional hazards. The difference in performance between the Cox-SRF ensemble and the baseline Cox-PH can be viewed as a quantification of the contribution of non-linear and cross-terms to the predictive power of the supplied predictors.

Usage

survcompare(
  df_train,
  predict_factors,
  predict_time = NULL,
  randomseed = NULL,
  useCoxLasso = FALSE,
  outer_cv = 3,
  inner_cv = 3,
  srf_tuning = list(),
  return_models = FALSE,
  repeat_cv = 2,
  train_srf = FALSE
)

Arguments

`df_train`	training data, a data frame with "time" and "event" columns to define the survival outcome
`predict_factors`	list of column names to be used as predictors
`predict_time`	prediction time of interest. If NULL, the 0.9 quantile of event times is used
`randomseed`	random seed for replication
`useCoxLasso`	TRUE/FALSE, whether to use the regularized (Lasso) version of the Cox model; default is FALSE
`outer_cv`	k in k-fold CV for the outer (performance evaluation) loop
`inner_cv`	k in k-fold CV for internal CV to tune survival random forest hyper-parameters
`srf_tuning`	list of tuning parameters for random forest: 1) NULL for using a default tuning grid, or 2) a list("mtry"=c(...), "nodedepth" = c(...), "nodesize" = c(...))
`return_models`	TRUE/FALSE to return the trained models; default is FALSE, only performance is returned
`repeat_cv`	if NULL, runs once; otherwise the CV is repeated several times, each with a different random split, and the average over all repetitions is reported
`train_srf`	TRUE/FALSE for whether to train SRF on its own, apart from the CoxPH->SRF ensemble. Default is FALSE as there is not much information in SRF itself compared to the ensembled version.
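Two of the arguments above can be illustrated with a short base R sketch. It is not the package's code and rests on two assumptions: that "event times" means the times of rows with event == 1, and that the candidate hyper-parameter combinations are the full Cartesian product of the values in `srf_tuning`. The toy data are simulated only to make the example runnable.

```r
# Toy survival data with the "time" and "event" columns the function expects
set.seed(1)
df <- data.frame(
  time  = rexp(200, rate = 0.1),
  event = rbinom(200, size = 1, prob = 0.7)
)

# Assumed reading of the predict_time default:
# the 0.9 quantile of the times at which an event was observed
default_predict_time <- quantile(df$time[df$event == 1], probs = 0.9, names = FALSE)

# A tuning list in the documented shape
srf_tuning <- list("mtry" = c(2, 3), "nodedepth" = c(5, 25), "nodesize" = c(15, 30))

# Assumption: every combination of the listed values is a candidate,
# so 2 * 2 * 2 = 8 rows here
tuning_grid <- expand.grid(srf_tuning)
nrow(tuning_grid)
```

Under this reading, the cost of the inner CV grows with the number of rows in `tuning_grid`, so a small grid keeps run time manageable.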

Value

outcome = list(data frame with performance results, fitted Cox models, fitted SRF)

Author(s)

Diana Shamsutdinova diana.shamsutdinova.github@gmail.com

Examples


df <- simulate_nonlinear(100)
srf_params <- list("mtry" = c(2), "nodedepth" = c(25), "nodesize" = c(15))
mysurvcomp <- survcompare(df, names(df)[1:4], srf_tuning = srf_params, outer_cv = 2, inner_cv = 2)
summary(mysurvcomp)
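A variation on the example above, sketched with arguments from the Usage section only: it fixes the random seed, switches the baseline to Cox-Lasso, and also trains SRF on its own. The argument values are illustrative, and the call is guarded so that the snippet does nothing when the survcompare package is not installed.

```r
# Single-value tuning list, as in the example above
srf_params <- list("mtry" = c(2), "nodedepth" = c(25), "nodesize" = c(15))

if (requireNamespace("survcompare", quietly = TRUE)) {
  library(survcompare)
  df <- simulate_nonlinear(100)
  lasso_comp <- survcompare(
    df, names(df)[1:4],
    randomseed = 42,       # fixed seed for replication
    useCoxLasso = TRUE,    # regularized Cox model as the baseline
    srf_tuning = srf_params,
    outer_cv = 2, inner_cv = 2,
    train_srf = TRUE       # also evaluate SRF on its own
  )
  summary(lasso_comp)
}
```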

[Package survcompare version 0.1.2 Index]