survivalSL {survivalSL}    R Documentation

Super Learner for Censored Outcomes

Description

This function fits a Super Learner (SL) to predict survival outcomes.

Usage

survivalSL(methods, metric="ci", data, times, failures, group=NULL,
  cov.quanti=NULL, cov.quali=NULL, cv=10, param.tune=NULL, pro.time=NULL,
  optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
  param.weights.fix=NULL, param.weights.init=NULL,
  keep.predictions=TRUE, progress=TRUE)

Arguments

methods

A character vector with the names of the algorithms included in the SL. At least two algorithms must be included.

metric

The loss function used to estimate the weights of the algorithms in the SL. See details.

data

A data frame in which to look for the variables related to the follow-up time (times), the event indicator (failures), the optional treatment/exposure (group) and the covariates (cov.quanti and cov.quali).

times

The name of the variable in data that contains the numeric follow-up times.

failures

The name of the variable in data that contains the numeric event indicators (0=right censored, 1=event).

group

The name of the variable related to the exposure/treatment. This variable must have exactly two levels, coded 0 for the untreated/unexposed patients and 1 for the treated/exposed ones. The default value is NULL: no specific exposure/treatment is considered. When a specific exposure/treatment is considered, it is forced into the algorithm, or the related interactions are tested when possible.

cov.quanti

The name(s) of the variable(s) related to the possible quantitative covariates. These variables must be numeric.

cov.quali

The name(s) of the variable(s) related to the possible qualitative covariates. These variables must be numeric with two levels: 0 and 1. A complete disjunctive form must be used for covariates with more levels.
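
As an illustration of the complete disjunctive form, a covariate with three levels can be expanded into one 0/1 indicator column per level before calling survivalSL. The sketch below uses base R only; the data frame d and the variable stage are hypothetical and are not part of the package. Whether one indicator should be left out as a reference level for a given learner is not specified here and is left to the user.

```r
# Hypothetical data: 'stage' is a qualitative covariate with three levels
d <- data.frame(stage = factor(c("I", "II", "III", "II", "I")))

# One 0/1 indicator per level (complete disjunctive coding);
# "- 1" removes the intercept so that every level gets its own column
ind <- model.matrix(~ stage - 1, data = d)
colnames(ind) <- paste0("stage_", levels(d$stage))

# Append the numeric indicators to the data frame
d <- cbind(d, as.data.frame(ind))

# Names of the indicator columns, in the form expected by cov.quali
cov.quali <- colnames(ind)
```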

cv

The number of splits for cross-validation. The default value is 10.
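
To make the role of cv concrete, the following sketch shows one common way of assigning subjects to cv splits of near-equal size. It is only an illustration in base R: the sample size n is hypothetical, and the way the package actually builds its splits may differ.

```r
set.seed(1)            # for reproducibility of the illustration
n  <- 150              # hypothetical number of subjects
cv <- 10               # number of splits

# Randomly assign each subject to one of the cv splits
folds <- sample(rep(seq_len(cv), length.out = n))

# Subjects used for validation and for training in the first split
valid.idx <- which(folds == 1)
train.idx <- which(folds != 1)
```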

param.tune

A list whose length equals the number of algorithms included in methods. If NULL, the tuning parameters are estimated (see Details).

pro.time

The optional prognostic time, i.e. the maximum delay for which the prognostic capacity is evaluated. It must be expressed in the same unit as the argument times. It is not used for the following metrics: "loglik", "ibs", "bll", and "ibll". The default value is the time at which half of the subjects are still at risk.
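
The default described above can be read as the median of the follow-up times, whatever the event status. The sketch below checks this reading on hypothetical follow-up times; that the package computes the default exactly in this way is an assumption.

```r
# Hypothetical follow-up times (events and censored observations together)
times <- c(0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 6.3, 7.7)

# Time at which half of the subjects are still at risk
pro.time <- median(times)

# Proportion of subjects still at risk at pro.time
at.risk <- mean(times >= pro.time)
```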

optim.local.min

An optional logical value. If TRUE, the optimization is performed twice to make the estimation of the weights more reliable. If FALSE (default value), the optimization is performed once.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time-dependent ROC curve. Only used when metric="auc". The values 0 (min) and 1 (max) are not allowed. The default is seq(.01,.99,.01).
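
For intuition, the percentiles in ROC.precision can be turned into cut-off values of a prognostic variable with quantile(). The sketch below uses a simulated, hypothetical score; how the package computes the cut-offs internally is not documented here and may differ.

```r
set.seed(2)
score <- rnorm(200)                    # hypothetical prognostic variable

ROC.precision <- seq(.01, .99, .01)    # default percentiles

# One cut-off per percentile; each cut-off gives one point of the ROC curve
cut.off <- quantile(score, probs = ROC.precision)
```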

param.weights.fix

A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. When supplied, these parameters are not estimated. The default value is NULL: the parameters are estimated by cv-fold cross-validation. See details.

param.weights.init

A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in methods. The default value is NULL: the initial values are set to 0. See details.
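
A minimal sketch of how multinomial logistic parameters can be turned into weights is given below. It assumes K-1 free parameters for K methods, with the last method as the reference category; both the helper logit.to.weights and this parameterization are assumptions made for illustration, not the documented internals of the package. Under this assumption, initial values of 0 correspond to equal weights.

```r
# Map K-1 multinomial-logit parameters to K weights that sum to 1
# (assumption: the last method is the reference, with parameter 0)
logit.to.weights <- function(par) {
  expo <- exp(c(par, 0))
  expo / sum(expo)
}

# Three methods, all parameters equal to 0: equal weights
w0 <- logit.to.weights(c(0, 0))

# Three methods, arbitrary parameters: unequal weights
w1 <- logit.to.weights(c(1.2, -0.4))
```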

keep.predictions

A logical value specifying whether the predictions of all the methods are saved. If FALSE, only the predictions related to the SL are saved (to save space). The default is TRUE.

progress

A logical value indicating whether to print a progress bar in the R console. The default is TRUE.

Details

Each element of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, the tuning parameters of each algorithm are estimated by cv-fold cross-validation. Otherwise, the user can propose a tuning grid for each method, as explained in the following table.

The following metrics can be used:

"ci": concordance index at the prognostic time pro.time
"bs": Brier score at the prognostic time pro.time
"loglik": log-likelihood
"ibs": integrated Brier score up to the last observed time of event
"ibll": integrated binomial log-likelihood up to the last observed time of event
"bll": binomial log-likelihood
"ribs": restricted integrated Brier score up to the prognostic time pro.time
"ribll": restricted integrated binomial log-likelihood up to the last observed time of event
"auc": area under the time-dependent ROC curve up to the prognostic time pro.time

The following learners are available:

Names                Description                                  Package
"LIB_AFTgamma"       Gamma-distributed AFT model                  flexsurv
"LIB_AFTggamma"      Generalized Gamma-distributed AFT model      flexsurv
"LIB_AFTweibull"     Weibull-distributed AFT model                flexsurv
"LIB_PHexponential"  Exponential-distributed PH model             flexsurv
"LIB_PHgompertz"     Gompertz-distributed PH model                flexsurv
"LIB_PHspline"       Spline-based PH model                        flexsurv
"LIB_COXall"         Usual Cox model                              survival
"LIB_COXaic"         Cox model with AIC-based forward selection   MASS
"LIB_COXen"          Elastic Net Cox model                        glmnet
"LIB_COXlasso"       Lasso Cox model                              glmnet
"LIB_COXridge"       Ridge Cox model                              glmnet
"LIB_RSF"            Survival Random Forest                       randomForestSRC
"LIB_SNN"            Survival Neural Network                      survivalmodels

The loss functions available for the estimation of the Super Learner weights (argument metric) are the metrics listed above.

Value

times

A vector of numeric values with the times of the predictions.

predictions

A list of matrices with the predicted survival probabilities of each subject (rows) at each observed time (columns). Each matrix corresponds to one of the included methods, and the last item, named "sl", corresponds to the resulting SL. If keep.predictions=FALSE, only the matrix of predictions related to the SL is returned.
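
To show how the "sl" matrix relates to the matrices of the individual methods, the sketch below combines two hypothetical prediction matrices with fixed weights. It assumes the usual Super Learner form, a weighted sum of the learners' predictions with weights summing to 1; the matrices pred.a and pred.b and the weights w are made up for the illustration.

```r
# Hypothetical predicted survival for 2 subjects (rows) at 3 times (columns)
pred.a <- matrix(c(0.90, 0.80, 0.60,
                   0.95, 0.85, 0.70), nrow = 2, byrow = TRUE)
pred.b <- matrix(c(0.80, 0.70, 0.50,
                   0.90, 0.80, 0.60), nrow = 2, byrow = TRUE)

# Hypothetical weights of the two learners (they sum to 1)
w <- c(0.25, 0.75)

# Weighted combination, subject by subject and time by time
pred.sl <- w[1] * pred.a + w[2] * pred.b
```

Because the weights are positive and sum to 1, each combined prediction lies between the predictions of the two learners and therefore stays within [0, 1].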

data

The data frame used for learning. The first column is entitled times and corresponds to the observed follow-up times. The second column is entitled failures and corresponds to the event indicators. The other columns correspond to the predictors.

predictors

A list with the predictors involved in group, cov.quanti and cov.quali.

ROC.precision

The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time-dependent ROC curve.

cv

The number of splits for cross-validation.

pro.time

The prognostic time, i.e. the maximum delay for which the prognostic capacity is evaluated.

models

A list with the estimated models/algorithms included in the SL.

weights

A list composed of two vectors: the regression coefficients of the multinomial logistic regression and the resulting values of the weights.

metric

A list composed of two elements: the loss function used to estimate the weights of the algorithms in the SL and its value.

param.tune

The estimated tuning parameters.

References

Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com. 2010.

Examples

data(dataDIVAT2)

# Outcome model based on a Super Learner, fitted on the first 150 individuals of the data set
sl1 <- survivalSL(methods=c("LIB_AFTgamma", "LIB_PHgompertz"), metric="ci",
  data=dataDIVAT2[1:150,], times="times", failures="failures", group="ecd",
  cov.quanti=c("age"), cov.quali=c("hla", "retransplant"), cv=3)

# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
retransplant=c(1,1), ecd=c(0,1)))

plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))

lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)

legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))

[Package survivalSL version 0.94]