R: Super Learner for Censored Outcomes

survivalSL {survivalSL}

R Documentation

Super Learner for Censored Outcomes

Description

This function computes a Super Learner (SL) to predict survival outcomes.

Usage

survivalSL(methods, metric="ci",  data, times, failures, group=NULL,
cov.quanti=NULL, cov.quali=NULL, cv=10, param.tune=NULL, pro.time=NULL,
optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
param.weights.fix=NULL, param.weights.init=NULL,
keep.predictions=TRUE, progress=TRUE)

Arguments

`methods`	A character vector with the names of the algorithms included in the SL. At least two algorithms must be included.
`metric`	The loss function used to estimate the weights of the algorithms in the SL. See details.
`data`	A data frame in which to look for the variables related to the follow-up time (`times`), the event status (`failures`), the optional treatment/exposure (`group`) and the covariates included in the algorithms (`cov.quanti` and `cov.quali`).
`times`	The name of the variable related to the numeric vector with the follow-up times.
`failures`	The name of the variable related to the numeric vector with the event indicators (0=right censored, 1=event).
`group`	The name of the variable related to the exposure/treatment. This variable must have only two levels, coded 0 for the untreated/unexposed patients and 1 for the treated/exposed ones. The default value is NULL: no specific exposure/treatment is considered. When a specific exposure/treatment is considered, it will be forced in the algorithm or related interactions will be tested when possible.
`cov.quanti`	The name(s) of the variable(s) related to the possible quantitative covariates. These variables must be numeric.
`cov.quali`	The name(s) of the variable(s) related to the possible qualitative covariates. These variables must be numeric with two levels: 0 and 1. A complete disjunctive form must be used for covariates with more levels.
`cv`	The number of splits for cross-validation. The default value is 10.
`param.tune`	A list with a length equal to the number of algorithms included in `methods`. If `NULL`, the tuning parameters are estimated (see details).
`pro.time`	The optional prognostic time: the maximum delay for which the prognostic capacity is evaluated, expressed in the same unit as the argument `times`. Not used for the following metrics: "loglik", "ibs", "bll", and "ibll". The default value is the time at which half of the subjects are still at risk.
`optim.local.min`	An optional logical value. If `TRUE`, the optimization is performed twice to better ensure the estimation of the weights. If `FALSE` (default value), the optimization is performed once.
`ROC.precision`	The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. Only used when `metric="auc"`. 0 (min) and 1 (max) are not allowed. By default: `seq(.01,.99,.01)`.
`param.weights.fix`	A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in `methods`. When supplied, the related parameters are not estimated. The default value is NULL: the parameters are estimated by a `cv`-fold cross-validation. See details.
`param.weights.init`	A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in `methods`. The default value is NULL: the initial values are set to 0. See details.
`keep.predictions`	A logical value specifying if all the predictions for all the `methods` are saved. If `FALSE`, only the predictions related to the SL are saved (for space saving). The default is `TRUE`.
`progress`	A logical value to print a progress bar in the R console. The default is `TRUE`.
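
As a minimal sketch of the complete disjunctive coding required by `cov.quali` for a covariate with more than two levels, the base R code below expands a three-level factor into 0/1 indicator columns. The data frame and the column name `centre` are hypothetical and only serve the illustration.

```r
# Hypothetical data: 'centre' is a qualitative covariate with three levels.
df <- data.frame(centre = factor(c("A", "B", "C", "A", "B")))

# One 0/1 indicator column per level; removing the intercept (- 1)
# keeps an indicator for every level instead of dropping the first one.
dummies <- model.matrix(~ centre - 1, data = df)
colnames(dummies) <- paste0("centre_", levels(df$centre))

# Append the numeric indicator columns to the data frame.
df <- cbind(df, as.data.frame(dummies))
```

The names of the indicator columns could then be listed in `cov.quali`. This page does not say whether every indicator or all but a reference level should be supplied; in regression models one level is commonly left out to avoid collinearity.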

Details

Each element of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, the tuning parameters of each algorithm are estimated by cv-fold cross-validation. Otherwise, the user can propose a tuning grid for each method, as explained in the following table.

The following metrics can be used: "ci" for the concordance index at the prognostic time pro.time, "bs" for the Brier score at the prognostic time pro.time, "loglik" for the log-likelihood, "ibs" for the integrated Brier score up to the last observed time of event, "ibll" for the integrated binomial log-likelihood up to the last observed time of event, "bll" for the binomial log-likelihood, "ribs" for the restricted integrated Brier score up to the prognostic time pro.time, "ribll" for the restricted integrated binomial log-likelihood up to the last observed time of event, and "auc" for the area under the time-dependent ROC curve up to the prognostic time pro.time.
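
The naming rule for param.tune stated above can be sketched as follows. This is only an illustration of the list structure: the inner element name `lambda` and the grid values are assumptions, not taken from this page, and should be checked against the tuning parameters each learner actually accepts.

```r
methods <- c("LIB_COXlasso", "LIB_COXridge")

# One element per method, named after the method.
# Assumption: both penalised Cox learners take a grid called 'lambda'.
param.tune <- list(
  LIB_COXlasso = list(lambda = seq(0.01, 0.10, by = 0.01)),
  LIB_COXridge = list(lambda = seq(0.1, 1.0, by = 0.1))
)

# Check the two stated requirements before passing the list to survivalSL():
# same length as 'methods' and same names as the methods.
stopifnot(length(param.tune) == length(methods),
          setequal(names(param.tune), methods))
```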

The following learners are available:

Names	Description	Package
`"LIB_AFTgamma"`	Gamma-distributed AFT model	flexsurv
`"LIB_AFTggamma"`	Generalized Gamma-distributed AFT model	flexsurv
`"LIB_AFTweibull"`	Weibull-distributed AFT model	flexsurv
`"LIB_PHexponential"`	Exponential-distributed PH model	flexsurv
`"LIB_PHgompertz"`	Gompertz-distributed PH model	flexsurv
`"LIB_PHspline"`	Spline-based PH model	flexsurv
`"LIB_COXall"`	Usual Cox model	survival
`"LIB_COXaic"`	Cox model with AIC-based forward selection	MASS
`"LIB_COXen"`	Elastic Net Cox model	glmnet
`"LIB_COXlasso"`	Lasso Cox model	glmnet
`"LIB_COXridge"`	Ridge Cox model	glmnet
`"LIB_RSF"`	Survival Random Forest	randomForestSRC
`"LIB_SNN"`	Survival Neural Network	survivalmodels

The following loss functions for the estimation of the super learner weights are available (metric):

Area under the ROC curve ("auc")
Concordance index ("ci")
Brier score ("bs")
Binomial log-likelihood ("bll")
Integrated Brier score ("ibs")
Integrated binomial log-likelihood ("ibll")
Restricted integrated Brier score ("ribs")
Restricted integrated binomial log-likelihood ("ribll")
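
To make the default metric more concrete, here is a minimal sketch of a Harrell-type concordance index truncated at a prognostic time, written in base R. It is one common definition and not necessarily the estimator used by the package, which may handle ties and censoring differently. The function name `harrell_c` and the toy data are hypothetical.

```r
# Harrell-type concordance index, truncated at a prognostic time.
# 'risk' is any score where larger values mean earlier expected failure
# (for instance 1 - predicted survival at pro.time).
harrell_c <- function(times, failures, risk, pro.time = max(times)) {
  concordant <- 0
  comparable <- 0
  n <- length(times)
  for (i in seq_len(n)) {
    # Subject i anchors a pair only if its event is observed by pro.time.
    if (failures[i] != 1 || times[i] > pro.time) next
    for (j in seq_len(n)) {
      # A pair is comparable when subject j is known to outlive subject i.
      if (times[j] > times[i]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # tied scores count for one half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data (hypothetical values).
times    <- c(2, 4, 6, 8, 10)
failures <- c(1, 0, 1, 1, 0)
risk     <- c(0.9, 0.3, 0.2, 0.4, 0.1)
c_index  <- harrell_c(times, failures, risk)
```

A value of 1 means that the scores order every comparable pair correctly, and 0.5 corresponds to a random ordering.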

Value

`times`	A vector of numeric values with the times of the `predictions`.
`predictions`	A list of matrices with the predicted survival of each subject (rows) at each observed time (columns). Each matrix corresponds to one of the included `methods`, and the last item, entitled "sl", to the resulting SL. If `keep.predictions=FALSE`, it corresponds to a matrix with the predictions related to the SL only.
`data`	The data frame used for learning. The first column is entitled `times` and corresponds to the observed follow-up times. The second column is entitled `failures` and corresponds to the event indicators. The other columns correspond to the predictors.
`predictors`	A list with the predictors involved in `group`, `cov.quanti` and `cov.quali`.
`ROC.precision`	The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve.
`cv`	The number of splits for cross-validation.
`pro.time`	The maximum delay for which the capacity of the variable is evaluated.
`models`	A list with the estimated models/algorithms included in the SL.
`weights`	A list composed of two vectors: the regression `coefficients` of the multinomial logistic regression and the resulting weights' `values`.
`metric`	A list composed of two vectors: the loss function used to estimate the weights of the algorithms in the SL and its value.
`param.tune`	The estimated tuning parameters.
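
The link between the `coefficients` and the `values` returned in `weights` can be sketched with a multinomial logistic (softmax) transform. This is a minimal illustration under an assumption not confirmed by this page: K methods are driven by K-1 free parameters, with the last method taken as the reference (parameter fixed at 0). The helper name `sl_weights` is hypothetical.

```r
# Map K-1 free parameters to K weights that are positive and sum to 1.
# Assumption: the last method is the reference category (parameter = 0).
sl_weights <- function(par) {
  eta <- c(par, 0)
  exp(eta) / sum(exp(eta))
}

# Three methods with all parameters at 0 (the default initial values):
# every method receives the same weight.
w_equal <- sl_weights(c(0, 0))

# A positive parameter gives more weight than the reference,
# a negative one gives less.
w_other <- sl_weights(c(1.2, -0.5))
```

Under this parameterisation the weights are always positive and sum to 1, so the optimisation over the parameters needs no constraints.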

References

Polley E and van der Laan M. Super Learner In Prediction. http://biostats.bepress.com. 2010.

Examples

data(dataDIVAT2)

# Outcome model based on a Super Learner fitted to the first 150 individuals of the database
sl1 <- survivalSL(methods=c("LIB_AFTgamma", "LIB_PHgompertz"),  metric="ci",
  data=dataDIVAT2[1:150,],  times="times", failures="failures", group="ecd",
  cov.quanti=c("age"),  cov.quali=c("hla", "retransplant"), cv=3)

# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
retransplant=c(1,1), ecd=c(0,1)))

plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))

lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)

legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))

[Package survivalSL version 0.94 Index]