survivalSL {survivalSL} | R Documentation |
Super Learner for Censored Outcomes
Description
This function computes a Super Learner (SL) to predict survival outcomes.
Usage
survivalSL(methods, metric="ci", data, times, failures, group=NULL,
cov.quanti=NULL, cov.quali=NULL, cv=10, param.tune=NULL, pro.time=NULL,
optim.local.min=FALSE, ROC.precision=seq(.01,.99,.01),
param.weights.fix=NULL, param.weights.init=NULL,
keep.predictions=TRUE, progress=TRUE)
Arguments
methods |
A character vector with the names of the algorithms included in the SL. At least two algorithms must be included. |
metric |
The loss function used to estimate the weights of the algorithms in the SL. See details. |
data |
A data frame in which to look for the variables related to the status of the follow-up time ( |
times |
The name of the variable related to the numeric vector with the follow-up times. |
failures |
The name of the variable related to the numeric vector with the event indicators (0=right censored, 1=event). |
group |
The name of the variable related to the exposure/treatment. This variable must have only two values, encoded 0 for untreated/unexposed patients and 1 for treated/exposed ones. The default value is NULL: no specific exposure/treatment is considered. When a specific exposure/treatment is considered, it is forced into the algorithm, or the related interactions are tested when possible. |
cov.quanti |
The name(s) of the variable(s) related to the possible quantitative covariates. These variables must be numeric. |
cov.quali |
The name(s) of the variable(s) related to the possible qualitative covariates. These variables must be numeric with two levels: 0 and 1. A complete disjunctive form must be used for covariates with more levels. |
cv |
The number of splits for cross-validation. The default value is 10. |
param.tune |
A list with a length equal to the number of algorithms included in
pro.time |
This optional value of prognostic time represents the maximum delay for which the capacity of the variable is evaluated. It must be expressed in the same unit as the argument times. Not used for the following metrics: "loglik", "ibs", "bll", and "ibll". The default value is the time at which half of the subjects are still at risk. |
optim.local.min |
An optional logical value. If |
ROC.precision |
The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time dependent ROC curve. Only used when |
param.weights.fix |
A vector with the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in |
param.weights.init |
A vector with the initial values of the parameters of the multinomial logistic regression which generates the weights of the algorithms declared in |
keep.predictions |
A logical value specifying if all the predictions for all the |
progress |
A logical value to print a progress bar in the R console. The default is |
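The cov.quali argument expects 0/1 variables, with a complete disjunctive form for covariates that have more than two levels. The following is a minimal base R sketch of that recoding. The data frame d and its column center are hypothetical, and dropping one reference level before passing the names to cov.quali is an assumption made here to avoid collinear columns, not something this page prescribes.

```r
# Hypothetical data: 'center' is a qualitative covariate with three levels
d <- data.frame(age = c(45, 60, 38, 52),
                center = factor(c("A", "B", "C", "A")))

# One 0/1 indicator column per level (complete disjunctive coding)
ind <- model.matrix(~ center - 1, data = d)
colnames(ind) <- paste0("center_", levels(d$center))

# Append the indicators to the original data frame
d <- cbind(d, ind)

# Candidate names for cov.quali; dropping the reference level "A"
# is an assumption made for this illustration
quali <- colnames(ind)[-1]
```

The resulting columns are numeric and contain only 0 and 1, which matches the constraint stated for cov.quali.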
Details
Each object of the list declared in param.tune must have the same name as the corresponding method included in the SL. If param.tune = NULL, the tuning parameters of each algorithm are estimated by cv-fold cross-validation. Otherwise, the user can propose a tuning grid for each method, as explained in the following table.
The following metrics can be used: "ci" for the concordance index at the prognostic time pro.time, "bs" for the Brier score at the prognostic time pro.time, "loglik" for the log-likelihood, "ibs" for the integrated Brier score up to the last observed time of event, "ibll" for the integrated binomial log-likelihood up to the last observed time of event, "bll" for the binomial log-likelihood, "ribs" for the restricted integrated Brier score up to the prognostic time pro.time, "ribll" for the restricted integrated binomial log-likelihood up to the last observed time of event, and "auc" for the area under the time-dependent ROC curve up to the prognostic time pro.time.
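To make the role of cv concrete, the sketch below shows one common way to split subjects into cv folds in base R. It is an illustration only: the object names n and folds are hypothetical, and the package's internal splitting may differ.

```r
set.seed(1)   # reproducible fold assignment
n  <- 150     # number of subjects (hypothetical)
cv <- 3       # number of splits

# Randomly assign each subject to one of the cv folds of near-equal size
folds <- sample(rep(seq_len(cv), length.out = n))

# Each fold serves once as the validation set; the others are used for training
for (k in seq_len(cv)) {
  train <- which(folds != k)
  valid <- which(folds == k)
  # fit the learners on 'train', then predict on 'valid'
}
```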
The following learners are available:
Names | Description | Package |
"LIB_AFTgamma" | Gamma-distributed AFT model | flexsurv |
"LIB_AFTggamma" | Generalized Gamma-distributed AFT model | flexsurv |
"LIB_AFTweibull" | Weibull-distributed AFT model | flexsurv |
"LIB_PHexponential" | Exponential-distributed PH model | flexsurv |
"LIB_PHgompertz" | Gompertz-distributed PH model | flexsurv |
"LIB_PHspline" | Spline-based PH model | flexsurv |
"LIB_COXall" | Usual Cox model | survival |
"LIB_COXaic" | Cox model with AIC-based forward selection | MASS |
"LIB_COXen" | Elastic Net Cox model | glmnet |
"LIB_COXlasso" | Lasso Cox model | glmnet |
"LIB_COXridge" | Ridge Cox model | glmnet |
"LIB_RSF" | Survival Random Forest | randomForestSRC |
"LIB_SNN" | Survival Neural Network | survivalmodels |
The following loss functions for the estimation of the Super Learner weights are available (metric):
Area under the ROC curve ("auc")
Concordance index ("ci")
Brier score ("bs")
Binomial log-likelihood ("bll")
Integrated Brier score ("ibs")
Integrated binomial log-likelihood ("ibll")
Restricted integrated Brier score ("ribs")
Restricted integrated binomial log-likelihood ("ribll")
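The weights of the algorithms are described in the arguments as generated by a multinomial logistic regression. A minimal sketch of that idea follows, assuming a parametrization in which the last learner is the reference (its parameter is fixed at 0). This parametrization is an assumption for illustration and may not match the package's internal code. The matrices surv1, surv2 and surv3 are hypothetical survival predictions (subjects in rows, times in columns).

```r
# Hypothetical regression parameters for M = 3 learners (M - 1 free parameters)
beta <- c(0.5, -1)

# Multinomial-logit transform: weights are positive and sum to 1
w <- exp(c(beta, 0)) / sum(exp(c(beta, 0)))

# Hypothetical predicted survival from three learners (2 subjects x 3 times)
surv1 <- matrix(c(0.90, 0.80, 0.70,
                  0.95, 0.85, 0.60), nrow = 2, byrow = TRUE)
surv2 <- surv1 - 0.05
surv3 <- surv1 + 0.02

# Super Learner prediction as the weighted combination of the learners
sl <- w[1] * surv1 + w[2] * surv2 + w[3] * surv3
```

Because the weights are positive and sum to 1, each combined prediction lies between the smallest and largest prediction of the individual learners. The chosen metric is then the criterion used to estimate the parameters that produce these weights.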
Value
times |
A vector of numeric values with the times of the |
predictions |
A list of matrices with the predicted survival of each subject (rows) for each observed time (columns). Each matrix corresponds to the included |
data |
The data frame used for learning. The first column is entitled |
predictors |
A list with the predictors involved in |
ROC.precision |
The percentiles (between 0 and 1) of the prognostic variable used for computing each point of the time-dependent ROC curve. |
cv |
The number of splits for cross-validation. |
pro.time |
The maximum delay for which the capacity of the variable is evaluated. |
models |
A list with the estimated models/algorithms included in the SL. |
weights |
A list composed of two vectors: the regressions |
metric |
A list composed of two vectors: the loss function used to estimate the weights of the algorithms in the SL and its value. |
param.tune |
The estimated tuning parameters. |
References
Polley E and van der Laan M. Super Learner in Prediction. http://biostats.bepress.com. 2010.
Examples
data(dataDIVAT2)
# The outcome model based on a Super Learner, fitted on the first 150 individuals of the data set
sl1 <- survivalSL(methods=c("LIB_AFTgamma", "LIB_PHgompertz"), metric="ci",
data=dataDIVAT2[1:150,], times="times", failures="failures", group="ecd",
cov.quanti=c("age"), cov.quali=c("hla", "retransplant"), cv=3)
# Individual prediction
pred <- predict(sl1, newdata=data.frame(age=c(52,52), hla=c(0,1),
retransplant=c(1,1), ecd=c(0,1)))
plot(y=pred$predictions$sl[1,], x=pred$times, xlab="Time (years)",
ylab="Predicted survival", col=1, type="l", lty=1, lwd=2, ylim=c(0,1))
lines(y=pred$predictions$sl[2,], x=pred$times, col=2, type="l", lty=1, lwd=2)
legend("topright", col=c(1,2), lty=1, lwd=2, c("Subject #1", "Subject #2"))