Optimal Survival Tree Ensemble


OSTE() is the main function of the OSTE package. It grows a sufficiently large number, t.initial, of survival trees by random survival forest and selects the optimal survival trees from among them. The number of survival trees in the initial set, t.initial, is chosen by the user; if not specified, the default t.initial = 500 is used. Based on empirical investigation, t.initial = 1000 is recommended.


OSTE(formula = NULL, data, t.initial = NULL, v.size = NULL, mtry = NULL, M = NULL,
     minimum.node.size = NULL, always.split.features = NULL, replace = TRUE,
     splitting.rule = NULL, info = TRUE)



Object of class formula describing the required model to be fitted. Interaction terms are not supported in the current version.


An n x d matrix or data frame of n observations on d features, along with the response variables described by the formula.


Number of survival trees to be grown initially. If equal to NULL then the default of t.initial = 500 is taken. A recommended value is t.initial = 1000.


Proportion of the data used for validation in the second phase, i.e. for assessing the performance of the survival trees in the ensemble. If equal to NULL then the default v.size = 0.1 is used.


Number of features selected at random at each node of the survival trees for splitting. If equal to NULL then the default sqrt(d) is taken.


Proportion of the t.initial survival trees to be selected as the best on the basis of their performance on out-of-bag observations. For selecting 20% of the trees, take M = 0.2.


Minimal node size. If equal to NULL then the default minimum.node.size = 3 is used.


Vector of names of variables that should always be selected, in addition to the mtry variables tried for splitting.


Whether sampling should be done with replacement (TRUE, the default) or without replacement (FALSE).


Splitting rule. "logrank", "C" and "maxstat" are supported, with default "logrank".


If TRUE, displays the process status.
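To make the roles of t.initial, M and v.size concrete, here is a minimal base R sketch of the arithmetic they imply. It is not the package's internal code: the per-tree error scores are simulated, the sample size n is made up, and the rounding of the tree and row counts is an assumption.

```r
set.seed(1)

t.initial <- 500   # default number of initial trees, per the text above
M         <- 0.2   # keep the best 20% of the initial trees
v.size    <- 0.1   # default proportion of the data held out for validation
n         <- 137   # hypothetical number of observations

# Hypothetical per-tree error scores on out-of-bag observations
# (lower is better); simulated here purely for illustration.
tree_error <- runif(t.initial)

# First phase: rank the trees and keep the best M * t.initial of them.
# Rounding to the nearest integer is an assumption of this sketch.
n_keep     <- round(M * t.initial)
best_trees <- order(tree_error)[seq_len(n_keep)]

# Second phase: rows set aside for assessing the selected trees in the ensemble.
val_idx <- sample(n, round(v.size * n))

n_keep            # number of trees passed on to the second phase
length(val_idx)   # number of validation rows
```

With these settings, 100 of the 500 trees go forward and 14 of the 137 rows are held out for validation.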


Large values of t.initial are recommended for better performance, as far as the available computational resources allow. The log-rank test statistic is used as the default splitting rule; a C-index based splitting rule (Schmid et al. 2015) and maximally selected rank statistics (Wright et al. 2016) are also available. The C-index shows better predictive performance when the censoring rate is high, whereas logrank is best when the data are noisy (Schmid et al. 2015).
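For readers unfamiliar with the concordance index that the "C" splitting rule is based on, the following base R sketch computes Harrell's C for a small censored sample. The helper name harrell_c and the toy data are hypothetical, and this only illustrates the statistic; it is not the splitting code used by the package.

```r
# Harrell's concordance index for right-censored data.
# time:   observed times
# status: 1 = event observed, 0 = censored
# risk:   predicted risk score (higher = earlier event expected)
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is comparable when subject i has an observed event
      # strictly before subject j's observed time.
      if (status[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5   # ties in risk count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data: the third subject is censored.
time   <- c(1, 2, 3, 4)
status <- c(1, 1, 0, 1)

c_perfect  <- harrell_c(time, status, risk = c(4, 3, 2, 1))  # risks ordered with the events
c_reversed <- harrell_c(time, status, risk = c(1, 2, 3, 4))  # risks in the opposite order
```

Censored subjects only enter as the later member of a pair, which is why the index remains usable when many observations are censored.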



Unique death times.


Estimated cumulative hazard function for each observation.


Estimated survival probability for each observation.


Number of trees selected.


Value of mtry used.


Saved forest for prediction purposes.


If the data contain missing values, they must be dealt with beforehand, as the function cannot handle them in the current version. Moreover, the status/delta variable in the data must be coded as 0, 1.
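One way to meet both requirements before calling the function is sketched below in base R. The data frame dat and its column names are hypothetical, and dropping incomplete rows is only one possible treatment of missing values; imputation may be preferable in practice.

```r
# Hypothetical toy data; the column names are illustrative only.
dat <- data.frame(
  time   = c(5, 8, 12, 3, 9),
  status = c("dead", "alive", "dead", "dead", "alive"),
  age    = c(61, NA, 70, 55, 66),
  stringsAsFactors = FALSE
)

# Remove rows with any missing value (row 2 here).
dat <- dat[complete.cases(dat), ]

# Recode the status variable to 1 = event, 0 = censored.
dat$status <- as.integer(dat$status == "dead")

# Sanity checks before fitting.
stopifnot(!anyNA(dat), all(dat$status %in% c(0, 1)))
```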


Naz Gul, Nosheen Faiz, Zardad Khan and Berthold Lausen.


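# Load the required packages
library(OSTE)
library(survival)  # Surv()
library(ranger)    # prediction from the saved forest
library(prodlim)   # sindex()
library(pec)       # pec(), crps()

# Load the data
# VETERAN is assumed to be a data frame that is already loaded, with columns
# time, status, trt, celltype, karno, diagtime, age and prior.

# Helper used by pec() to extract survival probabilities from a ranger forest
predictSurvProb.ranger <- function(object, newdata, times, ...) {
  ptemp <- ranger:::predict.ranger(object, data = newdata, importance = "none")$survival
  pos <- sindex(jump.times = object$unique.death.times, eval.times = times)
  p <- cbind(1, ptemp)[, pos + 1, drop = FALSE]
  if (NROW(p) != NROW(newdata) || NCOL(p) != length(times))
    stop(paste("\nPrediction matrix has wrong dimensions:\nRequested newdata x times: ",
               NROW(newdata), " x ", length(times), "\nProvided prediction matrix: ",
               NROW(p), " x ", NCOL(p), "\n\n", sep = ""))
  p
}

# Divide the data into training and test parts
n <- nrow(VETERAN)
trainind <- sample(1:n, floor(n * 0.7))
testind <- (1:n)[-trainind]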

# Grow OSTE on the training data

OSTE.fit <- OSTE(Surv(time, status) ~ ., data = VETERAN[trainind, ], t.initial = 100)

# Predict on the test data

pred <- ranger:::predict.ranger(OSTE.fit$forest, data = VETERAN[testind, ])

# Index various values



# To calculate IBS
# Create formula
frm <- as.formula(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior)

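PredError <- pec(object = OSTE.fit$forest, exact = TRUE,
                 formula = frm, cens.model = "marginal",
                 data = VETERAN[testind, ], verbose = FALSE)
# Integrated Brier score up to time 100; row 2 is the fitted model (row 1 is the reference)
IBS <- crps(object = PredError, times = 100, start = PredError$start)[2, 1]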

