| OSTE {OSTE} | R Documentation | 
Optimal Survival Tree Ensemble
Description
OSTE is the main function of the OSTE package. It grows a sufficiently large number, t.initial, of survival trees by random survival forest and then selects the optimal survival trees from this initial set. The number of survival trees in the initial set, t.initial, is chosen by the user. If it is not specified, the default t.initial = 500 is used. Based on empirical investigation, t.initial = 1000 is recommended.
Usage
OSTE(formula = NULL, data, t.initial = NULL, v.size = NULL, mtry = NULL, M = NULL,
minimum.node.size = NULL, always.split.features = NULL, replace = TRUE,
splitting.rule = NULL, info = TRUE)
Arguments
| formula | Object of class formula describing the required model to be fitted. Interaction terms are not supported in the current version. | 
| data | A   | 
| t.initial |  Number of survival trees to be grown initially. If equal to  | 
| v.size | Portion of data used for validation in the second phase i.e. for assessing survival trees performance in the ensemble. If equal to  | 
| mtry | Number of features selected at random at each node of the survival trees for splitting. If equal to  | 
| M | Percent of the best  | 
| minimum.node.size | Minimal node size. If equal to  | 
| always.split.features | Vector of variable names if desired to be always selected in addition to the mtry variables tried for splitting. | 
| replace | Whether sampling should be done with or without replacement. | 
| splitting.rule | Splitting rule. | 
| info | If  | 
Details
Large values of t.initial are recommended for better performance, as far as the available computational resources allow. The log-rank test statistic is used as the default splitting rule.
A C-index based splitting rule (Schmid et al. 2016) and maximally selected rank statistics (Wright et al. 2017) are also available. The C-index shows better predictive performance when the censoring rate is high, whereas log-rank is best when the data are noisy (Schmid et al. 2016).
Value
| unique.death.times | Unique death times. | 
| CHF | Estimated cumulative hazard function for each observation. | 
| Survival_Prob | Estimated survival probability for each observation. | 
| trees_selected | Number of trees selected. | 
| mtry | Value of mtry used. | 
| forest | Saved forest for prediction purposes. | 
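The CHF and Survival_Prob components are related through the standard identity S(t) = exp(-H(t)). The following minimal sketch illustrates that relationship on a small hand-made matrix. The object names (death_times, chf, surv_prob) and the numbers are hypothetical, and whether OSTE derives Survival_Prob in exactly this way is an assumption based on how ranger computes survival predictions.

```r
# Hypothetical cumulative hazard estimates (not OSTE output):
# rows are observations, columns are unique death times.
death_times <- c(10, 25, 40)
chf <- matrix(c(0.0, 0.2, 0.5,
                0.1, 0.4, 0.9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("obs1", "obs2"), as.character(death_times)))

# Standard identity: survival probability is exp(-cumulative hazard).
surv_prob <- exp(-chf)

# Survival curves are non-increasing because the CHF is non-decreasing.
round(surv_prob, 3)
```

Each row of surv_prob is one observation's estimated survival curve evaluated at the unique death times.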
Note
If a dataset contains missing values, they must be dealt with beforehand, as the function cannot handle them in the current version. Moreover, the status/delta variable in the data must be coded as 0, 1.
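One possible way to prepare a dataset along these lines is sketched below in base R. The data frame dat and its columns are hypothetical, complete-case deletion is only one of several ways to deal with missing values, and coding the event as 1 and censoring as 0 follows the usual Surv() convention rather than anything stated on this page.

```r
# Hypothetical toy data; column names and values are illustrative only.
dat <- data.frame(
  time   = c(5, 12, NA, 30, 42, 8),
  status = c("dead", "alive", "dead", "dead", "alive", "dead"),
  age    = c(61, 70, 55, NA, 48, 66),
  stringsAsFactors = FALSE
)

# Keep only the rows without any missing value.
dat_complete <- dat[complete.cases(dat), ]

# Recode the event indicator: 1 = event observed, 0 = censored (assumed convention).
dat_complete$status <- as.integer(dat_complete$status == "dead")

# Sanity checks before passing the data to a fitting function.
stopifnot(!anyNA(dat_complete), all(dat_complete$status %in% c(0L, 1L)))
```

Imputation could be used instead of dropping rows when the number of complete cases is small.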
Author(s)
Naz Gul, Nosheen Faiz, Zardad Khan and Berthold Lausen.
References
Marvin N. Wright, Andreas Ziegler (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1-17. doi:10.18637/jss.v077.i01
Terry Therneau, Beth Atkinson and Brian Ripley (2015) rpart: Recursive Partitioning and Regression Trees. R package version 4.1-10. https://CRAN.R-project.org/package=rpart
Ulla B. Mogensen, Hemant Ishwaran, Thomas A. Gerds (2012). Evaluating Random Forests for Survival Analysis Using Prediction Error Curves. Journal of Statistical Software, 50(11), 1-23. URL http://www.jstatsoft.org/v50/i11/.
Schmid, M., Wright, M. N. & Ziegler, A. (2016). On the use of Harrell's C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450-459. http://dx.doi.org/10.1016/j.eswa.2016.07.018.
Wright, M. N., Dankowski, T. & Ziegler, A. (2017). Unbiased split variable selection for random survival forests using maximally selected rank statistics. Stat Med. http://dx.doi.org/10.1002/sim.7212.
Zardad Khan, Asma Gul, Aris Perperoglou, Osama Mahmoud, Werner Adler, Miftahuddin and Berthold Lausen (2015). OTE: Optimal Trees Ensembles for Regression, Classification and Class Membership Probability Estimation. R package version 1.0. https://CRAN.R-project.org/package=OTE
Gul, N., Faiz, N., Brawn, D., Kulakowski, R., Khan, Z., & Lausen, B. (2020). Optimal survival trees ensemble. arXiv preprint arXiv:2005.09043.
Examples
#Load the data
data(VETERAN)
library(survival)
library(prodlim)
library(ranger)
library(pec)
# Prediction method so that pec() can obtain survival probabilities from a ranger forest
predictSurvProb.ranger <- function(object, newdata, times, ...) {
  # Predicted survival curves, one row per observation in newdata
  ptemp <- ranger:::predict.ranger(object, data = newdata, importance = "none")$survival
  # Position of each requested time among the unique death times
  pos <- sindex(jump.times = object$unique.death.times,
                eval.times = times)
  # Prepend survival probability 1 for times before the first death time
  p <- cbind(1, ptemp)[, pos + 1, drop = FALSE]
  if (NROW(p) != NROW(newdata) || NCOL(p) != length(times))
    stop(paste("\nPrediction matrix has wrong dimensions:\nRequested newdata x times: ",
               NROW(newdata), " x ", length(times), "\nProvided prediction matrix: ",
               NROW(p), " x ", NCOL(p), "\n\n", sep = ""))
  p
}
# Divide the data into training and test parts
n <- nrow(VETERAN)
trainind <- sample(1:n, n * 0.7)
testind <- (1:n)[-trainind]
# Grow OSTE on the training data
OSTE.fit <- OSTE(Surv(time,status)~.,data=VETERAN[trainind,],t.initial=100)
# Predict on the test data
pred <- ranger:::predict.ranger(OSTE.fit$forest,data=VETERAN[testind,])
# Index various values
pred$survival
#etc.
# To calculate IBS
# Create formula
frm <- as.formula(Surv(time, status) ~ trt + celltype + karno + diagtime + age + prior)
PredError <- pec(object = OSTE.fit$forest, exact = TRUE,
                 formula = frm, cens.model = "marginal",
                 data = VETERAN[testind, ], verbose = FALSE)
IBS <- crps(object = PredError, times =100, start = PredError$start)[2,1]
IBS