| XGpred {csmpv} | R Documentation | 
XGpred: Building Risk Classification Predictive Models using Survival Data
Description
The XGpred function builds an empirical Bayesian binary risk classification model from survival data. It is based on our novel XGpred algorithm, which combines XGBoost with traditional survival analysis.
Usage
XGpred(
  data = NULL,
  varsIn = NULL,
  selection = FALSE,
  vsMethod = c("LASSO2", "LASSO2plus", "LASSO_plus"),
  time = NULL,
  event = NULL,
  nrounds = 5,
  probcut = 0.8,
  nclass = c(2, 3),
  topN = 10,
  outfile = "nameWithPath"
)
Arguments
data | 
 A data matrix or a data frame where samples are in rows and features/traits are in columns.  | 
varsIn | 
 A vector of variables used for the prediction model.  | 
selection | 
 Logical. Default is FALSE. If TRUE, variable selection is performed with the method given in vsMethod.  | 
vsMethod | 
 The variable selection method, used only when "selection" is set to TRUE. One of "LASSO2" (the default), "LASSO2plus", or "LASSO_plus".  | 
time | 
 Time variable name.  | 
event | 
 Event variable name.  | 
nrounds | 
 The maximum number of boosting iterations. Default is 5.  | 
probcut | 
 Probability cutoff for risk group classification. Default is set to 0.8.  | 
nclass | 
 Number of risk groups. Default is 2: any sample not classified into the high-risk group is assigned to the low-risk group. When 3 is chosen, samples are classified into low-, middle-, and high-risk groups.  | 
topN | 
 An integer indicating how many variables to select if LASSO_plus is chosen as the variable selection method.  | 
outfile | 
 A string for the output file, including the path if necessary but without a file type extension.  | 
Details
If variable selection is requested, one of three variable selection methods is applied. Either the given variable list or the selected variable list is then used to build both an XGBoost model and a traditional Cox model. Risk scores from each model are calculated and ranked, and the two ranks are averaged for each sample. The top 1/3 of samples by mean rank are defined as the high-risk group, and the bottom 1/3 are defined as the low-risk group. The binary risk classification model is built on these two risk groups, using the same variable list. The model score is a linear combination of the variables, with each weight defined as the t value from a single-variable linear model of that variable on the two groups. Finally, samples are classified using empirical Bayesian probabilities computed from the model score.
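The steps described above can be illustrated with a minimal base R sketch on simulated data. This is not the package's implementation: the two risk score vectors are hypothetical stand-ins for the XGBoost and Cox risk scores, all object names are illustrative, and the empirical Bayesian step assumes a normal density of the model score within each risk group.

```r
set.seed(1)
n <- 90
# Simulated predictors (hypothetical, for illustration only)
X <- data.frame(v1 = rnorm(n), v2 = rnorm(n), v3 = rbinom(n, 1, 0.4))

# Stand-ins for the risk scores from the XGBoost and Cox models (assumption)
risk_xgb <- X$v1 + 0.5 * X$v3 + rnorm(n, sd = 0.5)
risk_cox <- X$v1 + 0.3 * X$v2 + rnorm(n, sd = 0.5)

# Rank each risk score, then average the two ranks for each sample
mean_rank <- (rank(risk_xgb) + rank(risk_cox)) / 2

# Top third = high-risk group, bottom third = low-risk group
cuts <- quantile(mean_rank, probs = c(1 / 3, 2 / 3))
high <- mean_rank > cuts[2]
low  <- mean_rank < cuts[1]
ends <- high | low
grp  <- as.numeric(high[ends])  # 1 = high risk, 0 = low risk

# Weight of each variable: t value of the group coefficient in lm(variable ~ group)
weights <- sapply(X[ends, ], function(x) {
  summary(lm(x ~ grp))$coefficients["grp", "t value"]
})

# Model score: linear combination of the variables, computed for all samples
score <- as.vector(as.matrix(X) %*% weights)

# Empirical Bayesian probability of high risk, assuming normal score
# distributions within each of the two groups (assumption)
d_hi <- dnorm(score, mean(score[high]), sd(score[high]))
d_lo <- dnorm(score, mean(score[low]), sd(score[low]))
prob_high <- d_hi / (d_hi + d_lo)

# Two-group classification with a probability cutoff of 0.8
probcut <- 0.8
risk_class <- ifelse(prob_high >= probcut, "High", "Low")
table(risk_class)
```

Samples from the middle third do not contribute to the weights or the group parameters in this sketch, but they still receive a score, a probability, and a class.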
Value
A list is returned with the following nine items:
ranks | 
 Ranks from the XGBoost and Cox models  | 
twoEnds | 
 High and low risk group samples identified by mean ranks from XGBoost and Cox models  | 
weights | 
 Weight for each variable used in the model  | 
modelPars | 
 Mean and standard error of model scores for each risk group  | 
nclass | 
 Number of risk groups  | 
XGpred_score | 
 Model XGpred score  | 
XGpred_prob | 
 Empirical Bayesian probability based on model XGpred score  | 
XGpred_prob_class | 
 Risk group classification based on XGpred_prob for the given probability cutoff  | 
probcut | 
 Probability cutoff for risk group classification  | 
Author(s)
Aixiang Jiang
References
Chen T, Guestrin C (2016). "XGBoost: A Scalable Tree Boosting System". 22nd SIGKDD Conference on Knowledge Discovery and Data Mining. https://arxiv.org/abs/1603.02754
Aoki T, Jiang A, Xu A, et al. (2023). "Spatially Resolved Tumor Microenvironment Predicts Treatment Outcomes in Relapsed/Refractory Hodgkin Lymphoma". J Clin Oncol. 2023 Dec 19:JCO2301115. doi: 10.1200/JCO.23.01115. Epub ahead of print. PMID: 38113419.
Examples
# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# For a given time-to-event outcome and Xvars, we can build a binary risk classification model:
xgobj = XGpred(data = tdat, varsIn = Xvars,
               time = "FFP..Years.", event = "Code.FFP",
               outfile = file.path(temp_dir, "XGpred"))
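# A hedged variation that uses only the arguments documented above: request
# three risk groups instead of two. The object name xgobj3 and the output file
# name "XGpred3" are illustrative choices, not part of the package.
xgobj3 = XGpred(data = tdat, varsIn = Xvars,
                time = "FFP..Years.", event = "Code.FFP",
                nclass = 3,
                outfile = file.path(temp_dir, "XGpred3"))
# Assuming XGpred_prob_class is returned as a vector of class labels,
# the group sizes can be tabulated:
table(xgobj3$XGpred_prob_class)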
# You can save the files to any directory you choose.
# To delete temp_dir and its contents, use the following
# (unlink() removes a directory only when recursive = TRUE):
unlink(temp_dir, recursive = TRUE)