XGpred {csmpv} | R Documentation |
XGpred: Building Risk Classification Predictive Models using Survival Data
Description
The XGpred function is designed to generate an empirical Bayesian-based binary risk classification model with survival data based on our novel XGpred algorithm, combining XGBoost and traditional survival analysis.
Usage
XGpred(
  data = NULL,
  varsIn = NULL,
  selection = FALSE,
  vsMethod = c("LASSO2", "LASSO2plus", "LASSO_plus"),
  time = NULL,
  event = NULL,
  nrounds = 5,
  probcut = 0.8,
  nclass = c(2, 3),
  topN = 10,
  outfile = "nameWithPath"
)
Arguments
data |
A data matrix or a data frame where samples are in rows and features/traits are in columns. |
varsIn |
A vector of variables used for the prediction model. |
selection |
Logical. Default is FALSE. If TRUE, variable selection is performed with the method specified by vsMethod. |
vsMethod |
Variable selection method, used only when "selection" is set to TRUE. One of "LASSO2" (the default), "LASSO2plus", or "LASSO_plus". |
time |
Time variable name. |
event |
Event variable name. |
nrounds |
The maximum number of boosting iterations. Default is 5. |
probcut |
Probability cutoff for risk group classification. Default is set to 0.8. |
nclass |
Number of risk groups. By default, it is 2; any samples not classified into high-risk groups are classified into the low-risk group. When 3 is chosen, samples are classified into low, middle, and high-risk groups. |
topN |
An integer indicating how many variables to select if LASSO_plus is chosen as the variable selection method. |
outfile |
A string for the output file, including the path if necessary but without a file type extension. |
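As a minimal sketch of how probcut and nclass might interact, the base R snippet below maps probabilities of being high risk to group labels. The helper name classify_risk and the example probabilities are hypothetical. In particular, the symmetric lower cutoff 1 - probcut used for the three-group case is an assumption made for illustration; this page does not state how the middle group is delimited.

```r
# Hypothetical helper, not part of csmpv: assign risk groups from the
# probability of being high risk.
classify_risk <- function(prob, probcut = 0.8, nclass = 2) {
  if (nclass == 2) {
    # Any sample not called high risk falls into the low-risk group
    ifelse(prob >= probcut, "High", "Low")
  } else {
    # Assumption: low risk is called at or below 1 - probcut;
    # everything in between is labelled "Middle"
    ifelse(prob >= probcut, "High",
           ifelse(prob <= 1 - probcut, "Low", "Middle"))
  }
}

prob <- c(0.95, 0.50, 0.10)  # made-up probabilities
two_groups <- classify_risk(prob, nclass = 2)
three_groups <- classify_risk(prob, nclass = 3)
```

With two groups, a sample at 0.50 is labelled low risk because it does not reach the cutoff; under the assumed three-group rule the same sample is left in the middle group.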
Details
If variable selection is requested, one of three variable selection methods is applied. The resulting variables (either those given in varsIn or those selected) are used to build both an XGBoost model and a traditional Cox model. Risk scores from each model are ranked, and the two ranks are averaged for each sample. The top third of samples by mean rank is defined as the high-risk group, and the bottom third as the low-risk group. A binary risk classification model is then built on these two groups using the same variables. The model score is a linear combination of the variables, with each weight set to the t value from a single-variable linear model of that variable on the two groups. Finally, samples are classified using empirical Bayesian probabilities computed from the model score.
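The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, the two risk score vectors are stand-ins for the output of an XGBoost model and a Cox model, and the empirical Bayesian probability is assumed to use Gaussian densities of the model score in each group with equal prior weight.

```r
set.seed(1)
n <- 90
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))  # simulated predictors

# Stand-ins for the two models' risk scores (assumption: in practice
# these would come from fitted XGBoost and Cox models)
score_xgb <- X[, "x1"] + 0.5 * X[, "x2"] + rnorm(n, sd = 0.3)
score_cox <- X[, "x1"] + 0.4 * X[, "x2"] + rnorm(n, sd = 0.3)

# Rank each score, then average the two ranks per sample
mean_rank <- (rank(score_xgb) + rank(score_cox)) / 2

# Bottom third = low-risk group, top third = high-risk group
ord <- order(mean_rank)
k <- floor(n / 3)
low_idx <- ord[seq_len(k)]
high_idx <- ord[(n - k + 1):n]

# Weight of each variable: t value of the group term in a
# single-variable linear model fitted on the two end groups
ends <- c(low_idx, high_idx)
group <- rep(c(0, 1), each = k)  # 0 = low risk, 1 = high risk
weights <- apply(X[ends, , drop = FALSE], 2, function(v) {
  summary(lm(v ~ group))$coefficients["group", "t value"]
})

# Model score: linear combination of the variables for every sample
model_score <- as.vector(X %*% weights)

# Assumed empirical Bayesian step: Gaussian density of the score in
# each group, equal priors, probability of belonging to the high group
d_hi <- dnorm(model_score, mean(model_score[high_idx]), sd(model_score[high_idx]))
d_lo <- dnorm(model_score, mean(model_score[low_idx]), sd(model_score[low_idx]))
prob_high <- d_hi / (d_hi + d_lo)

# Binary call at the default cutoff of 0.8
risk_class <- ifelse(prob_high >= 0.8, "High", "Low")
```

Only the two end groups are used to estimate the weights and the group-specific score distributions; every sample, including the middle third, then receives a score and a probability.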
Value
A list is returned with the following nine items:
ranks |
Ranks from XGBoost and Cox |
twoEnds |
High and low risk group samples identified by mean ranks from XGBoost and Cox models |
weights |
Weights for each variable used in the model |
modelPars |
Mean and standard error of model scores for each risk group |
nclass |
Number of risk groups |
XGpred_score |
Model XGpred score |
XGpred_prob |
Empirical Bayesian probability based on model XGpred score |
XGpred_prob_class |
Risk group classification based on XGpred_prob for the given probability cutoff |
probcut |
Probability cutoff for risk group classification |
Author(s)
Aixiang Jiang
References
Chen T, Guestrin C (2016). XGBoost: A Scalable Tree Boosting System. 22nd SIGKDD Conference on Knowledge Discovery and Data Mining. https://arxiv.org/abs/1603.02754
Aoki T, Jiang A, Xu A, et al. (2023). Spatially Resolved Tumor Microenvironment Predicts Treatment Outcomes in Relapsed/Refractory Hodgkin Lymphoma. J Clin Oncol. 2023 Dec 19:JCO2301115. doi: 10.1200/JCO.23.01115. Epub ahead of print. PMID: 38113419.
Examples
# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
# The function saves files locally. You can define your own temporary directory.
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# For a given time-to-event outcome and Xvars, we can build a binary risk classification model:
xgobj = XGpred(data = tdat, varsIn = Xvars,
               time = "FFP..Years.", event = "Code.FFP",
               outfile = file.path(temp_dir, "XGpred"))
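# You can instead save the files to a directory of your choice.
# To delete temp_dir and its contents, use the following
# (recursive = TRUE is required for unlink() to remove a directory):
unlink(temp_dir, recursive = TRUE)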