XGpred {csmpv}

R Documentation

XGpred: Building Risk Classification Predictive Models using Survival Data

Description

The XGpred function builds an empirical Bayesian binary risk classification model from survival data. It is based on our XGpred algorithm, which combines XGBoost with traditional survival analysis.

Usage

XGpred(
  data = NULL,
  varsIn = NULL,
  selection = FALSE,
  vsMethod = c("LASSO2", "LASSO2plus", "LASSO_plus"),
  time = NULL,
  event = NULL,
  nrounds = 5,
  probcut = 0.8,
  nclass = c(2, 3),
  topN = 10,
  outfile = "nameWithPath"
)

Arguments

`data`	A data matrix or a data frame where samples are in rows and features/traits are in columns.
`varsIn`	A vector of variables used for the prediction model.
`selection`	Logical. Default is FALSE. If TRUE, variables are selected with the method given in "vsMethod".
`vsMethod`	Variable selection method, used only when "selection" is set to TRUE. One of "LASSO2" (the default), "LASSO2plus", or "LASSO_plus".
`time`	Name of the time-to-event variable in the data.
`event`	Name of the event indicator variable in the data.
`nrounds`	The maximum number of boosting iterations. Default is 5.
`probcut`	Probability cutoff for risk group classification. Default is set to 0.8.
`nclass`	Number of risk groups. By default, it is 2; any samples not classified into high-risk groups are classified into the low-risk group. When 3 is chosen, samples are classified into low, middle, and high-risk groups.
`topN`	An integer indicating how many variables to select if "LASSO_plus" is chosen as the variable selection method. Default is 10.
`outfile`	A string for the output file, including the path if necessary but without a file type extension.

Details

If variable selection is requested, one of three variable selection methods is applied. The given variables, or the selected variables, are then used to build both an XGBoost model and a traditional Cox model. Risk scores from each model are ranked, and the two ranks are averaged for each sample. The top third of samples by mean rank is defined as the high-risk group, and the bottom third as the low-risk group. The binary risk classification model is then built on these two groups using the same variables. The model score is a linear combination of the variables, where each weight is the t value from a single-variable linear model of that variable on the two groups. Finally, samples are classified according to empirical Bayesian probabilities based on the model score.
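
The ranking and weighting steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data are simulated, `xgb_score` and `cox_score` are hypothetical stand-ins for the risk scores of the two fitted models (larger assumed to mean higher risk), and all object names are invented for this sketch.

```r
set.seed(1)
n <- 90

# Toy predictors and two hypothetical risk scores standing in for the
# XGBoost and Cox model outputs (assumption: larger score = higher risk).
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
xgb_score <- 1.0 * X$x1 - 0.5 * X$x2 + rnorm(n, sd = 0.5)
cox_score <- 0.8 * X$x1 - 0.7 * X$x2 + rnorm(n, sd = 0.5)

# Rank each score, then average the two ranks per sample.
mean_rank <- (rank(xgb_score) + rank(cox_score)) / 2

# Bottom third by mean rank = low risk; top third = high risk.
# Samples in the middle third stay NA and are not used for the weights.
k <- floor(n / 3)
ord <- order(mean_rank)
group <- rep(NA_character_, n)
group[head(ord, k)] <- "low"
group[tail(ord, k)] <- "high"

ends <- !is.na(group)
is_high <- as.numeric(group[ends] == "high")

# Weight for each variable: t value of the group term in a
# single-variable linear model fitted on the two end groups only.
weights <- sapply(X[ends, ], function(v) {
  summary(lm(v ~ is_high))$coefficients["is_high", "t value"]
})

# Model score for every sample: linear combination of the variables.
score <- drop(as.matrix(X) %*% weights)
```

Here `weights` holds one t value per variable and `score` holds one model score per sample, including the samples in the middle third that were left out when the weights were estimated.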

Value

A list is returned with the following nine items:

`ranks`	Ranks from XGBoost and Cox
`twoEnds`	High and low risk group samples identified by mean ranks from XGBoost and Cox models
`weights`	Weights for each variable used in the model
`modelPars`	Mean and standard error of model scores for each risk group
`nclass`	Number of risk groups
`XGpred_score`	Model XGpred score
`XGpred_prob`	Empirical Bayesian probability based on model XGpred score
`XGpred_prob_class`	Risk group classification based on XGpred_prob for the given probability cutoff
`probcut`	Probability cutoff for risk group classification
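
To illustrate how a model score, a probability, and a class call can relate to one another, here is a minimal base R sketch. It rests on assumptions that this page does not confirm: each risk group's scores are modelled as a normal distribution described by a mean and a standard deviation, the two groups get equal prior weight, and for three classes a sample is called low risk when its high-risk probability is at most `1 - probcut`. The functions `prob_high` and `classify`, the class labels, and the toy scores are all hypothetical.

```r
set.seed(2)

# Toy model scores for the two reference groups (hypothetical values).
high_scores <- rnorm(30, mean = 2, sd = 1)
low_scores  <- rnorm(30, mean = -2, sd = 1)

# Per-group parameters, similar in spirit to `modelPars`.
pars <- list(
  high = c(mean = mean(high_scores), sd = sd(high_scores)),
  low  = c(mean = mean(low_scores),  sd = sd(low_scores))
)

# Probability of belonging to the high-risk group under two normal
# densities with equal prior weights (an assumption of this sketch).
prob_high <- function(score, pars) {
  d_high <- dnorm(score, pars$high[["mean"]], pars$high[["sd"]])
  d_low  <- dnorm(score, pars$low[["mean"]],  pars$low[["sd"]])
  d_high / (d_high + d_low)
}

# Turn probabilities into class labels for a given cutoff.
# The three-class rule (low if prob <= 1 - probcut) is assumed.
classify <- function(prob, probcut = 0.8, nclass = 2) {
  if (nclass == 2) {
    ifelse(prob >= probcut, "High", "Low")
  } else {
    ifelse(prob >= probcut, "High",
           ifelse(prob <= 1 - probcut, "Low", "Middle"))
  }
}

new_scores <- c(-3, 0, 3)
p <- prob_high(new_scores, pars)
classify(p, probcut = 0.8, nclass = 2)
classify(p, probcut = 0.8, nclass = 3)
```

With two classes, every sample that does not reach the cutoff falls into the low-risk group, which matches the description of `nclass` above.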

Author(s)

Aixiang Jiang

References

Tianqi Chen and Carlos Guestrin (2016), "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, https://arxiv.org/abs/1603.02754

Aoki T, Jiang A, Xu A, et al. (2023) Spatially Resolved Tumor Microenvironment Predicts Treatment Outcomes in Relapsed/Refractory Hodgkin Lymphoma. J Clin Oncol. 2023 Dec 19:JCO2301115. doi: 10.1200/JCO.23.01115. Epub ahead of print. PMID: 38113419.

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# For a given time-to-event outcome and Xvars, we can build a binary risk classification model:
xgobj = XGpred(data = tdat, varsIn = Xvars,
               time = "FFP..Years.", event = "Code.FFP",
               outfile = file.path(temp_dir, "XGpred"))
# You can save the files to any directory you choose.

# To delete temp_dir and its contents, use the following
# (unlink() only removes a directory when recursive = TRUE):
unlink(temp_dir, recursive = TRUE)

[Package csmpv version 1.0.3 Index]