XGBtraining_predict {csmpv}    R Documentation

Predicting XGBoost Model Scores and Performing Validation

Description

This function predicts XGBoost model scores using an XGBtraining object and a new dataset. It converts the input data to the required xgb.DMatrix format and returns the model scores. If the new dataset includes an outcome variable, the function also performs validation, comparing predictions with observed outcomes.

Usage

XGBtraining_predict(
  xgbtrainingObj = NULL,
  newdata = NULL,
  newY = FALSE,
  outfile = "nameWithPath"
)

Arguments

xgbtrainingObj

An XGBtraining object returned from the XGBtraining function.

newdata

A data matrix or data frame, with samples in rows and features/traits in columns.

newY

A logical value indicating whether 'newdata' contains the outcome variable.

outfile

A string giving the output file name, including the path if necessary, but without a file type extension.

Value

A vector of predicted values is returned. If an outcome variable is available for the new dataset (newY = TRUE), validation is also performed.

predicted

A vector of model prediction values. For a continuous outcome, this is a vector of model scores; for a binary outcome, this is a vector of probabilities of the positive class; for a time-to-event outcome, this is a vector of risk scores.

Author(s)

Aixiang Jiang

References

Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754

Harrell Jr F (2023). rms: Regression Modeling Strategies. R package version 6.7-1, <https://CRAN.R-project.org/package=rms>

Harrell Jr F (2023). Hmisc: Harrell Miscellaneous. R package version 5.1-1, <https://CRAN.R-project.org/package=Hmisc>

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
vdat = datlist$validation

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")

# The function can work with multiple models and multiple outcome types. 
# Here, we provide an example using the XGBoost model with a time-to-event outcome:
txfit = XGBtraining(data = tdat, biomks = Xvars,
                    outcomeType = "time-to-event",
                    time = "FFP..Years.", event = "Code.FFP",
                    outfile = paste0(temp_dir, "/survival_XGBoost"))
ptxfit = XGBtraining_predict(txfit, newdata = vdat,
                    outfile = paste0(temp_dir, "/pred_XGBoost_time_to_event"))
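# A minimal sketch of requesting validation as well, assuming 'vdat' contains
# the same outcome columns used in training ("FFP..Years." and "Code.FFP").
# The object name 'ptxfit_val' and the output file name are illustrative only:
ptxfit_val = XGBtraining_predict(txfit, newdata = vdat, newY = TRUE,
                    outfile = paste0(temp_dir, "/pred_XGBoost_time_to_event_validation"))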
# To delete "temp_dir" and its contents, use the following
# (unlink() only removes directories when recursive = TRUE):
unlink(temp_dir, recursive = TRUE)

[Package csmpv version 1.0.3 Index]