validation {csmpv}    R Documentation
Validate Model Predictions
Description
This function validates model predictions when the corresponding outcome variable is available. It compares the predictions with the observed outcome, which can be continuous, binary, or time-to-event.
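For binary and continuous outcomes, the comparison can be sketched in base R as follows. The metrics used here (AUC and Brier score for binary outcomes, Pearson correlation and RMSE for continuous ones) are common choices and an assumption. This page does not list the statistics that validation() actually reports, and the helper names are hypothetical.

```r
# Minimal base-R sketch of agreement measures between predictions and
# observed outcomes. Illustrative only; these are not csmpv internals.

# Binary outcome: AUC via the Mann-Whitney rank formula, plus the Brier score.
binary_metrics <- function(prob, y) {
  stopifnot(length(prob) == length(y), all(y %in% c(0, 1)))
  r <- rank(prob)                      # average ranks handle tied predictions
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  brier <- mean((prob - y)^2)          # mean squared error of probabilities
  c(AUC = auc, Brier = brier)
}

# Continuous outcome: Pearson correlation and root-mean-squared error.
continuous_metrics <- function(score, y) {
  stopifnot(length(score) == length(y))
  c(r = cor(score, y), RMSE = sqrt(mean((score - y)^2)))
}

binary_metrics(c(0.9, 0.8, 0.3, 0.2), c(1, 1, 0, 0))
continuous_metrics(c(1, 2, 3, 4), c(1.1, 1.9, 3.2, 3.8))
```

The rank formula avoids enumerating every positive/negative pair, and averaging ranks gives tied predictions half credit, matching the usual AUC definition.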
Usage
validation(
predicted = NULL,
outcomeType = c("binary", "continuous", "time-to-event"),
trueY = NULL,
time = NULL,
trueEvent = NULL,
baseHz = NULL,
u = 2,
outfile = "nameWithPath"
)
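The usage lists all three outcome types as the default for outcomeType, while the Arguments section states that "binary" is the default. In R this behaviour usually comes from match.arg(), which returns the first choice when the argument is left unset. The sketch below assumes validation() resolves outcomeType this way (the page does not say so); resolve_type() is a hypothetical helper.

```r
# Hypothetical helper showing how a vector-of-choices default resolves.
# match.arg() returns the first choice when the caller leaves the argument
# unset, and otherwise checks the value against the allowed choices.
resolve_type <- function(outcomeType = c("binary", "continuous", "time-to-event")) {
  match.arg(outcomeType)
}

resolve_type()                  # "binary"
resolve_type("time-to-event")   # "time-to-event"
try(resolve_type("count"))      # signals an error: not an allowed choice
```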
Arguments
predicted
A vector of model prediction values, such as those generated by this package's prediction functions (e.g., LASSO2_predict and XGBtraining_predict). For continuous outcomes, this vector holds model scores; for binary outcomes, the predicted probability of the positive class; for time-to-event outcomes, risk scores, which are later transformed into estimated survival probabilities at the times in the new data.
outcomeType
Outcome variable type: "binary" (default), "continuous", or "time-to-event".
trueY
A vector of observed outcome values when the outcome is continuous or binary.
time
A vector of follow-up times for a time-to-event outcome.
trueEvent
A vector of event indicators for a time-to-event outcome.
baseHz
A table of cumulative baseline hazard at multiple time points, usually derived from a training data set.
u
A single numeric follow-up time for survival outcomes.
outfile
A string giving the output file name, including the path if necessary, but without a file extension.
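For time-to-event outcomes, risk scores are turned into survival probabilities using baseHz and u. Under a Cox proportional hazards model, a standard conversion is S(u | x) = exp(-H0(u) * exp(lp)), where H0 is the cumulative baseline hazard and lp the linear predictor. The sketch below assumes that the risk scores are linear predictors, that baseHz is a data frame with columns time and hazard sorted by time, and that H0(u) is the last tabulated value at or before u. None of these details are stated on this page, and surv_at_u() is a hypothetical name. If baseHz comes from survival::basehaz() with its default centered = TRUE, the scores must be centred in the same way.

```r
# Sketch: convert Cox-model risk scores to survival probabilities at time u.
# Assumptions (not from csmpv): `lp` holds linear predictors, and `baseHz`
# is a data frame with columns `time` and `hazard` giving the cumulative
# baseline hazard H0(t), sorted by increasing time.
surv_at_u <- function(lp, baseHz, u = 2) {
  stopifnot(is.numeric(u), length(u) == 1)
  idx <- findInterval(u, baseHz$time)          # last tabulated time <= u
  H0u <- if (idx == 0) 0 else baseHz$hazard[idx]
  exp(-H0u * exp(lp))                          # S(u | x) = exp(-H0(u) * exp(lp))
}

bh <- data.frame(time = c(0.5, 1, 2, 3), hazard = c(0.05, 0.10, 0.20, 0.35))
surv_at_u(lp = c(-1, 0, 1), baseHz = bh, u = 2)
```

Treating H0 as a step function mirrors how cumulative hazard estimates are tabulated at observed event times; a time before the first tabulated point gets H0 = 0 and hence survival 1.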
Details
Several prediction functions in this package call this function internally when an outcome variable is available for a new data set. Users can also call it directly.
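The calling pattern described here, where a prediction function runs validation only when an outcome is supplied, can be sketched as follows. The wrapper predict_then_validate() and its predict_fun and validate_fun arguments are hypothetical placeholders, not csmpv functions.

```r
# Hypothetical sketch of the "validate when an outcome is available" pattern.
# `predict_fun` and `validate_fun` are placeholders, not csmpv functions.
predict_then_validate <- function(newdata, predict_fun, validate_fun,
                                  outcome = NULL) {
  preds <- predict_fun(newdata)
  if (!is.null(outcome)) {
    validate_fun(preds, outcome)   # only runs when an outcome is supplied
  }
  preds                            # predictions are returned either way
}

calls <- 0
count_validate <- function(preds, y) calls <<- calls + 1
p <- predict_then_validate(data.frame(x = 1:3), function(d) d$x * 0.1,
                           count_validate)
p2 <- predict_then_validate(data.frame(x = 1:3), function(d) d$x * 0.1,
                            count_validate, outcome = c(0, 1, 1))
calls  # 1: validation ran only for the call that supplied an outcome
```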
Value
A vector of model prediction values, taken from the input.
References
Harrell Jr F (2023). rms: Regression Modeling Strategies. R package version 6.7-1, <https://CRAN.R-project.org/package=rms>.
Harrell Jr F (2023). Hmisc: Harrell Miscellaneous. R package version 5.1-1, <https://CRAN.R-project.org/package=Hmisc>.
Examples
# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
vdat = datlist$validation
# The function saves files locally. You can define your own temporary directory.
# If not, tempdir() returns the current R session's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with multiple models and multiple outcome types.
# Here, we use XGBoost model with binary outcome as an example:
bxfit = XGBtraining(data = tdat, biomks = Xvars, Y = "DZsig",
                    outfile = paste0(temp_dir, "/binary_XGBoost"))
# Score the validation set using the same features the model was trained on:
testdat = vdat[, bxfit$XGBoost_model$feature_names]
test = xgboost::xgb.DMatrix(data.matrix(testdat))
scores = stats::predict(bxfit$XGBoost_model, test)
names(scores) = rownames(vdat)
# Compare the predicted probabilities with the observed binary outcome:
Y = bxfit$Y
outs = validation(predicted = scores, outcomeType = "binary", trueY = vdat[, Y],
                  outfile = paste0(temp_dir, "/binary_XGBoost_validate"))
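# You might save the files to a directory of your choice instead.
# To delete "temp_dir" and its contents, use the following
# (unlink() removes directories only when recursive = TRUE):
unlink(temp_dir, recursive = TRUE)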
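The example above covers a binary outcome. For time-to-event outcomes, a widely used discrimination measure is Harrell's concordance index (C-index); the Hmisc package cited in References provides rcorr.cens() for it. The following base-R sketch computes it directly from risk scores, follow-up times, and event indicators (the roles of time and trueEvent). Whether validation() reports exactly this statistic is an assumption, and harrell_c() is a hypothetical name.

```r
# Base-R sketch of Harrell's concordance index for right-censored data.
# A pair (i, j) is usable when subject i had an event and subject j was
# followed for longer; it is concordant when i also has the higher risk.
# Pairs with tied follow-up times are skipped in this sketch.
harrell_c <- function(risk, time, event) {
  stopifnot(length(risk) == length(time), length(time) == length(event))
  conc <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (event[i] != 1) next             # censored subjects cannot anchor a pair
    for (j in seq_len(n)) {
      if (time[j] > time[i]) {          # j outlived i, who had an event
        usable <- usable + 1
        if (risk[i] > risk[j]) conc <- conc + 1
        else if (risk[i] == risk[j]) conc <- conc + 0.5   # tied risk: half credit
      }
    }
  }
  conc / usable
}

harrell_c(risk = c(3, 2, 1), time = c(1, 2, 3), event = c(1, 1, 1))
```

A value of 1 means higher risk scores always correspond to earlier events, and 0.5 is no better than chance. The double loop is O(n^2) and meant for clarity, not large data sets.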