csmpvModelling {csmpv} | R Documentation |
All-in-one Modelling with csmpv R package
Description
This function is designed to simplify the process of building, evaluating and comparing different modelling methods. It offers the flexibility to perform one or all of the following modelling methods: LASSO2, LASSO2 + regression, LASSO_plus, LASSO2plus, XGBoost, LASSO2 + XGBoost, LASSO_plus + XGBoost, and LASSO2plus + XGBoost. The models are trained on the training data, and their performance is validated on a separate validation dataset.
Usage
csmpvModelling(
  tdat = NULL,
  vdat = NULL,
  Ybinary = NULL,
  varsBinary = NULL,
  Ycont = NULL,
  varsCont = NULL,
  time = NULL,
  event = NULL,
  varsSurvival = NULL,
  methods = c("all", "LASSO2", "LASSO2_reg", "LASSO_plus", "LASSO2plus", "XGBoost",
    "LASSO2_XGBoost", "LASSO_plus_XGBoost", "LASSO2plus_XGBoost"),
  outfileName = NULL
)
Arguments
tdat |
Training data. It cannot be NULL. |
vdat |
Validation data. It should contain the same variables as in the training data, including outcome variables. No validation result is saved if it is NULL. |
Ybinary |
Binary outcome variable for classification. |
varsBinary |
Names of binary predictors. |
Ycont |
Continuous outcome variable for regression. |
varsCont |
Names of continuous predictors. |
time |
Time-to-event variable for survival analysis. |
event |
Event/censoring indicator for survival analysis. |
varsSurvival |
Names of predictors for survival analysis. |
methods |
Method(s) to use for modeling. If "all", models for all eight methods will be built. Otherwise, provide one of the following method names:
- "LASSO2": Variable selection using LASSO2 with a minimum of two remaining variables.
- "LASSO2_reg": Variables selected from LASSO2, followed by regular regression.
- "LASSO_plus": Variables selected from LASSO_plus, followed by regular regression.
- "LASSO2plus": Variables selected from LASSO2plus, followed by regular regression.
- "XGBoost": XGBoost model built without variable selection.
- "LASSO2_XGBoost": Variables selected from LASSO2, followed by XGBoost.
- "LASSO_plus_XGBoost": Variables selected from LASSO_plus, followed by XGBoost.
- "LASSO2plus_XGBoost": Variables selected from LASSO2plus, followed by XGBoost. |
outfileName |
Prefix for output file names. |
Details
By default, this function runs all eight modelling methods. To run a single method instead, set the "methods" argument to that method's name. When a 'vdat' argument is provided, the function assumes that it contains the outcome variable and proceeds with model validation.
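The default-versus-single-method behaviour described above follows a common R idiom. The sketch below is a hypothetical stand-in, not the csmpv source: it assumes the choice is resolved with base R's match.arg(), and the names method_choices and resolve_methods are invented for illustration.

```r
# Hypothetical illustration only; csmpvModelling may resolve `methods` differently.
method_choices = c("all", "LASSO2", "LASSO2_reg", "LASSO_plus", "LASSO2plus", "XGBoost",
                   "LASSO2_XGBoost", "LASSO_plus_XGBoost", "LASSO2plus_XGBoost")

resolve_methods = function(methods = method_choices) {
  # match.arg() returns the first choice ("all") when the default vector is left untouched,
  # and raises an error for a name that does not match any choice
  methods = match.arg(methods, choices = method_choices)
  # "all" expands to the eight concrete method names
  if (methods == "all") method_choices[-1] else methods
}

length(resolve_methods())        # default: all eight methods
resolve_methods("LASSO2_reg")    # a single named method
```

Under this assumption, a misspelled method name fails immediately with an error listing the valid choices instead of silently running nothing.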
Value
A list of trained models and prediction objects. Results are saved to local files.
Author(s)
Aixiang Jiang
Examples
# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
vdat = datlist$validation
# The csmpvModelling function saves files locally. You can define your own temporary directory.
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The default setting of this single function generates all models and provides predictions
# and validations for each of them.
# Of course, we can also use this all-in-one function to work on one outcome type
# and one model at a time, for example:
DZlassoreg = csmpvModelling(tdat = tdat, vdat = vdat,
                            Ybinary = "DZsig", varsBinary = Xvars,
                            methods = "LASSO2_reg",
                            outfileName = paste0(temp_dir, "/just_one"))
# This is equivalent to using LASSO2_reg for modeling, followed by prediction and validation
# with rms_model for the classification task "DZsig".
# Six result files are then saved locally.
# You might want to save the files to the directory you prefer.
# To delete "temp_dir" and the files in it, use the following
# (unlink() does not remove directories unless recursive = TRUE):
unlink(temp_dir, recursive = TRUE)
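Because tempdir() is shared by the whole R session, removing it can affect other code running in the same session. A minimal sketch of an alternative is to write results into a dedicated subdirectory and remove only that. The directory name "csmpv_out" and the placeholder file are assumptions made so the sketch runs without csmpv; in real use the prefix would be passed as outfileName.

```r
# Hypothetical output directory inside the session temporary directory
out_dir = file.path(tempdir(), "csmpv_out")
dir.create(out_dir, showWarnings = FALSE)

# Prefix that could be passed on, e.g. csmpvModelling(..., outfileName = out_prefix)
out_prefix = file.path(out_dir, "just_one")

# Stand-in for a saved result file, written only so this sketch is self-contained
writeLines("placeholder", paste0(out_prefix, "_example.txt"))

# List the files written under the prefix
saved_files = list.files(out_dir, pattern = "^just_one", full.names = TRUE)
print(basename(saved_files))

# Remove only the dedicated directory and its contents
unlink(out_dir, recursive = TRUE)
```

This leaves the session's own temporary directory intact while still cleaning up every file written under the prefix.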