csmpvModelling {csmpv}    R Documentation

All-in-one Modelling with csmpv R package

Description

This function is designed to simplify the process of building, evaluating and comparing different modelling methods. It offers the flexibility to perform one or all of the following modelling methods: LASSO2, LASSO2 + regression, LASSO_plus, LASSO2plus, XGBoost, LASSO2 + XGBoost, LASSO_plus + XGBoost, and LASSO2plus + XGBoost. The models are trained on the training data, and their performance is validated on a separate validation dataset.

Usage

csmpvModelling(
  tdat = NULL,
  vdat = NULL,
  Ybinary = NULL,
  varsBinary = NULL,
  Ycont = NULL,
  varsCont = NULL,
  time = NULL,
  event = NULL,
  varsSurvival = NULL,
  methods = c("all", "LASSO2", "LASSO2_reg", "LASSO_plus", "LASSO2plus", "XGBoost",
    "LASSO2_XGBoost", "LASSO_plus_XGBoost", "LASSO2plus_XGBoost"),
  outfileName = NULL
)

Arguments

tdat

Training data. It cannot be NULL.

vdat

Validation data. It should contain the same variables as in the training data, including outcome variables. No validation result is saved if it is NULL.

Ybinary

Binary outcome variable for classification.

varsBinary

Names of predictors for the binary outcome (classification).

Ycont

Continuous outcome variable for regression.

varsCont

Names of predictors for the continuous outcome (regression).

time

Time-to-event variable for survival analysis.

event

Event/censoring indicator for survival analysis.

varsSurvival

Names of predictors for survival analysis.

methods

Method(s) to use for modeling. If "all", models for all eight methods will be built. Otherwise, provide one of the following method names:
- "LASSO2": Variable selection using LASSO2 with a minimum of two remaining variables.
- "LASSO2_reg": Variables selected from LASSO2, followed by regular regression.
- "LASSO_plus": Variables selected from LASSO_plus, followed by regular regression.
- "LASSO2plus": Variables selected from LASSO2plus, followed by regular regression.
- "XGBoost": XGBoost model built without variable selection.
- "LASSO2_XGBoost": Variables selected from LASSO2, followed by XGBoost.
- "LASSO_plus_XGBoost": Variables selected from LASSO_plus, followed by XGBoost.
- "LASSO2plus_XGBoost": Variables selected from LASSO2plus, followed by XGBoost.

outfileName

Prefix for output file names.

Details

By default, this function runs all eight modelling methods. To run a single method instead, set the 'methods' argument to its name. When a 'vdat' argument is provided, the function assumes that it contains the outcome variable and proceeds with model validation.
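Because validation relies on 'vdat' carrying the same variables as the training data, it can help to check this before calling the function. The following is a minimal base R sketch, not part of csmpv: the helper name check_validation_columns and the toy data frame are hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not part of csmpv): return the names of required
# columns (outcome plus predictors) that are absent from a validation set.
check_validation_columns <- function(vdat, outcome, predictors) {
  required <- c(outcome, predictors)
  setdiff(required, colnames(vdat))
}

# Toy validation data for illustration only; it lacks the MYC.IHC column.
toy_vdat <- data.frame(DZsig = c(1, 0), highIPI = c(0, 1))

missing_cols <- check_validation_columns(
  toy_vdat,
  outcome = "DZsig",
  predictors = c("highIPI", "MYC.IHC")
)
print(missing_cols)
```

A non-empty result suggests fixing the validation data before passing it as 'vdat'.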

Value

A list of trained models and prediction objects. Results are saved to local files.
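Since results are written to files that share the 'outfileName' prefix, one convenient pattern is to point the prefix at a dedicated sub-directory, which makes the files easy to list and remove. The sketch below uses base R only. The directory name "csmpv_demo" and the placeholder file are made up for illustration and stand in for whatever files a modelling call would actually write.

```r
# Build an output prefix inside a dedicated sub-directory of tempdir().
out_dir <- file.path(tempdir(), "csmpv_demo")  # hypothetical directory name
dir.create(out_dir, showWarnings = FALSE)
prefix <- file.path(out_dir, "just_one")       # value one could pass as outfileName

# Stand-in for a result file (the real file names are not assumed here).
file.create(paste0(prefix, "_placeholder.txt"))

# List every file that starts with the prefix.
written <- list.files(out_dir, pattern = "^just_one", full.names = TRUE)
print(basename(written))

# Remove the sub-directory and its contents when done;
# unlink() only deletes directories when recursive = TRUE.
unlink(out_dir, recursive = TRUE)
```

Keeping outputs in their own sub-directory avoids having to delete the session's temporary directory itself.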

Author(s)

Aixiang Jiang

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
vdat = datlist$validation

# The csmpvModelling function saves files locally. You can define your own temporary directory.
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()

# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The default setting of this single function generates all models and provides predictions
# and validations for each of them. 
# Of course, we can also use this all-in-one function to work on one outcome type 
# and one model at a time, for example:
DZlassoreg = csmpvModelling(tdat = tdat, vdat = vdat,
                           Ybinary = "DZsig", varsBinary = Xvars,
                           methods = "LASSO2_reg",
                           outfileName = paste0(temp_dir, "/just_one"))
# This is equivalent to using LASSO2_reg for modeling, followed by prediction and validation 
# with rms_model for the classification task "DZsig".
# Six result files are then saved locally.
# You might want to save the files to the directory you prefer.

# To delete the "temp_dir", use the following
# (unlink() only removes a directory when recursive = TRUE):
unlink(temp_dir, recursive = TRUE)

[Package csmpv version 1.0.3 Index]