LASSO2plus_XGBtraining {csmpv} | R Documentation |
XGBoost Modeling after Variable Selection with LASSO2plus
Description
This function performs variable selection using LASSO2plus and then builds an XGBoost model.
Usage
LASSO2plus_XGBtraining(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  nrounds = 5,
  nthread = 2,
  gamma = 1,
  max_depth = 3,
  eta = 0.3,
  outfile = "nameWithPath",
  height = 6
)
Arguments
data |
A data matrix or a data frame where samples are in rows, and features/traits are in columns. |
standardization |
A logical variable to indicate if standardization is needed before variable selection. The default is FALSE. |
columnWise |
A logical variable indicating the direction of normalization. If TRUE (the default), normalization is column-wise; if FALSE, it is row-wise. This is only meaningful when "standardization" is TRUE. |
biomks |
A vector of potential biomarkers for variable selection. They should be a subset of "data" column names. |
outcomeType |
Outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event". |
Y |
Outcome variable name when the outcome type is either "binary" or "continuous". |
time |
Time variable name when outcome type is "time-to-event". |
event |
Event variable name when outcome type is "time-to-event". |
nrounds |
Max number of boosting iterations. |
nthread |
Number of parallel threads used to run XGBoost. |
gamma |
Minimum loss reduction required to make a further partition on a leaf node of the tree. |
max_depth |
Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit. |
eta |
The learning rate for the XGBoost model. |
outfile |
A string for the output file including path if necessary but without file type extension. |
height |
An integer to indicate the forest plot height in inches. |
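The difference between column-wise and row-wise normalization can be illustrated with base R. This is a minimal sketch that assumes z-score standardization (centre to mean 0, scale to standard deviation 1); the method the package applies internally may differ, and the toy matrix is made up for illustration.

```r
# Toy matrix: 4 samples (rows) x 3 features (columns), filled column by column.
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40,
              5, 5, 6, 8),
            nrow = 4,
            dimnames = list(paste0("s", 1:4), c("f1", "f2", "f3")))

# Column-wise: each feature is centred and scaled across samples.
col_std <- scale(x)

# Row-wise: each sample is centred and scaled across its features.
# scale() works on columns, so transpose, scale, and transpose back.
row_std <- t(scale(t(x)))

round(colMeans(col_std), 10)  # every feature mean is 0 after column-wise scaling
round(rowMeans(row_std), 10)  # every sample mean is 0 after row-wise scaling
```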
Details
LASSO2plus_XGBtraining works in two parts and is suitable for three types of outcomes: continuous, binary, and time-to-event.
The first part is variable selection with LASSO2plus. The LASSO2plus algorithm begins with variable selection using LASSO2, followed by variable selection through single-variable regression for each candidate variable. The two sets of selected variables are then combined and reduced to the final list through stepwise variable selection.
The second part uses the final variable list obtained above to build an XGBoost model.
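The flow of the variable selection stage can be sketched in base R for a binary outcome. This is an illustration only, not the package's implementation: the simulated data, the variable names, the p < 0.05 cutoff, and the AIC-based backward elimination via step() are all assumptions, and the LASSO2 step is replaced by a hypothetical pre-selected set rather than reproduced.

```r
# Simulated data (hypothetical): y depends on x1 and x2 only.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - 1.2 * dat$x2))
candidates <- c("x1", "x2", "x3", "x4")

# Step 1 (stand-in): pretend a LASSO-type fit kept these variables.
# Hypothetical placeholder; the LASSO2 step itself is not reproduced here.
lasso_selected <- c("x1", "x3")

# Step 2: single-variable regression for each candidate variable,
# keeping those with p < 0.05 (assumed cutoff).
p_single <- sapply(candidates, function(v) {
  fit <- glm(reformulate(v, response = "y"), data = dat, family = binomial)
  coef(summary(fit))[v, "Pr(>|z|)"]
})
single_selected <- names(p_single)[p_single < 0.05]

# Step 3: combine both sets, then reduce them with stepwise selection
# (AIC-based backward elimination, as an assumed criterion).
combined <- union(lasso_selected, single_selected)
full_fit <- glm(reformulate(combined, response = "y"), data = dat,
                family = binomial)
final_fit <- step(full_fit, direction = "backward", trace = 0)
final_vars <- attr(terms(final_fit), "term.labels")
final_vars
```

The resulting final_vars plays the role of the final variable list that is handed to the XGBoost stage.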
Value
A list is returned:
XGBoost_model |
An XGBoost model |
XGBoost_model_score |
Model scores for the given training data set. For a continuous outcome variable, this is a vector of the estimated continuous values; for a binary outcome variable, this is a vector representing the probability of the positive class; for a time-to-event outcome, this is a vector of risk scores. |
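To make the binary case concrete, the sketch below fits a small XGBoost model directly with the xgboost package, using the default hyperparameter values from the Usage section, and shows that the training-set scores are probabilities. It assumes that nrounds, nthread, gamma, max_depth, and eta are handed to xgboost unchanged and that a "binary:logistic" objective is used; neither is confirmed by this page, and the simulated data are hypothetical.

```r
library(xgboost)

# Simulated training data (hypothetical): binary y driven by x1.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- rbinom(n, 1, plogis(2 * X[, "x1"]))

dtrain <- xgb.DMatrix(data = X, label = y)

# Hyperparameters set to the defaults listed in the Usage section.
params <- list(
  objective = "binary:logistic",  # assumed objective for a binary outcome
  eta = 0.3,
  gamma = 1,
  max_depth = 3,
  nthread = 2
)
model <- xgb.train(params = params, data = dtrain, nrounds = 5)

# Scores on the training data: one predicted probability per sample.
score <- predict(model, dtrain)
range(score)
```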
Author(s)
Aixiang Jiang
References
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Chen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System, 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, https://arxiv.org/abs/1603.02754.
Examples
# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training
# The function saves files locally. You can define your own temporary directory.
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types.
# Here, we use time-to-event as an example:
# tl2xfit = LASSO2plus_XGBtraining(data = tdat, biomks = Xvars,
# outcomeType = "time-to-event",
# time = "FFP..Years.", event = "Code.FFP",
# outfile = paste0(temp_dir, "/survival_LASSO2plus_XGBoost"))
#
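# You can save the files to any directory you like by changing "outfile".
# To remove the files written to "temp_dir" by this example, use the following:
unlink(list.files(temp_dir, pattern = "^survival_LASSO2plus_XGBoost", full.names = TRUE))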