LASSO2plus_XGBtraining {csmpv}    R Documentation

XGBoost Modeling after Variable Selection with LASSO2plus

Description

This function performs variable selection using LASSO2plus and then builds an XGBoost model.

Usage

LASSO2plus_XGBtraining(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  nrounds = 5,
  nthread = 2,
  gamma = 1,
  max_depth = 3,
  eta = 0.3,
  outfile = "nameWithPath",
  height = 6
)

Arguments

data

A data matrix or a data frame where samples are in rows, and features/traits are in columns.

standardization

A logical variable to indicate if standardization is needed before variable selection. The default is FALSE.

columnWise

A logical variable selecting the direction of standardization: TRUE (the default) standardizes column-wise, and FALSE standardizes row-wise. It is only used when "standardization" is TRUE.
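
As a rough illustration of the two directions, the sketch below z-scores a small hypothetical matrix with base R's scale(). It assumes that standardization means centering to mean 0 and scaling to unit standard deviation; the exact transformation applied by the package is not stated on this page.

```r
# Hypothetical toy matrix: 3 samples (rows) x 3 features (columns)
x <- matrix(c(1, 2, 3, 10, 20, 60, 5, 7, 6), nrow = 3,
            dimnames = list(paste0("s", 1:3), paste0("f", 1:3)))

# Column-wise: each feature is centered and scaled across samples
col_std <- scale(x)

# Row-wise: each sample is centered and scaled across its features
row_std <- t(scale(t(x)))
```

Column-wise scaling puts features on a common scale across samples, whereas row-wise scaling rescales each sample across its own features.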

biomks

A vector of potential biomarkers for variable selection. They should be a subset of "data" column names.
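
Because the biomarkers must be a subset of the column names of "data", it can help to check this before calling the function. The sketch below uses a hypothetical data frame and hypothetical biomarker names; it is not part of the package.

```r
# Hypothetical data frame and candidate biomarker vector
dat <- data.frame(Y = c(0, 1, 1), geneA = c(1.2, 0.4, 2.1), geneB = c(3, 5, 4))
biomks <- c("geneA", "geneB", "geneC")

# Names that were requested but are absent from the data
missing_biomks <- setdiff(biomks, colnames(dat))
if (length(missing_biomks) > 0) {
  message("Not found in data: ", paste(missing_biomks, collapse = ", "))
}

# Keep only the candidates that are actually present
biomks <- intersect(biomks, colnames(dat))
```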

outcomeType

Outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event".
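
A character-vector default of this form is commonly resolved with match.arg(), which returns the first choice when the argument is not supplied. The sketch below shows that behavior with a hypothetical helper, resolve_outcome(); whether LASSO2plus_XGBtraining resolves the argument this way internally is an assumption.

```r
# Hypothetical helper that mirrors the documented default vector
resolve_outcome <- function(outcomeType = c("binary", "continuous", "time-to-event")) {
  match.arg(outcomeType)
}

default_type  <- resolve_outcome()                 # first choice is used
survival_type <- resolve_outcome("time-to-event")  # explicit choice
```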

Y

Outcome variable name when the outcome type is either "binary" or "continuous".

time

Time variable name when outcome type is "time-to-event".

event

Event variable name when outcome type is "time-to-event".

nrounds

Maximum number of boosting iterations.

nthread

Number of parallel threads used to run XGBoost.

gamma

Minimum loss reduction required to make a further partition on a leaf node of the tree.

max_depth

Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit.

eta

The learning rate for the XGBoost model.
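
The tuning arguments above share their names with XGBoost parameters. The sketch below shows how such values might be passed to the xgboost R package directly, assuming xgboost is installed and a binary outcome (hence the "binary:logistic" objective). The toy data are hypothetical, and how LASSO2plus_XGBtraining forwards these arguments internally is not shown on this page.

```r
# Documented defaults collected into an XGBoost parameter list
params <- list(
  objective = "binary:logistic",  # assumption: binary outcome
  gamma     = 1,
  max_depth = 3,
  eta       = 0.3,
  nthread   = 2
)

if (requireNamespace("xgboost", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(200), nrow = 50)        # hypothetical: 50 samples, 4 features
  colnames(X) <- paste0("f", 1:4)
  y <- as.numeric(X[, 1] + rnorm(50) > 0)   # hypothetical binary labels
  dtrain <- xgboost::xgb.DMatrix(data = X, label = y)
  fit <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 5)
  scores <- predict(fit, dtrain)            # predicted probabilities of class 1
}
```

Smaller values of eta usually call for a larger nrounds, since each boosting iteration then contributes less to the fit.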

outfile

A string giving the output file name, including the path if needed, but without a file extension.
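
One way to build such a value is with file.path(), as in this minimal sketch; the file name is taken from the example on this page, and the temporary directory is only an illustration.

```r
# Build an output prefix inside a temporary directory, with no file extension
outfile <- file.path(tempdir(), "survival_LASSO2plus_XGBoost")

# tools::file_ext() returns an empty string when there is no extension
ext <- tools::file_ext(outfile)
```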

height

An integer to indicate the forest plot height in inches.

Details

The first part of LASSO2plus_XGBtraining is variable selection with LASSO2plus. The LASSO2plus algorithm begins with variable selection using LASSO2, followed by variable selection through single-variable regression for each candidate variable. Finally, the two sets of selected variables are combined and processed through stepwise variable selection to obtain the final list.

The second part of LASSO2plus_XGBtraining uses the final variable list obtained above to build an XGBoost model. The function is suitable for three types of outcomes: continuous, binary, and time-to-event.
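
The combine-then-stepwise idea can be sketched in base R. The example below is only an illustration under stated assumptions: the two selected sets are hypothetical, the built-in mtcars data and a linear model stand in for the real inputs, and stats::step() with its AIC criterion stands in for the package's stepwise procedure, whose direction and criterion are not described here.

```r
# Hypothetical outputs of the two selection steps (built-in mtcars data)
lasso_vars  <- c("wt", "hp")
single_vars <- c("wt", "qsec", "am")

# Combine the two sets into one candidate list
candidates <- union(lasso_vars, single_vars)

# Stepwise selection (AIC-based, via stats::step) on the combined candidates
full_fit   <- lm(reformulate(candidates, response = "mpg"), data = mtcars)
final_fit  <- step(full_fit, direction = "both", trace = 0)
final_vars <- attr(terms(final_fit), "term.labels")
```

The variables in final_vars would then play the role of the final list that is passed on to the XGBoost step.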

Value

A list is returned:

XGBoost_model

An XGBoost model

XGBoost_model_score

Model scores for the given training data set. For a continuous outcome variable, this is a vector of the estimated continuous values; for a binary outcome variable, this is a vector representing the probability of the positive class; for a time-to-event outcome, this is a vector of risk scores.
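
The elements of the returned list are accessed by name. The sketch below uses a hypothetical stand-in list with the same element names and made-up scores for a binary outcome; the 0.5 cutoff is an assumption chosen for illustration, not something the package prescribes.

```r
# Stand-in for a returned list (hypothetical scores for a binary outcome)
fit <- list(
  XGBoost_model = NULL,  # placeholder for the fitted model object
  XGBoost_model_score = c(s1 = 0.12, s2 = 0.81, s3 = 0.47)
)

scores <- fit$XGBoost_model_score

# Assumption: a 0.5 probability cutoff for calling the positive class
pred_class <- as.integer(scores >= 0.5)
```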

Author(s)

Aixiang Jiang

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.

Chen, T. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System, 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, https://arxiv.org/abs/1603.02754.

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types.
# Here, we use time-to-event as an example:
# tl2xfit = LASSO2plus_XGBtraining(data = tdat, biomks = Xvars,
#                                  outcomeType = "time-to-event",
#                                  time = "FFP..Years.", event = "Code.FFP",
#                                  outfile = file.path(temp_dir, "survival_LASSO2plus_XGBoost"))
#
# To save the files elsewhere, point "outfile" to a directory of your choice.

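# To remove the example's output files from "temp_dir", use the following.
# Note that unlink() does not delete a directory unless recursive = TRUE,
# and the session's tempdir() itself is best left in place:
unlink(list.files(temp_dir, pattern = "^survival_LASSO2plus_XGBoost", full.names = TRUE))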

[Package csmpv version 1.0.3 Index]