LASSO2plus {csmpv} | R Documentation |
Variable Selection and Modeling with LASSO2plus
Description
This function performs variable selection using the LASSO2plus algorithm and subsequently builds a model.
Usage
LASSO2plus(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  outfile = "nameWithPath",
  height = 6
)
Arguments
data |
A data matrix or data frame with samples in rows and features/traits in columns. |
standardization |
A logical value indicating whether the data should be standardized before variable selection; the default is FALSE. |
columnWise |
A logical value indicating whether normalization is column-wise (TRUE, the default) or row-wise (FALSE). This is only meaningful when "standardization" is TRUE. |
biomks |
A vector of potential biomarkers for variable selection; they should be a subset of the column names of "data". |
outcomeType |
Outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event". |
Y |
Outcome variable name when the outcome type is either "binary" or "continuous". |
time |
Time variable name when outcome type is "time-to-event". |
event |
Event variable name when outcome type is "time-to-event". |
outfile |
A string for the output file name, including the path if necessary but without the file type extension. |
height |
An integer giving the forest plot height in inches; the default is 6. |
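The difference between column-wise and row-wise normalization can be illustrated with base R. This is a minimal sketch that assumes standardization means centering to mean 0 and scaling to unit standard deviation; the exact procedure LASSO2plus applies is not stated on this page, and the small matrix `m` is made up for illustration.

```r
# Toy data: 3 samples (rows) by 3 features (columns); values are illustrative only.
m <- matrix(c(1, 2, 3, 10, 20, 30, 5, 1, 9), nrow = 3,
            dimnames = list(paste0("s", 1:3), c("f1", "f2", "f3")))

# Column-wise: each feature is centered and scaled across samples.
col_std <- scale(m)

# Row-wise: each sample is centered and scaled across its features.
# scale() works on columns, so transpose, scale, and transpose back.
row_std <- t(scale(t(m)))

round(colMeans(col_std), 10)  # feature means after column-wise scaling
round(rowMeans(row_std), 10)  # sample means after row-wise scaling
```

Column-wise scaling puts features on a common scale, which is what matters for a penalized regression; row-wise scaling instead removes per-sample differences in overall level.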
Details
The LASSO2plus algorithm proceeds in three steps. First, it selects variables with LASSO2, which typically runs multiple cross-validation-based LASSO regressions. If cross-validation selects fewer than two variables, those results are ignored and full-data lambda simulations are used instead to ensure that at least two variables remain. Second, it runs a single-variable regression for each candidate variable and selects variables from those results. Third, the variables selected by LASSO2 and by the single-variable approach are combined and passed to traditional stepwise regression. The function handles binary, continuous, and time-to-event outcomes. After variable selection, a model is built with a standard R function such as lm, glm, or coxph, depending on the type of outcome variable.
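The last two stages described above (single-variable screening, then stepwise selection on the combined candidates) can be sketched with base R alone. This is an illustration under stated assumptions, not the csmpv implementation: the built-in mtcars data, the 0.05 p-value threshold, and the placeholder vector `lasso_picks` standing in for the LASSO2 result are all hypothetical choices made for this example.

```r
# Illustrative only: continuous outcome, base R, built-in mtcars data.
dat <- mtcars
outcome <- "mpg"
candidates <- c("wt", "hp", "qsec", "drat")

# Stand-in for the LASSO2 step (assumed result, not computed here).
lasso_picks <- c("wt", "hp")

# Single-variable regression: fit outcome ~ x for each candidate
# and record the p-value of that variable's coefficient.
single_p <- sapply(candidates, function(x) {
  fit <- lm(reformulate(x, response = outcome), data = dat)
  coef(summary(fit))[x, "Pr(>|t|)"]
})
single_picks <- names(single_p)[single_p < 0.05]  # assumed threshold

# Combine both candidate sets, then run stepwise selection from the full model.
combined <- union(lasso_picks, single_picks)
full_fit <- lm(reformulate(combined, response = outcome), data = dat)
final_fit <- step(full_fit, direction = "both", trace = 0)

print(formula(final_fit))
```

For a binary outcome the same pattern would use glm with family = binomial, and for a time-to-event outcome coxph from the survival package, in line with the model functions named above.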
Value
A list is returned:
fit |
A model with selected variables for the given outcome variable |
outplot |
A forest plot |
Author(s)
Aixiang Jiang
References
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
Therneau, T. and Grambsch, P. (2000) Modeling Survival Data: Extending the Cox Model. Springer-Verlag.
Kassambara, A., Kosinski, M. and Biecek, P. (2021) survminer: Drawing Survival Curves using 'ggplot2'. R package version 0.4.9, <https://CRAN.R-project.org/package=survminer>.
Examples
# Load the example data sets and take the training set:
data("datlist", package = "csmpv")
tdat = datlist$training
# The function saves files locally. You can define your own temporary directory.
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types.
# Here, we use continuous as an example:
c2fit = LASSO2plus(data = tdat, biomks = Xvars,
                   outcomeType = "continuous", Y = "Age",
                   outfile = paste0(temp_dir, "/continuousLASSO2plus"))
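# You can save the files to any directory you like by changing "outfile".
# unlink(temp_dir) does not remove a directory unless recursive = TRUE, and
# deleting the session's tempdir() can break later calls that rely on it.
# To clean up, remove only the files written with this example's outfile prefix:
unlink(list.files(temp_dir, pattern = "^continuousLASSO2plus", full.names = TRUE))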