LASSO2plus {csmpv}    R Documentation

Variable Selection and Modeling with LASSO2plus

Description

This function performs variable selection using the LASSO2plus algorithm and subsequently builds a model.

Usage

LASSO2plus(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  outfile = "nameWithPath",
  height = 6
)

Arguments

data

A data matrix or a data frame, with samples in rows and features/traits in columns.

standardization

A logical value indicating whether standardization is needed before variable selection. The default is FALSE.

columnWise

A logical value indicating whether column-wise or row-wise normalization is applied. The default is TRUE, which performs column-wise normalization. This is only meaningful when "standardization" is TRUE.

biomks

A vector of potential biomarkers for variable selection. They should be a subset of the column names of "data".

outcomeType

Outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event".

Y

Outcome variable name when the outcome type is either "binary" or "continuous".

time

Time variable name when outcome type is "time-to-event".

event

Event variable name when outcome type is "time-to-event".

outfile

A string giving the output file name, including the path if necessary but without a file type extension.

height

An integer giving the forest plot height in inches.
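
The difference between column-wise and row-wise normalization, as controlled by "standardization" and "columnWise", can be pictured with base R. The sketch below assumes z-score scaling via scale(), which this page does not confirm. The matrix m and the names colStd and rowStd are hypothetical and are not part of csmpv.

```r
# Hypothetical 3 x 2 numeric matrix: samples in rows, features in columns
m <- matrix(c(1, 2, 3, 10, 20, 60), nrow = 3,
            dimnames = list(paste0("s", 1:3), c("f1", "f2")))

# Column-wise (assumed z-score): each feature is centered and scaled across samples
colStd <- scale(m)

# Row-wise (assumed z-score): each sample is centered and scaled across its features
rowStd <- t(scale(t(m)))
```

After column-wise scaling each column has mean 0 and standard deviation 1. After row-wise scaling the same holds for each row.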

Details

The LASSO2plus algorithm begins with variable selection using LASSO2, typically involving multiple cross-validation-based LASSO regressions. However, if only one or no variables are selected, the cross-validation results are ignored, and the algorithm ensures a minimum of two remaining variables through full-data lambda simulations. Additionally, it conducts variable selection through single-variable regression for each candidate variable. The variables selected from both LASSO2 and single-variable approaches are then combined to perform traditional variable selection using stepwise regression. This function is designed to handle outcome variables of binary, continuous, or time-to-event type. Following variable selection, a model is constructed using standard R functions such as lm, glm, or coxph, depending on the type of outcome variable.
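
The three stages described above (LASSO selection, single-variable screening, and stepwise regression on the combined set) can be sketched for a continuous outcome as follows. This is an illustration under stated assumptions, not the csmpv implementation: it uses a single cv.glmnet run rather than repeated cross-validation, omits the fallback that guarantees at least two variables, and assumes a p-value cutoff of 0.05 for the single-variable step. The simulated data and all object names are hypothetical.

```r
# Illustrative sketch only: NOT the csmpv implementation.
library(glmnet)  # assumed to be installed; provides cv.glmnet()

set.seed(1)
n <- 100
# Simulated predictors x1..x6 and a continuous outcome y (hypothetical data)
X <- matrix(rnorm(n * 6), nrow = n,
            dimnames = list(NULL, paste0("x", 1:6)))
y <- 2 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n)
dat <- data.frame(y = y, X)

# Stage 1: cross-validated LASSO; keep variables with non-zero coefficients
cvfit <- cv.glmnet(X, y, family = "gaussian")
beta <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]  # drop the intercept
lassoVars <- names(beta)[beta != 0]

# Stage 2: one regression per candidate; keep those with p < 0.05 (assumed cutoff)
pvals <- sapply(colnames(X), function(v) {
  fit1 <- lm(reformulate(v, response = "y"), data = dat)
  summary(fit1)$coefficients[v, "Pr(>|t|)"]
})
singleVars <- names(pvals)[pvals < 0.05]

# Stage 3: combine both sets, then run stepwise selection on the full model
candidates <- union(lassoVars, singleVars)
fullFit <- lm(reformulate(candidates, response = "y"), data = dat)
finalFit <- step(fullFit, direction = "both", trace = 0)
```

For a binary or time-to-event outcome, the same outline would use glm or coxph in place of lm, with the matching glmnet family.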

Value

A list is returned:

fit

A model fitted with the selected variables for the given outcome variable.

outplot

A forest plot.

Author(s)

Aixiang Jiang

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Therneau, T. and Grambsch, P. (2000) Modeling Survival Data: Extending the Cox Model. Springer-Verlag.

Kassambara, A., Kosinski, M. and Biecek, P. (2021) survminer: Drawing Survival Curves using 'ggplot2'. R package version 0.4.9, <https://CRAN.R-project.org/package=survminer>.

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types. 
# Here, we use continuous as an example:
c2fit = LASSO2plus(data = tdat, biomks = Xvars,
                   outcomeType = "continuous", Y = "Age",
                   outfile = file.path(temp_dir, "continuousLASSO2plus"))
# You might save the files to the directory you want.

# To delete "temp_dir" and its contents, set recursive = TRUE
# (unlink() does not remove directories otherwise):
unlink(temp_dir, recursive = TRUE)

[Package csmpv version 1.0.3 Index]