LASSO2 {csmpv}    R Documentation

Variable Selection using Modified LASSO with a Minimum of Two Remaining Variables

Description

This function conducts variable selection using LASSO (Least Absolute Shrinkage and Selection Operator) with two minor adaptations: it uses the mean lambda value from multiple cv.glmnet runs, and it ensures that at least two variables are selected.

Usage

LASSO2(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  nfolds = 10,
  outfile = "nameWithPath"
)

Arguments

data

A data matrix or a data frame where samples are in rows and features/traits are in columns.

standardization

A logical variable indicating if standardization is needed before variable selection. The default is FALSE.

columnWise

A logical variable indicating if column-wise or row-wise normalization is needed. The default is TRUE, which means column-wise normalization is performed. This is only meaningful when "standardization" is TRUE.

biomks

A vector of potential biomarkers for variable selection. They should be a subset of the column names in the "data" variable.

outcomeType

The outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event".

Y

The outcome variable name when the outcome type is either "binary" or "continuous".

time

The time variable name when the outcome type is "time-to-event".

event

The event variable name when the outcome type is "time-to-event".

nfolds

The number of folds for cross-validation. The default is 10.

outfile

A string representing the output file, including the path if necessary, but without the file type extension.
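
The difference between the column-wise and row-wise settings of "columnWise" can be illustrated with a small base R sketch. It assumes that "standardization" means centering to mean 0 and scaling to standard deviation 1, which this page does not state explicitly; the toy matrix and the object names (x, col_std, row_std) are hypothetical and are not part of csmpv.

```r
# Toy data: 3 samples (rows) by 3 features (columns); values are made up.
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 10), nrow = 3,
            dimnames = list(paste0("s", 1:3), paste0("g", 1:3)))

# Column-wise: each feature is centered and scaled across samples.
# ASSUMPTION: z-score standardization; LASSO2 may use a different scheme.
col_std <- scale(x)

# Row-wise: each sample is centered and scaled across its features.
row_std <- t(scale(t(x)))

round(colMeans(col_std), 10)  # every column mean is 0
round(rowMeans(row_std), 10)  # every row mean is 0
```

Column-wise scaling puts features on a common scale, which matters for LASSO because the penalty acts on coefficient size; row-wise scaling instead normalizes each sample's profile.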

Details

The function uses glmnet::cv.glmnet for cross-validation-based variable selection, taking the largest value of lambda whose cross-validated error is within one standard error of the minimum (lambda.1se in glmnet). To reduce the randomness introduced by cross-validation splits, it performs 10 runs of nfolds-fold cv.glmnet (the number of runs is currently fixed and may be parameterized later) and uses the mean lambda across these runs as the final lambda. The final regularized regression is then fitted on the complete dataset with this mean lambda, and the function counts the variables that remain. If fewer than two variables are selected, the function instead uses the first lambda that yields at least two selected variables on the full dataset. Three types of outcome variables are supported: continuous, binary, and time-to-event.
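
The procedure described above can be sketched with glmnet directly. This is a minimal illustration on simulated binary data, not the source code of LASSO2: the data, the object names, and the use of fit$df to find the fallback lambda are assumptions made for the example.

```r
library(glmnet)

# Simulated data (hypothetical): 100 samples, 8 features, binary outcome
# driven by the first two features.
set.seed(1)
n <- 100; p <- 8
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

# Step 1: run 10-fold cv.glmnet 10 times and average lambda.1se.
lambdas <- replicate(10, cv.glmnet(x, y, family = "binomial",
                                   nfolds = 10)$lambda.1se)
mean_lambda <- mean(lambdas)

# Step 2: fit the LASSO path on the full data and read off the
# coefficients at the mean lambda (dropping the intercept).
fit <- glmnet(x, y, family = "binomial")
beta <- as.matrix(coef(fit, s = mean_lambda))[-1, 1]
selected <- names(beta)[beta != 0]

# Step 3: if fewer than two variables remain, fall back to the first
# (largest) lambda on the path with at least two nonzero coefficients.
# ASSUMPTION: fit$df (nonzero count per lambda) is used to locate it.
if (length(selected) < 2) {
  idx <- which(fit$df >= 2)[1]
  beta <- as.matrix(coef(fit, s = fit$lambda[idx]))[-1, 1]
  selected <- names(beta)[beta != 0]
}

selected
```

Averaging lambda.1se over repeated runs makes the chosen penalty less sensitive to any single random fold assignment, at the cost of running cross-validation several times.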

Value

A list is returned:

coefs

A vector of LASSO coefficients

h0

Cumulative baseline hazard table; returned for time-to-event outcomes only

Y

The outcome variable name when the outcome type is either "binary" or "continuous".

time

The time variable name when the outcome type is "time-to-event".

event

The event variable name when the outcome type is "time-to-event".

standardization

A logical value indicating whether standardization was requested before variable selection.

columnWise

A logical value indicating whether column-wise or row-wise normalization was requested.

outcomeType

The outcome variable type.

allplot

A plot object

A shrunken coefficient vector is returned

Author(s)

Aixiang Jiang

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types.
# Here, we use time-to-event as an example:
# tl = LASSO2(data = tdat, biomks = Xvars,
#             outcomeType = "time-to-event",
#             time = "FFP..Years.", event = "Code.FFP",
#             outfile = paste0(temp_dir, "/survivalLASSO2"))
# Change the "outfile" path to save the files to a directory of your choice.

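# unlink(temp_dir) would not remove a directory unless recursive = TRUE, and
# deleting the session's tempdir() can break later calls that rely on it.
# To delete only the example's output files (assuming their names start with
# the outfile prefix "survivalLASSO2"), use the following:
unlink(list.files(temp_dir, pattern = "^survivalLASSO2", full.names = TRUE))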

[Package csmpv version 1.0.3 Index]