cv.higlasso {higlasso} | R Documentation |
Cross Validated Hierarchical Integrative Group LASSO
Description
Does k-fold cross-validation for higlasso, and returns optimal values for lambda1 and lambda2.
Usage
cv.higlasso(
Y,
X,
Z,
method = c("aenet", "gglasso"),
lambda1 = NULL,
lambda2 = NULL,
nlambda1 = 10,
nlambda2 = 10,
lambda.min.ratio = 0.05,
nfolds = 5,
foldid = NULL,
sigma = 1,
degree = 2,
maxit = 5000,
tol = 1e-05
)
Arguments
Y |
A length n numeric response vector |
X |
An n x p numeric matrix |
Z |
An n x m numeric matrix |
method |
Type of initialization to use. Possible choices are "aenet" and "gglasso" |
lambda1 |
A numeric vector of main effect penalties on which to tune.
By default, |
lambda2 |
A numeric vector of interaction effect penalties on which to
tune. By default, |
nlambda1 |
The number of lambda1 values to generate. Default is 10,
minimum is 2. If |
nlambda2 |
The number of lambda2 values to generate. Default is 10,
minimum is 2. If |
lambda.min.ratio |
Ratio used to compute the minimum lambda from the maximum lambda. Ignored if lambda1 or lambda2 is non-NULL. Default is 0.05 |
nfolds |
Number of folds for cross-validation. Default is 5. The minimum is 3, and the maximum is the number of observations (i.e., leave-one-out cross-validation) |
foldid |
An optional vector of values between 1 and
|
sigma |
Scale parameter for integrative weights. Technically a third tuning parameter, but it defaults to 1 for computational tractability |
degree |
Degree of |
maxit |
Maximum number of iterations. Default is 5000 |
tol |
Tolerance for convergence. Defaults to 1e-5 |
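The foldid argument lets the fold assignment be fixed in advance, for example to make repeated runs comparable. The following is a minimal base R sketch, not the package's own procedure. It assumes that foldid holds one integer label per observation and that the labels run from 1 to nfolds (the upper bound is cut off in the description above, so treat that range as an assumption). The sizes used are hypothetical.

```r
# Hypothetical sizes for illustration only.
set.seed(1)
n      <- 23   # number of observations (length of Y)
nfolds <- 5

# Assign fold labels 1..nfolds as evenly as possible, then shuffle them.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Fold sizes differ by at most one observation.
print(table(foldid))

# The vector could then be supplied as (not run here):
# fit <- cv.higlasso(Y, X, Z, foldid = foldid)
```

Building the labels with rep() before shuffling keeps the folds balanced, which plain sampling with replacement would not guarantee.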
Details
There are a few things to keep in mind when using cv.higlasso:
- higlasso uses the strong heredity principle. That is, X_1 and X_2 must be included as main effects before the interaction X_1 X_2 can be included.
- While higlasso uses integrative weights to help with estimation, higlasso is more of a selection method. As a result, cv.higlasso does not output coefficient estimates, only which variables are selected.
- Simulation studies suggest that higlasso is a very conservative method when it comes to selecting interactions. That is, higlasso has a low false positive rate, and the identification of a nonlinear interaction is a good indicator that further investigation is worthwhile.
- cv.higlasso can be slow, so it may be beneficial to tweak some of its settings (for example, nlambda1, nlambda2, and nfolds) to get a handle on how long the method will take before running the full model.
As a side effect of the conservativeness of the method, we have found that
using the 1 standard error rule results in overly sparse models, and that
lambda.min generally performs better.
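The strong heredity principle mentioned above can be illustrated without fitting anything. The sketch below is plain base R and does not call higlasso. The variable names and the helper is_eligible are hypothetical, chosen only to show which pairwise interactions remain admissible once a set of main effects has been selected.

```r
# Hypothetical set of variables selected as main effects.
selected_main <- c("X1", "X2", "X4")

# Under strong heredity, only pairs of selected main effects are eligible
# to enter the model as interactions.
pairs    <- combn(selected_main, 2)
eligible <- paste(pairs[1, ], pairs[2, ], sep = ":")
print(eligible)

# Hypothetical helper: an interaction is admissible only if both of its
# parent variables are among the selected main effects.
is_eligible <- function(a, b) all(c(a, b) %in% selected_main)

is_eligible("X1", "X2")  # both parents selected
is_eligible("X1", "X3")  # X3 was not selected as a main effect
```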
Value
An object of type cv.higlasso with 7 elements:
- lambda
An nlambda1 x nlambda2 x 2 array containing each (lambda1, lambda2) pair.
- lambda.min
The lambda pair with the lowest cross-validation error.
- lambda.1se
- cvm
The cross-validation error at each lambda pair. The error is calculated from the mean squared error.
- cvse
The standard error of cvm at each lambda pair.
- higlasso.fit
higlasso output from fitting the whole data.
- call
- call
The call that generated the output.
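To make the layout of the lambda array concrete, the sketch below builds synthetic stand-ins for lambda and cvm and reads off the pair with the lowest error. It is an illustration under assumptions, not real cv.higlasso output: the numbers are made up, and cvm is assumed to be an nlambda1 x nlambda2 matrix indexed the same way as the first two dimensions of lambda.

```r
# Synthetic stand-ins for the documented elements (not real output).
nlambda1 <- 3
nlambda2 <- 4
lambda1  <- c(1.0, 0.5, 0.25)
lambda2  <- c(2.0, 1.0, 0.4, 0.2)

# lambda[i, j, ] holds the pair (lambda1[i], lambda2[j]).
lambda <- array(0, dim = c(nlambda1, nlambda2, 2))
lambda[, , 1] <- matrix(lambda1, nlambda1, nlambda2)
lambda[, , 2] <- matrix(lambda2, nlambda1, nlambda2, byrow = TRUE)

# Made-up cross-validation errors, smallest at grid position (2, 3).
cvm <- matrix(10, nlambda1, nlambda2)
cvm[2, 3] <- 4

# Locate the grid cell with the lowest error and read off its pair.
best       <- which(cvm == min(cvm), arr.ind = TRUE)[1, ]
lambda_min <- lambda[best[1], best[2], ]
print(lambda_min)
```

On a fitted object the same pair is reported directly as lambda.min, so the lookup is only needed when inspecting other cells of the grid.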
Author(s)
Alexander Rix
References
A Hierarchical Integrative Group LASSO (HiGLASSO) Framework for Analyzing Environmental Mixtures. Jonathan Boss, Alexander Rix, Yin-Hsiu Chen, Naveen N. Narisetty, Zhenke Wu, Kelly K. Ferguson, Thomas F. McElrath, John D. Meeker, Bhramar Mukherjee. 2020. arXiv:2003.12844
Examples
library(higlasso)
X <- as.matrix(higlasso.df[, paste0("V", 1:7)])
Y <- higlasso.df$Y
Z <- matrix(1, nrow(X))
# This can take a bit of time
fit <- cv.higlasso(Y, X, Z)
print(fit)
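Because cv.higlasso can be slow, a short pilot run with a smaller grid and fewer folds may help gauge the running time before the full fit. The sketch below uses only arguments from the Usage section and respects the documented minimums. The product of grid size and fold count is used as a rough proxy for the amount of work, which is an assumption rather than a documented cost model. The fitting step is skipped when the higlasso package is not installed.

```r
# Smaller settings for a timing run; values respect the documented minimums
# (nlambda1 and nlambda2 at least 2, nfolds at least 3).
pilot <- list(nlambda1 = 3, nlambda2 = 3, nfolds = 3)

# Rough count of (lambda pair, fold) combinations, pilot versus defaults.
n_pilot   <- pilot$nlambda1 * pilot$nlambda2 * pilot$nfolds  # 27
n_default <- 10 * 10 * 5                                     # 500

if (requireNamespace("higlasso", quietly = TRUE)) {
  library(higlasso)
  X <- as.matrix(higlasso.df[, paste0("V", 1:7)])
  Y <- higlasso.df$Y
  Z <- matrix(1, nrow(X))

  # Time the reduced cross-validation run.
  timing <- system.time(
    fit.pilot <- cv.higlasso(Y, X, Z,
                             nlambda1 = pilot$nlambda1,
                             nlambda2 = pilot$nlambda2,
                             nfolds   = pilot$nfolds)
  )
  print(timing)
  print(fit.pilot$lambda.min)
}
```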