cv.galasso {miselect}    R Documentation
Cross Validated Multiple Imputation Grouped Adaptive LASSO
Description
Does k-fold cross-validation for galasso and returns an optimal value for lambda.
Usage
cv.galasso(
  x,
  y,
  pf,
  adWeight,
  family = c("gaussian", "binomial"),
  nlambda = 100,
  lambda.min.ratio = ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06),
  lambda = NULL,
  nfolds = 5,
  foldid = NULL,
  maxit = 1000,
  eps = 1e-05
)
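The default for lambda.min.ratio depends on adWeight. It is 0.001 when every adaptive weight equals 1 and 1e-6 otherwise. The sketch below evaluates that default expression and then builds a log-spaced grid of nlambda values. Only the ratio default is taken from the Usage above. The log-spacing is a glmnet-style assumption, and lambda_max is a hypothetical placeholder, because this page does not say how galasso computes the largest lambda.

```r
# Hypothetical helper: mirrors the lambda.min.ratio default shown in Usage.
default_min_ratio <- function(adWeight) {
  p <- length(adWeight)
  ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06)
}

# Hypothetical helper: glmnet-style, log-spaced, decreasing lambda grid.
# lambda_max is an assumed placeholder, not the value galasso computes.
lambda_grid <- function(lambda_max, nlambda = 100, lambda.min.ratio = 0.001) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

ratio_plain <- default_min_ratio(rep(1, 20))        # all weights equal 1 -> 0.001
ratio_adapt <- default_min_ratio(c(rep(1, 19), 2))  # not all equal 1 -> 1e-06
grid <- lambda_grid(lambda_max = 1, nlambda = 100,
                    lambda.min.ratio = ratio_plain)
```

The smaller default ratio for adaptive weights lets the grid extend to much smaller lambdas. Adaptive weights already rescale the penalty per variable.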
Arguments
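x
A length m list of n × p numeric design matrices, one per imputed dataset, where m is the number of imputations, n the number of observations and p the number of predictors.
y
A length m list of length n numeric response vectors, one per imputed dataset.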
pf
Numeric vector of length p of penalty factors. Can be used to penalize certain variables differently.
adWeight
Numeric vector of length p representing the adaptive weights for the L1 penalty.
family
The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian".
nlambda
Length of the automatically generated "lambda" sequence. If "lambda" is not NULL, "nlambda" is ignored. Default is 100.
lambda.min.ratio
Ratio that determines the minimum value of "lambda" when automatically generating a "lambda" sequence. If "lambda" is not NULL, "lambda.min.ratio" is ignored. Default is 0.001 when all adaptive weights equal 1 (i.e., adWeight equals rep(1, p)) and 1e-6 otherwise.
lambda
Optional numeric vector of lambdas to fit. If NULL, a sequence of length "nlambda" is generated automatically (see "nlambda" and "lambda.min.ratio").
nfolds
Number of folds to use for cross-validation. Default is 5; minimum is 3.
foldid
An optional length n vector of fold IDs for cross-validation.
maxit
Maximum number of iterations to run. Default is 1000.
eps
Tolerance for convergence. Default is 1e-5.
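Every imputed dataset describes the same n subjects. A single fold assignment shared across all imputations therefore keeps a subject's imputed rows in the same held-out fold. The sketch below builds a balanced foldid vector under that assumption. The helper make_foldid is hypothetical and not part of miselect.

```r
# Hypothetical helper: balanced, randomly permuted fold ids in 1..nfolds
# for n observations (assumed to be shared by all m imputed datasets).
make_foldid <- function(n, nfolds = 5) {
  if (nfolds < 3) stop("nfolds must be at least 3")
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
foldid <- make_foldid(n = 103, nfolds = 5)
table(foldid)  # fold sizes differ by at most one
```

If the length matches the number of rows in each matrix of x, the vector can be passed as cv.galasso(x, y, pf, adWeight, foldid = foldid). A fixed foldid also makes repeated cross-validation runs reproducible.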
Details
cv.galasso works by adding a group penalty to the aggregated objective function to ensure that variable selection is consistent across imputations. Simulations suggest that the "stacked" objective function approaches (i.e., saenet) tend to be more computationally efficient and to have better estimation and selection properties than the grouped approach.
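Write the coefficients as a p × m matrix B, where row j holds variable j's coefficients across the m imputations. A grouped penalty on the L2 norm of each row can only shrink a whole row to zero. A variable is therefore dropped from every imputation or kept in every imputation. The sketch below assumes a penalty of the form lambda * sum_j pf_j * adWeight_j * ||B[j, ]||_2 for illustration only. The helpers group_penalty and selected_vars are hypothetical. See Du et al. (2022) for the objective cv.galasso actually optimizes.

```r
# Hypothetical helper: grouped penalty across imputations (assumed form).
# B is a p x m matrix: rows = variables, columns = imputed datasets.
group_penalty <- function(B, lambda, pf, adWeight) {
  row_norms <- sqrt(rowSums(B^2))  # L2 norm of each variable's m coefficients
  lambda * sum(pf * adWeight * row_norms)
}

# Hypothetical helper: indices of selected variables (rows not entirely zero).
selected_vars <- function(B) which(rowSums(B != 0) > 0)

B <- rbind(c(3, 4),      # variable 1: row norm 5
           c(0, 0),      # variable 2: zero in every imputation
           c(0.6, 0.8))  # variable 3: row norm 1
pen <- group_penalty(B, lambda = 0.5, pf = rep(1, 3), adWeight = c(1, 1, 2))
sel <- selected_vars(B)
```

Here pen is 0.5 * (5 + 2 * 1) = 3.5, and variables 1 and 3 are selected.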
Value
An object of class "cv.galasso" with the following elements:
- call
The call that generated the output.
- lambda
The sequence of lambdas fit.
- cvm
Average cross-validation error for each "lambda". For family = "gaussian", "cvm" corresponds to mean squared error, and for family = "binomial", "cvm" corresponds to deviance.
- cvse
Standard error of "cvm".
- galasso.fit
A "galasso" object fit to the full data.
- lambda.min
The lambda value for the model with the minimum cross validation error.
- lambda.1se
The lambda value for the sparsest model within one standard error of the minimum cross validation error.
- df
The number of nonzero coefficients for each value of lambda.
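The k-fold cross-validation convention relates cvm, cvse, lambda.min and lambda.1se to each other. The sketch below assumes a matrix of per-fold errors with rows for folds and columns for lambda values, and applies the glmnet-style one-standard-error rule. The helper summarize_cv and the toy numbers are hypothetical. cv.galasso may weight or combine folds differently.

```r
# Hypothetical helper: summarize k-fold CV errors (glmnet-style, assumed).
# fold_err: nfolds x nlambda matrix; lambda: decreasing sequence of length nlambda.
summarize_cv <- function(fold_err, lambda) {
  cvm  <- colMeans(fold_err)                             # mean CV error per lambda
  cvse <- apply(fold_err, 2, sd) / sqrt(nrow(fold_err))  # standard error of cvm
  i_min <- which.min(cvm)
  # Sparsest model within one SE of the minimum: the largest such lambda.
  within_1se <- cvm <= cvm[i_min] + cvse[i_min]
  list(cvm = cvm, cvse = cvse,
       lambda.min = lambda[i_min],
       lambda.1se = max(lambda[within_1se]))
}

lambda <- c(1, 0.5, 0.25, 0.125)
fold_err <- rbind(c(4.0, 2.1, 2.0, 2.1),
                  c(4.2, 2.0, 1.8, 1.9),
                  c(3.8, 2.2, 2.2, 2.3))
res <- summarize_cv(fold_err, lambda)
```

For these numbers the minimum mean error occurs at lambda = 0.25. The error at lambda = 0.5 lies within one standard error of that minimum, so the sparser lambda.1se is 0.5.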
References
Du, J., Boss, J., Han, P., Beesley, L. J., Kleinsasser, M., Goutman, S. A., ... & Mukherjee, B. (2022). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Journal of Computational and Graphical Statistics, 31(4), 1063-1075. <doi:10.1080/10618600.2022.2035739>
Examples
library(miselect)
library(mice)
set.seed(48109)
# Using the mice defaults for sake of example only.
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
dfs <- lapply(1:5, function(i) complete(mids, action = i))
# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
y[[i]] <- dfs[[i]]$Y
}
pf <- rep(1, 20)
adWeight <- rep(1, 20)
fit <- cv.galasso(x, y, pf, adWeight)
# By default 'coef' returns the betas for lambda.min.
coef(fit)