cv.saenet {miselect}    R Documentation
Cross Validated Multiple Imputation Stacked Adaptive Elastic Net
Description
Does k-fold cross-validation for saenet and returns optimal values for lambda and alpha.
Usage
cv.saenet(
  x,
  y,
  pf,
  adWeight,
  weights,
  family = c("gaussian", "binomial"),
  alpha = 1,
  nlambda = 100,
  lambda.min.ratio = ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06),
  lambda = NULL,
  nfolds = 5,
  foldid = NULL,
  maxit = 1000,
  eps = 1e-05
)
Arguments
- x
A length m list of n × p numeric matrices, one per imputed dataset.
- y
A length m list of length n numeric response vectors, one per imputed dataset.
- pf
Penalty factor of length p.
- adWeight
Numeric vector of length p representing the adaptive weights for the L1 penalty.
- weights
Numeric vector of length n containing the proportion observed (non-missing) for each row in the un-imputed data.
- family
The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian".
- alpha
Elastic net parameter. Can be a vector to cross-validate over. Default is 1.
- nlambda
Length of the automatically generated "lambda" sequence. If "lambda" is not NULL, "nlambda" is ignored. Default is 100.
- lambda.min.ratio
Ratio that determines the minimum value of "lambda" when automatically generating a "lambda" sequence. If "lambda" is not NULL, "lambda.min.ratio" is ignored. Default is 1e-3 if all adaptive weights equal 1, and 1e-6 otherwise.
- lambda
Optional numeric vector of lambdas to fit. If NULL, cv.saenet automatically generates a "lambda" sequence.
- nfolds
Number of folds to use for cross-validation. Default is 5; minimum is 3.
- foldid
An optional length n vector of fold ids.
- maxit
Maximum number of iterations to run. Default is 1000.
- eps
Tolerance for convergence. Default is 1e-5.
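For users who prefer to supply "lambda" and "foldid" explicitly, the base-R sketch below builds both. It assumes the glmnet-style convention of a log-spaced, decreasing grid running from lambda.max down to lambda.max * lambda.min.ratio, and that "foldid" holds one integer fold id in 1..nfolds per row of the un-imputed data; this page does not confirm either convention, and lambda.max here is a placeholder rather than a value computed from data.

```r
set.seed(1)
n <- 100                  # rows in the un-imputed data (hypothetical)
nfolds <- 5               # cv.saenet requires at least 3 folds
nlambda <- 100
lambda.min.ratio <- 1e-3
lambda.max <- 2.5         # placeholder value, NOT computed from any data

# Log-spaced, decreasing grid from lambda.max to lambda.max * lambda.min.ratio
# (glmnet-style convention, assumed here).
lambda <- exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
                  length.out = nlambda))

# Balanced random fold assignment: one fold id per row of the un-imputed data.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Both could then be passed explicitly, e.g.
# cv.saenet(x, y, pf, adWeight, weights, lambda = lambda, foldid = foldid)
```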
Details
cv.saenet works by stacking the multiply imputed datasets into a single matrix and fitting a weighted adaptive elastic net to it. Simulations suggest that "stacked" objective-function approaches such as this tend to be more computationally efficient, and to have better estimation and selection properties, than "grouped" approaches.
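The stacking step can be illustrated with a small base-R sketch on toy random data. It assumes that the imputed copies are stacked row-wise in imputation order and that each stacked row reuses the observational weight of its original row; any rescaling of the weights that cv.saenet may apply internally (for example by the number of imputations) is not shown, and the variable names are hypothetical.

```r
# Toy data: m = 3 imputed copies of an n x p design and response
# (random numbers, for illustration only).
set.seed(2)
m <- 3; n <- 4; p <- 2
x <- lapply(seq_len(m), function(i) matrix(rnorm(n * p), n, p))
y <- lapply(seq_len(m), function(i) rnorm(n))
weights <- c(1, 0.5, 1, 0.75)   # proportion observed per row (hypothetical)

# Stack rows of imputation 1, then imputation 2, and so on.
x_stacked <- do.call(rbind, x)  # (m * n) x p matrix
y_stacked <- unlist(y)          # length m * n vector
# Each stacked row inherits the weight of its original row.
w_stacked <- rep(weights, times = m)
```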
Because of stacking, the lambda sequence that cv.saenet generates automatically may underestimate lambda.max, so the degrees of freedom (the number of nonzero coefficients) may already be nonzero at the first lambda value.
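To see why the first lambda can admit nonzero coefficients, it helps to compute the smallest lambda at which every penalized coefficient is exactly zero. The sketch below does this for the Gaussian case from the subgradient (KKT) condition at b = 0, under an assumed objective: weighted squared error scaled by the total weight, an unpenalized intercept, no internal standardization, and alpha > 0. cv.saenet's actual objective and scaling may differ, and lambda_max_stacked is a hypothetical helper, not part of miselect.

```r
# lambda.max for a stacked, weighted Gaussian elastic net, ASSUMING the objective
#   (1 / (2 * sum(w))) * sum(w * (y - b0 - X %*% b)^2)
#   + lambda * sum(pf * adWeight * (alpha * |b| + (1 - alpha) / 2 * b^2))
# with an unpenalized intercept b0 and no standardization of X.
lambda_max_stacked <- function(x_stacked, y_stacked, w_stacked,
                               pf, adWeight, alpha = 1) {
  stopifnot(alpha > 0)
  # At b = 0 the optimal intercept is the weighted mean of y.
  r <- y_stacked - sum(w_stacked * y_stacked) / sum(w_stacked)
  # Absolute gradient of the loss for each coefficient at b = 0.
  g <- abs(colSums(w_stacked * r * x_stacked)) / sum(w_stacked)
  scale <- alpha * pf * adWeight
  penalized <- scale > 0        # unpenalized coefficients never hit zero
  # Smallest lambda satisfying g[j] <= lambda * scale[j] for all penalized j.
  max(g[penalized] / scale[penalized])
}
```

If an automatically generated sequence starts below this value, some coefficients are already active at its first lambda, which is the situation described above.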
Value
An object of class "cv.saenet" with 10 elements:
- call
The call that generated the output.
- lambda
Sequence of lambdas fit.
- cvm
Average cross-validation error for each lambda and alpha. For family = "gaussian", "cvm" corresponds to mean squared error, and for family = "binomial", "cvm" corresponds to deviance.
- cvse
Standard error of "cvm".
- saenet.fit
A "saenet" object fit to the full data.
- lambda.min
The lambda value for the model with the minimum cross validation error.
- lambda.1se
The lambda value for the sparsest model within one standard error of the minimum cross validation error.
- alpha.min
The alpha value for the model with the minimum cross validation error.
- alpha.1se
The alpha value for the sparsest model within one standard error of the minimum cross validation error.
- df
The number of nonzero coefficients for each value of lambda and alpha.
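The "lambda.min" and "lambda.1se" elements can be understood through the usual one-standard-error rule. The sketch below applies it to made-up "cvm" and "cvse" values for a single alpha, assuming the glmnet convention that the sparsest model within one standard error is the one with the largest lambda; how cv.saenet breaks ties or combines the rule across several alpha values is not shown.

```r
# Toy cross-validation summary for one alpha (made-up numbers, illustration only).
lambda <- c(1.00, 0.50, 0.25, 0.125, 0.0625)   # decreasing, as usual
cvm    <- c(1.40, 1.10, 0.95, 0.92, 0.99)
cvse   <- c(0.10, 0.08, 0.06, 0.05, 0.06)

# lambda.min: the lambda with the smallest average CV error.
i_min <- which.min(cvm)
lambda.min <- lambda[i_min]

# lambda.1se (glmnet-style): the largest lambda, i.e. the most heavily
# penalized fit, whose CV error is within one SE of the minimum.
threshold <- cvm[i_min] + cvse[i_min]
lambda.1se <- max(lambda[cvm <= threshold])
```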
References
Du, J., Boss, J., Han, P., Beesley, L. J., Kleinsasser, M., Goutman, S. A., ... & Mukherjee, B. (2022). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Journal of Computational and Graphical Statistics, 31(4), 1063-1075. <doi:10.1080/10618600.2022.2035739>
Examples
library(miselect)
library(mice)
set.seed(48109)
# Using the mice defaults for sake of example only.
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
dfs <- lapply(1:5, function(i) complete(mids, action = i))
# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
y[[i]] <- dfs[[i]]$Y
}
# Calculate observational weights
weights <- 1 - rowMeans(is.na(miselect.df))
pf <- rep(1, 20)
adWeight <- rep(1, 20)
# Since 'Y' is a binary variable, we use 'family = "binomial"'
fit <- cv.saenet(x, y, pf, adWeight, weights, family = "binomial")
# By default, 'coef' returns the coefficients for (lambda.min, alpha.min)
coef(fit)
# You can also cross validate over alpha
fit <- cv.saenet(x, y, pf, adWeight, weights, family = "binomial",
                 alpha = c(.5, 1))
# Get selected variables from the 1 standard error rule
coef(fit, lambda = fit$lambda.1se, alpha = fit$alpha.1se)