saenet {miselect}    R Documentation
Multiple Imputation Stacked Adaptive Elastic Net
Description
Fits an adaptive elastic net to multiply imputed data. The imputed datasets are stacked into a single dataset and penalized jointly, so that every imputation has the same coefficients (and therefore selects the same variables) at each value of lambda. "saenet" supports both continuous and binary responses.
Usage
saenet(
x,
y,
pf,
adWeight,
weights,
family = c("gaussian", "binomial"),
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06),
lambda = NULL,
maxit = 1000,
eps = 1e-05
)
Arguments
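x
A length m list of n x p numeric matrices, one per imputed dataset. The matrices should not contain an intercept column or any missing values.
y
A length m list of length n numeric response vectors, one per imputed dataset. The vectors should not contain any missing values.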
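pf
Numeric vector of length p of penalty factors. pf_j multiplies both the L1 and the L2 penalty terms on beta_j, so it can be used to penalize certain variables differently; for example, pf_j = 0 leaves variable j unpenalized.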
adWeight
Numeric vector of length p representing the adaptive weights for the L1 penalty.
weights
Numeric vector of length n containing the proportion observed (non-missing) for each row in the un-imputed data.
family
The type of response. "gaussian" implies a continuous response and "binomial" implies a binary response. Default is "gaussian".
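alpha
Elastic net mixing parameter between 0 and 1; alpha = 1 gives a pure adaptive lasso penalty. Can be a vector, in which case the model is fit at each value (for example, to cross-validate over). Default is 1.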
nlambda
Length of the automatically generated "lambda" sequence. If "lambda" is not NULL, "nlambda" is ignored. Default is 100.
lambda.min.ratio
Ratio that determines the minimum value of "lambda" when automatically generating a "lambda" sequence. If "lambda" is not NULL, "lambda.min.ratio" is ignored. Default is 1e-3 when all adaptive weights equal 1 and 1e-6 otherwise.
lambda
Optional numeric vector of lambdas to fit. If NULL, a sequence of "nlambda" values is generated automatically using "lambda.min.ratio".
maxit
Maximum number of iterations to run. Default is 1000.
eps
Tolerance for convergence. Default is 1e-5.
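The Usage default for lambda.min.ratio translates directly into a small R function, sketched below. The second helper, make_lambda_grid, is hypothetical: it assumes a log-spaced, decreasing grid from some maximum value lambda_max down to lambda_max * lambda.min.ratio, a common convention in penalized-regression software, but this page does not say how saenet chooses lambda_max or spaces its sequence.

```r
# Default lambda.min.ratio, transcribed from the Usage section:
# 1e-3 when every adaptive weight equals 1, 1e-6 otherwise.
default_lambda_min_ratio <- function(adWeight) {
  p <- length(adWeight)
  ifelse(isTRUE(all.equal(adWeight, rep(1, p))), 0.001, 1e-06)
}

# Hypothetical helper (an assumption, not part of miselect): a
# log-spaced, decreasing grid of nlambda values running from
# lambda_max down to lambda_max * lambda.min.ratio.
make_lambda_grid <- function(lambda_max, nlambda = 100,
                             lambda.min.ratio = 0.001) {
  exp(seq(log(lambda_max), log(lambda_max * lambda.min.ratio),
          length.out = nlambda))
}

default_lambda_min_ratio(rep(1, 20))           # returns 0.001
default_lambda_min_ratio(c(0.5, rep(1, 19)))   # returns 1e-06
grid <- make_lambda_grid(lambda_max = 2, nlambda = 5)
```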
Details
saenet works by stacking the multiply imputed data into a single matrix and running a weighted adaptive elastic net on it. The objective function is:
$$\arg\min_{\beta_j} \; -\frac{1}{n} \sum_{k=1}^{m} \sum_{i=1}^{n} o_i \, L(\beta_j \mid Y_{ik}, X_{ijk}) + \lambda \left( \alpha \sum_{j=1}^{p} \hat{a}_j \, pf_j \, |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} pf_j \, \beta_j^2 \right)$$
where L is the log-likelihood, o = w / m (with w the "weights" argument and m the number of imputations), \hat{a} is the vector of adaptive weights ("adWeight"), and pf is the penalty factor. Simulations (Du et al., 2022) suggest that the "stacked" objective function approach (i.e., saenet) tends to be more computationally efficient, and to have better estimation and selection properties, than the grouped approach implemented in galasso. The advantage of galasso, however, is that it allows one to examine differences between coefficient estimates across imputations.
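The stacking step and the objective above can be checked numerically with the base-R sketch below. It is not saenet's solver: the names stack_imputations and saenet_objective are hypothetical, the intercept b0 is assumed to be unpenalized (suggested by the p + 1 coefficients listed under Value), and the Gaussian log-likelihood is assumed to be -(y - eta)^2 / 2 up to an additive constant.

```r
# Hypothetical helper: stack a length-m list of n x p matrices and a
# length-m list of length-n responses into one (n * m)-row problem.
# Every stacked row from original row i gets weight o_i = w_i / m.
stack_imputations <- function(x, y, weights) {
  m <- length(x)
  list(X = do.call(rbind, x),
       Y = unlist(y),
       o = rep(weights / m, times = m))
}

# Hypothetical helper: evaluate the stacked penalized objective for an
# intercept b0 (assumed unpenalized) and coefficient vector beta.
saenet_objective <- function(b0, beta, x, y, weights, adWeight, pf,
                             lambda, alpha,
                             family = c("gaussian", "binomial")) {
  family <- match.arg(family)
  s <- stack_imputations(x, y, weights)
  n <- nrow(x[[1]])                    # rows per imputed dataset
  eta <- b0 + drop(s$X %*% beta)       # linear predictor, stacked rows
  loglik <- if (family == "gaussian") {
    -(s$Y - eta)^2 / 2                 # assumed Gaussian form, up to a constant
  } else {
    s$Y * eta - log1p(exp(eta))        # Bernoulli log-likelihood, logit link
  }
  penalty <- alpha * sum(adWeight * pf * abs(beta)) +
    (1 - alpha) * sum(pf * beta^2)
  -sum(s$o * loglik) / n + lambda * penalty
}

# Toy data: m = 2 imputations, n = 3 rows, p = 2 predictors
x <- list(matrix(c(1, 0, 2, 1, 1, 0), nrow = 3),
          matrix(c(1, 1, 2, 1, 0, 0), nrow = 3))
y <- list(c(1, 0, 1), c(1, 1, 1))
w <- c(1, 0.5, 1)
s <- stack_imputations(x, y, w)
dim(s$X)   # returns c(6, 2)
obj <- saenet_objective(0, c(0.2, -0.1), x, y, w, adWeight = c(1, 1),
                        pf = c(1, 1), lambda = 0.1, alpha = 1,
                        family = "binomial")
```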
Value
An object with type saenet and subtype saenet.gaussian or saenet.binomial, depending on which family was used. Both subtypes have 4 elements, including:
- lambda
Sequence of lambda fit.
- coef
An nlambda x nalpha x (p + 1) array of the estimated coefficients (intercept plus p betas) at each value of lambda and alpha.
- df
Number of nonzero betas at each value of lambda and alpha.
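Because coef is a three-dimensional array, the variables selected at one (lambda, alpha) pair are the nonzero entries of a single slice. The sketch below uses a small mock array in place of a real fit; it assumes the intercept is the first of the p + 1 entries, which this page does not state, and the helper selected_vars is hypothetical rather than part of miselect.

```r
# Mock stand-in for fit$coef: nlambda = 3, nalpha = 1, p + 1 = 4.
# Assumed layout: intercept first, then beta_1, ..., beta_p.
coef_arr <- array(0, dim = c(3, 1, 4))
coef_arr[2, 1, ] <- c(0.1, 0.5, 0, 0)
coef_arr[3, 1, ] <- c(0.2, 0.7, -0.3, 0)

# Hypothetical helper: indices of nonzero slopes (intercept dropped)
# at lambda index l and alpha index a.
selected_vars <- function(coef_arr, l, a = 1) {
  beta <- coef_arr[l, a, -1]
  which(beta != 0)
}

# Count of nonzero slopes per lambda for alpha index 1,
# analogous to the df element.
df_mock <- apply(coef_arr[, 1, -1, drop = FALSE], 1,
                 function(b) sum(b != 0))

selected_vars(coef_arr, 3)   # returns c(1, 2)
df_mock                      # returns c(0, 1, 2)
```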
References
Du, J., Boss, J., Han, P., Beesley, L. J., Kleinsasser, M., Goutman, S. A., ... & Mukherjee, B. (2022). Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. Journal of Computational and Graphical Statistics, 31(4), 1063-1075. <doi:10.1080/10618600.2022.2035739>
Examples
library(miselect)
library(mice)
# Impute the missing values in miselect.df 5 times
mids <- mice(miselect.df, m = 5, printFlag = FALSE)
# Extract the 5 completed datasets
dfs <- lapply(1:5, function(i) complete(mids, action = i))
# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in 1:5) {
  x[[i]] <- as.matrix(dfs[[i]][, paste0("X", 1:20)])
  y[[i]] <- dfs[[i]]$Y
}
# Calculate observational weights
weights <- 1 - rowMeans(is.na(miselect.df))
pf <- rep(1, 20)
adWeight <- rep(1, 20)
# Since 'Y' is a binary variable, we use 'family = "binomial"'
fit <- saenet(x, y, pf, adWeight, weights, family = "binomial")