EMggm {EMgaussian} | R Documentation |
Regularized GGM under missing data with EBIC or k-fold cross validation
Description
Regularized GGM under missing data with EBIC or k-fold cross validation
Usage
EMggm(
dat,
max.iter = 500,
est.tol = 1e-07,
start = c("diag", "pairwise", "listwise", "full"),
glassoversion = c("glasso", "glassoFast", "glassonostart", "none"),
rho = 0,
rhoselect = c("ebic", "kfold"),
N = NULL,
gam = 0.5,
zero.tol = 1e-10,
k = 5,
seed = NULL,
debug = 0,
convfail = FALSE,
...
)
Arguments
dat |
A data frame or matrix of the raw data. |
max.iter |
Maxumum number of EM cycles for the EM algorithm, passed eventually to |
est.tol |
Tolerance for change in parameter estimates across EM Cycles. If
all changes are less than |
start |
Starting value method (see details of |
glassoversion |
Character indicating which function to use at the M-step for regularization. See
|
rho |
Vector of tuning parameter values. Defaults to just a single value of 0 (i.e., also no regularization). |
rhoselect |
Method of selecting the tuning parameter. "ebic" will select the best value in |
N |
Sample size to use in any EBIC calculations. Defaults to average pairwise sample size. |
gam |
Value of gamma in EBIC calculations. Typical choice, and the default, is .5. |
zero.tol |
Tolerance in EBIC calculations for declaring edges to be zero. Anything in absolute value
below |
k |
If using k-fold cross validation, an integer specifying the number of folds. Defaults to 5. |
seed |
Random number seed passed to function that does k-fold cross validation. Use if you want folds to be more or less replicable. |
debug |
(Experimental) Pick an integer above 0 for some messages that may aide in debugging. |
convfail |
(Experimental) What to do if optimization fails. If TRUE, will fill in the partial correlation matrix with 0's. |
... |
Other arguments passed down to |
Details
This function will estimate a regularized Gaussian graphical model (GGM) using either EBIC or k-fold
cross validation to select the tuning parameter. It was written as a wrapper function to the
em.prec
function, which is an implementation of the EM algorithm from
Städler & Bühlmann (2012). These authors also used k-fold cross validation, which is implemented here.
For the tuning parameter (rho
), typically a grid of of values is evaluated and the one
that results in the best EBIC or cross validation metric is selected and returned as the final model.
See rhogrid
and examples for some ways to pick these candidate tuning parameter values.
This function is intended to be compatible (to my knowledge) with estimateNetwork
in the bootnet
package in that one can input this as a custom function and then have all of the
benefits of plotting, centrality indices, and so on, that bootnet
provides.
Most methods in examples were studied by Falk and Starr (under review). In particular use of this
function in conjunction with estimateNetwork
worked well for both "ebic"
and "kfold" for model selection, though the original article used the glassoFast
package for estimation, which by default penalizes the diagonal of the precision matrix. It seems
slightly more common to not penalize the diagonal. Therefore, the below use glasso
;
this approach performed well but occasionally would get stuck while trying to find an optimal solution.
In addition, the two-stage approach studied by Falk and Starr (under review) also performed well, though
not as good as the present function; that approach is available in the bootnet package. Note also that
cglasso
has an implementation of Städler & Bühlmann (2012), but we found in
simulations with a high proportion of missing data that our implementation was less likely to
encounter estimation problems.
Value
A list with the following elements:
results
: Contains the "best" or selected model, with elements mostly following that ofem.prec
's output. Additional elements includerho
andcrit
, which are the grid of tuning parameter values and value of the criterion. These are added here because if used withbootnet
, it may not save the other slots below.rho
: Grid forrho
.crit
: Vector of the same length as the grid forrho
with the criterion (EBIC or cross-validation). Smaller is better.graph
: Estimated partial correlation matrix;bootnet
appears to expect this in order to do plots, compute centrality indices and so on.
References
Evidence that EBIC and k-fold works well with this package (cite if you use the EMglasso R package): Falk, C. F., & Starr, J. (2023, July 19). Regularized cross-sectional network modeling with missing data: A comparison of methods. Preprint: doi:10.31219/osf.io/dk6zv
Original publication on use of EM algorithm and k-fold cross-validation for glasso: Städler, N., & Bühlmann, P. (2012). Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing,22, 219–235. doi:10.1007/s11222-010-9219-7
If you use this package with bootnet: Epskamp,S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks and their accuracy: A tutorial paper. Behavior research methods, 50(1), 195–212. doi:10.3758/s13428-017-0862-1
If you also use this package with qgraph: Epskamp, S., Cramer, A., Waldorp, L., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48 (1), 1-18.
Foundational article on glasso: Friedman, J. H., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9 (3), 432-441.
Foundational article on use of EBIC with glasso: Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models.
If glasso is also the estimation method: Friedman, J. H., Hastie, T., & Tibshirani, R. (2014). glasso: Graphical lasso estimation of Gaussian graphical models. Retrieved from https://CRAN.R-project.org/package=glasso
If glassoFast is also the estimation method: Sustik M.A., Calderhead B. (2012). GLASSOFAST: An efficient GLASSO implementation. UTCS Technical Report TR-12-29:1-3.
Examples
library(psych)
library(bootnet)
data(bfi)
# Regularized estimation with just a couple of tuning parameter values
# EBIC
rho <- seq(.01,.5,length.out = 50)
ebic1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
rhoselect = "ebic")
# k-fold
kfold1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
rhoselect = "kfold")
plot(rho, ebic1$crit) # values of EBIC along grid
plot(rho, kfold1$crit) # values of kfold along grid
# partial correlation matrix
# ebic1$graph
# kfold1$graph
# Integration with bootnet package
ebic2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
glassoversion = "glasso", rhoselect = "ebic")
kfold2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
glassoversion = "glasso", rhoselect = "kfold")
# ebic2 and kfold2 now do just about anything one could normally do with an
# object returned from estimateNetwork e.g., plotting
plot(ebic2)
plot(kfold2)
# e.g., centrality indices
library(qgraph)
centralityPlot(ebic2)
# and so on
# Other ways to pick grid for tuning parameter... 100 values using method
# that qgraph basically uses; requires estimate of covariance matrix. Here
# the covariance matrix is obtained directly from the data.
rho <- rhogrid(100, method="qgraph", dat = bfi[,1:25])
ebic3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
glassoversion = "glasso", rhoselect = "ebic")
kfold3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
glassoversion = "glasso", rhoselect = "kfold")
# look at tuning parameter values vs criterion
plot(ebic3$result$rho, ebic3$result$crit)
plot(kfold3$result$rho, kfold3$result$crit)
# EBIC with bootnet; does listwise deletion by default
ebic.listwise <- estimateNetwork(bfi[,1:25], default="EBICglasso")
# EBIC with bootnet; two-stage estimation
# Note the constant added to bfi tricks lavaan into thinking the data are
# not categorical
ebic.ts <- estimateNetwork(bfi[,1:25] + 1e-10, default="EBICglasso",
corMethod="cor_auto", missing="fiml")