EMggm {EMgaussian}R Documentation

Regularized GGM under missing data with EBIC or k-fold cross validation

Description

Regularized GGM under missing data with EBIC or k-fold cross validation

Usage

EMggm(
  dat,
  max.iter = 500,
  est.tol = 1e-07,
  start = c("diag", "pairwise", "listwise", "full"),
  glassoversion = c("glasso", "glassoFast", "glassonostart", "none"),
  rho = 0,
  rhoselect = c("ebic", "kfold"),
  N = NULL,
  gam = 0.5,
  zero.tol = 1e-10,
  k = 5,
  seed = NULL,
  debug = 0,
  convfail = FALSE,
  ...
)

Arguments

dat

A data frame or matrix of the raw data.

max.iter

Maxumum number of EM cycles for the EM algorithm, passed eventually to em.prec.

est.tol

Tolerance for change in parameter estimates across EM Cycles. If all changes are less than tol, the algorithm terminates.

start

Starting value method (see details of em.prec). Note that "none" will not do any regularization, making rho moot.

glassoversion

Character indicating which function to use at the M-step for regularization. See em.prec) for more details.

rho

Vector of tuning parameter values. Defaults to just a single value of 0 (i.e., also no regularization).

rhoselect

Method of selecting the tuning parameter. "ebic" will select the best value in rho based on that yielding the best EBIC. "kfold" will do k-fold cross-validation.

N

Sample size to use in any EBIC calculations. Defaults to average pairwise sample size.

gam

Value of gamma in EBIC calculations. Typical choice, and the default, is .5.

zero.tol

Tolerance in EBIC calculations for declaring edges to be zero. Anything in absolute value below zero.tol will be considered zero and will not count as a parameter in EBIC calculations.

k

If using k-fold cross validation, an integer specifying the number of folds. Defaults to 5.

seed

Random number seed passed to function that does k-fold cross validation. Use if you want folds to be more or less replicable.

debug

(Experimental) Pick an integer above 0 for some messages that may aide in debugging.

convfail

(Experimental) What to do if optimization fails. If TRUE, will fill in the partial correlation matrix with 0's.

...

Other arguments passed down to em.prec and to functions that do the glasso.

Details

This function will estimate a regularized Gaussian graphical model (GGM) using either EBIC or k-fold cross validation to select the tuning parameter. It was written as a wrapper function to the em.prec function, which is an implementation of the EM algorithm from Städler & Bühlmann (2012). These authors also used k-fold cross validation, which is implemented here.

For the tuning parameter (rho), typically a grid of of values is evaluated and the one that results in the best EBIC or cross validation metric is selected and returned as the final model. See rhogrid and examples for some ways to pick these candidate tuning parameter values.

This function is intended to be compatible (to my knowledge) with estimateNetwork in the bootnet package in that one can input this as a custom function and then have all of the benefits of plotting, centrality indices, and so on, that bootnet provides.

Most methods in examples were studied by Falk and Starr (under review). In particular use of this function in conjunction with estimateNetwork worked well for both "ebic" and "kfold" for model selection, though the original article used the glassoFast package for estimation, which by default penalizes the diagonal of the precision matrix. It seems slightly more common to not penalize the diagonal. Therefore, the below use glasso; this approach performed well but occasionally would get stuck while trying to find an optimal solution. In addition, the two-stage approach studied by Falk and Starr (under review) also performed well, though not as good as the present function; that approach is available in the bootnet package. Note also that cglasso has an implementation of Städler & Bühlmann (2012), but we found in simulations with a high proportion of missing data that our implementation was less likely to encounter estimation problems.

Value

A list with the following elements:

References

Evidence that EBIC and k-fold works well with this package (cite if you use the EMglasso R package): Falk, C. F., & Starr, J. (2023, July 19). Regularized cross-sectional network modeling with missing data: A comparison of methods. Preprint: doi:10.31219/osf.io/dk6zv

Original publication on use of EM algorithm and k-fold cross-validation for glasso: Städler, N., & Bühlmann, P. (2012). Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing,22, 219–235. doi:10.1007/s11222-010-9219-7

If you use this package with bootnet: Epskamp,S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks and their accuracy: A tutorial paper. Behavior research methods, 50(1), 195–212. doi:10.3758/s13428-017-0862-1

If you also use this package with qgraph: Epskamp, S., Cramer, A., Waldorp, L., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48 (1), 1-18.

Foundational article on glasso: Friedman, J. H., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9 (3), 432-441.

Foundational article on use of EBIC with glasso: Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models.

If glasso is also the estimation method: Friedman, J. H., Hastie, T., & Tibshirani, R. (2014). glasso: Graphical lasso estimation of Gaussian graphical models. Retrieved from https://CRAN.R-project.org/package=glasso

If glassoFast is also the estimation method: Sustik M.A., Calderhead B. (2012). GLASSOFAST: An efficient GLASSO implementation. UTCS Technical Report TR-12-29:1-3.

Examples


  library(psych)
  library(bootnet)
  data(bfi)

  # Regularized estimation with just a couple of tuning parameter values
  
  # EBIC
  rho <- seq(.01,.5,length.out = 50)
  ebic1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
                 rhoselect = "ebic")
  
  # k-fold
  kfold1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
                  rhoselect = "kfold")   

  plot(rho, ebic1$crit) # values of EBIC along grid
  plot(rho, kfold1$crit) # values of kfold along grid
  
  # partial correlation matrix
  # ebic1$graph 
  # kfold1$graph

  # Integration with bootnet package
  ebic2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
                           glassoversion = "glasso", rhoselect = "ebic")
  kfold2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
                           glassoversion = "glasso", rhoselect = "kfold")   
  
  # ebic2 and kfold2 now do just about anything one could normally do with an
  # object returned from estimateNetwork e.g., plotting
  plot(ebic2)
  plot(kfold2)
  
  # e.g., centrality indices
  library(qgraph)
  centralityPlot(ebic2)
  # and so on

  # Other ways to pick grid for tuning parameter... 100 values using method
  # that qgraph basically uses; requires estimate of covariance matrix. Here
  # the covariance matrix is obtained directly from the data.
  rho <- rhogrid(100, method="qgraph", dat = bfi[,1:25])
 
  ebic3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
                           glassoversion = "glasso", rhoselect = "ebic")
  kfold3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
                           glassoversion = "glasso", rhoselect = "kfold")  
  
  # look at tuning parameter values vs criterion
  plot(ebic3$result$rho, ebic3$result$crit)
  plot(kfold3$result$rho, kfold3$result$crit)
  
  # EBIC with bootnet; does listwise deletion by default
  ebic.listwise <- estimateNetwork(bfi[,1:25], default="EBICglasso")
  
  # EBIC with bootnet; two-stage estimation
  # Note the constant added to bfi tricks lavaan into thinking the data are
  # not categorical
  ebic.ts <- estimateNetwork(bfi[,1:25] + 1e-10, default="EBICglasso",
                             corMethod="cor_auto", missing="fiml")
  


[Package EMgaussian version 0.2.1 Index]