R: Regularized GGM under missing data with EBIC or k-fold cross...

EMggm {EMgaussian}

R Documentation

Regularized GGM under missing data with EBIC or k-fold cross validation

Description

Regularized GGM under missing data with EBIC or k-fold cross validation

Usage

EMggm(
  dat,
  max.iter = 500,
  est.tol = 1e-07,
  start = c("diag", "pairwise", "listwise", "full"),
  glassoversion = c("glasso", "glassoFast", "glassonostart", "none"),
  rho = 0,
  rhoselect = c("ebic", "kfold"),
  N = NULL,
  gam = 0.5,
  zero.tol = 1e-10,
  k = 5,
  seed = NULL,
  debug = 0,
  convfail = FALSE,
  ...
)

Arguments

`dat`	A data frame or matrix of the raw data.
`max.iter`	Maxumum number of EM cycles for the EM algorithm, passed eventually to `em.prec`.
`est.tol`	Tolerance for change in parameter estimates across EM Cycles. If all changes are less than `tol`, the algorithm terminates.
`start`	Starting value method (see details of `em.prec`). Note that "none" will not do any regularization, making `rho` moot.
`glassoversion`	Character indicating which function to use at the M-step for regularization. See `em.prec`) for more details.
`rho`	Vector of tuning parameter values. Defaults to just a single value of 0 (i.e., also no regularization).
`rhoselect`	Method of selecting the tuning parameter. "ebic" will select the best value in `rho` based on that yielding the best EBIC. "kfold" will do k-fold cross-validation.
`N`	Sample size to use in any EBIC calculations. Defaults to average pairwise sample size.
`gam`	Value of gamma in EBIC calculations. Typical choice, and the default, is .5.
`zero.tol`	Tolerance in EBIC calculations for declaring edges to be zero. Anything in absolute value below `zero.tol` will be considered zero and will not count as a parameter in EBIC calculations.
`k`	If using k-fold cross validation, an integer specifying the number of folds. Defaults to 5.
`seed`	Random number seed passed to function that does k-fold cross validation. Use if you want folds to be more or less replicable.
`debug`	(Experimental) Pick an integer above 0 for some messages that may aide in debugging.
`convfail`	(Experimental) What to do if optimization fails. If TRUE, will fill in the partial correlation matrix with 0's.
`...`	Other arguments passed down to `em.prec` and to functions that do the glasso.

Details

This function will estimate a regularized Gaussian graphical model (GGM) using either EBIC or k-fold cross validation to select the tuning parameter. It was written as a wrapper function to the em.prec function, which is an implementation of the EM algorithm from Städler & Bühlmann (2012). These authors also used k-fold cross validation, which is implemented here.

For the tuning parameter (rho), typically a grid of of values is evaluated and the one that results in the best EBIC or cross validation metric is selected and returned as the final model. See rhogrid and examples for some ways to pick these candidate tuning parameter values.

This function is intended to be compatible (to my knowledge) with estimateNetwork in the bootnet package in that one can input this as a custom function and then have all of the benefits of plotting, centrality indices, and so on, that bootnet provides.

Most methods in examples were studied by Falk and Starr (under review). In particular use of this function in conjunction with estimateNetwork worked well for both "ebic" and "kfold" for model selection, though the original article used the glassoFast package for estimation, which by default penalizes the diagonal of the precision matrix. It seems slightly more common to not penalize the diagonal. Therefore, the below use glasso; this approach performed well but occasionally would get stuck while trying to find an optimal solution. In addition, the two-stage approach studied by Falk and Starr (under review) also performed well, though not as good as the present function; that approach is available in the bootnet package. Note also that cglasso has an implementation of Städler & Bühlmann (2012), but we found in simulations with a high proportion of missing data that our implementation was less likely to encounter estimation problems.

Value

A list with the following elements:

results: Contains the "best" or selected model, with elements mostly following that of em.prec's output. Additional elements include rho and crit, which are the grid of tuning parameter values and value of the criterion. These are added here because if used with bootnet, it may not save the other slots below.
rho: Grid for rho.
crit: Vector of the same length as the grid for rho with the criterion (EBIC or cross-validation). Smaller is better.
graph: Estimated partial correlation matrix; bootnet appears to expect this in order to do plots, compute centrality indices and so on.

References

Evidence that EBIC and k-fold works well with this package (cite if you use the EMglasso R package): Falk, C. F., & Starr, J. (2023, July 19). Regularized cross-sectional network modeling with missing data: A comparison of methods. Preprint: doi:10.31219/osf.io/dk6zv

Original publication on use of EM algorithm and k-fold cross-validation for glasso: Städler, N., & Bühlmann, P. (2012). Missing values: sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing,22, 219–235. doi:10.1007/s11222-010-9219-7

If you use this package with bootnet: Epskamp,S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks and their accuracy: A tutorial paper. Behavior research methods, 50(1), 195–212. doi:10.3758/s13428-017-0862-1

If you also use this package with qgraph: Epskamp, S., Cramer, A., Waldorp, L., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48 (1), 1-18.

Foundational article on glasso: Friedman, J. H., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9 (3), 432-441.

Foundational article on use of EBIC with glasso: Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models.

If glasso is also the estimation method: Friedman, J. H., Hastie, T., & Tibshirani, R. (2014). glasso: Graphical lasso estimation of Gaussian graphical models. Retrieved from https://CRAN.R-project.org/package=glasso

If glassoFast is also the estimation method: Sustik M.A., Calderhead B. (2012). GLASSOFAST: An efficient GLASSO implementation. UTCS Technical Report TR-12-29:1-3.

Examples


  library(psych)
  library(bootnet)
  data(bfi)

  # Regularized estimation with just a couple of tuning parameter values
  
  # EBIC
  rho <- seq(.01,.5,length.out = 50)
  ebic1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
                 rhoselect = "ebic")
  
  # k-fold
  kfold1 <- EMggm(bfi[,1:25], rho = rho, glassoversion = "glasso",
                  rhoselect = "kfold")   

  plot(rho, ebic1$crit) # values of EBIC along grid
  plot(rho, kfold1$crit) # values of kfold along grid
  
  # partial correlation matrix
  # ebic1$graph 
  # kfold1$graph

  # Integration with bootnet package
  ebic2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
                           glassoversion = "glasso", rhoselect = "ebic")
  kfold2 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho = rho,
                           glassoversion = "glasso", rhoselect = "kfold")   
  
  # ebic2 and kfold2 now do just about anything one could normally do with an
  # object returned from estimateNetwork e.g., plotting
  plot(ebic2)
  plot(kfold2)
  
  # e.g., centrality indices
  library(qgraph)
  centralityPlot(ebic2)
  # and so on

  # Other ways to pick grid for tuning parameter... 100 values using method
  # that qgraph basically uses; requires estimate of covariance matrix. Here
  # the covariance matrix is obtained directly from the data.
  rho <- rhogrid(100, method="qgraph", dat = bfi[,1:25])
 
  ebic3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
                           glassoversion = "glasso", rhoselect = "ebic")
  kfold3 <- estimateNetwork(bfi[,1:25], fun = EMggm, rho=rho,
                           glassoversion = "glasso", rhoselect = "kfold")  
  
  # look at tuning parameter values vs criterion
  plot(ebic3$result$rho, ebic3$result$crit)
  plot(kfold3$result$rho, kfold3$result$crit)
  
  # EBIC with bootnet; does listwise deletion by default
  ebic.listwise <- estimateNetwork(bfi[,1:25], default="EBICglasso")
  
  # EBIC with bootnet; two-stage estimation
  # Note the constant added to bfi tricks lavaan into thinking the data are
  # not categorical
  ebic.ts <- estimateNetwork(bfi[,1:25] + 1e-10, default="EBICglasso",
                             corMethod="cor_auto", missing="fiml")

[Package EMgaussian version 0.2.1 Index]