mgm {mgm} | R Documentation |
Estimating Mixed Graphical Models
Description
Function to estimate k-degree Mixed Graphical Models via nodewise regression.
Usage
mgm(data, type, level, lambdaSeq, lambdaSel, lambdaFolds,
lambdaGam, alphaSeq, alphaSel, alphaFolds, alphaGam,
k, moderators, ruleReg, weights, threshold, method,
binarySign, scale, verbatim, pbar, warnings, saveModels,
saveData, overparameterize, thresholdCat, signInfo, ...)
Arguments
data |
n x p data matrix |
type |
p vector indicating the type of variable for each column in |
level |
p vector indicating the number of categories of each variable. For continuous variables set to 1. |
lambdaSeq |
A sequence of lambdas that should be searched (see also |
lambdaSel |
Specifies the procedure for selecting the tuning parameter controlling the Lq-penalization. The two options are cross validation "CV" and the Extended Bayesian Information Criterion (EBIC) "EBIC". The EBIC performs well in selecting sparse graphs (see Barber and Drton, 2010 and Foygel and Drton, 2014). Note that when also searching the alpha parameter in the elastic net penalty, cross validation should be preferred, as the parameter vector will not necessarily be sparse anymore. The EBIC tends to be a bit more conservative than CV (see Haslbeck and Waldorp, 2016). CV can sometimes not be performed with categorical variables, because |
lambdaFolds |
Number of folds in cross validation if |
lambdaGam |
Hyperparameter gamma in the EBIC if |
alphaSeq |
A sequence of alpha parameters for the elastic net penality in [0,1] that should be searched (see also |
alphaSel |
Specifies the procedure for selecting the alpha parameter in the elastic net penalty. The two options are cross validation "CV" and the Extended Bayesian Information Criterion (EBIC) "EBIC". The EBIC performs well in selecting sparse graphs (see Barber and Drton, 2010 and Foygel and Drton, 2014). Note that when also searching the alpha parameter in the elastic net penalty, cross validation should be preferred, as the parameter vector will not necessarily be sparse anymore. The EBIC tends to be a bit more conservative than CV (see Haslbeck and Waldorp, 2016). CV can sometimes not be performed with categorical variables, because |
alphaFolds |
Number of folds in cross validation if |
alphaGam |
Hyperparameter gamma in the EBIC if |
k |
Order up until including which interactions are included in the model. |
moderators |
Integer vector with elements in |
ruleReg |
Rule used to combine estimates from neighborhood regression. E.g. for pairwise interactions, two estimates (one from regressing A on B and one from B on A) have to combined in one edge parameter. |
weights |
A n vector with weights for observations. |
threshold |
A threshold below which edge-weights are put to zero. This is done in order to guarantee a lower bound on the false-positive rate. |
method |
Estimation method, currently only |
binarySign |
If |
scale |
If |
verbatim |
If |
pbar |
If |
warnings |
If |
saveModels |
If |
saveData |
If |
overparameterize |
If |
thresholdCat |
If |
signInfo |
If |
... |
Additional arguments. |
Details
mgm()
estimates an exponential mixed graphical model as introduced in Yang and colleagies (2014). Note that MGMs are not normalizable for all parameter values. See Chen, Witten & Shojaie (2015) for an overview of when pairwise MGMs are normalizable. To our best knowledge, for MGMs with interactions of order > 2 that include non-categorical variables, the conditions for normalizablity are unknown.
Value
The function returns a list with the following entries:
call |
Contains all provided input arguments. If |
pairwise |
Contains a list with all information about estimated pairwise interactions. |
interactions |
A list with three entries that relate each interaction in the model to all its parameters. This is different to the output provided in |
intercepts |
A list with p entries, which contain the intercept/thresholds for each node in the network. In case a given node is categorical with m categories, there are m thresholds for this variable. |
nodemodels |
A list with p |
Author(s)
Jonas Haslbeck <jonashaslbeck@gmail.com>
References
Barber, R. F., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567-607.
Chen S, Witten DM & Shojaie (2015). Selection and estimation for mixed graphical models. Biometrika, 102(1), 47.
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in neural information processing systems (pp. 604-612).
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1), 1.
Haslbeck, J. M. B., & Waldorp, L. J. (2020). mgm: Estimating time-varying Mixed Graphical Models in high-dimensional Data. Journal of Statistical Software, 93(8), pp. 1-46. DOI: 10.18637/jss.v093.i08
Loh, P. L., & Wainwright, M. J. (2012, December). Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In NIPS (pp. 2096-2104).
Yang, E., Baker, Y., Ravikumar, P., Allen, G. I., & Liu, Z. (2014, April). Mixed Graphical Models via Exponential Families. In AISTATS (Vol. 2012, pp. 1042-1050).
Examples
## Not run:
## We fit a pairwise and 3-order MGM to the mixed Autism dataset (?autism_data)
# 1) Fit Pairwise MGM
# Call mgm()
fit_k2 <- mgm(data = autism_data$data,
type = autism_data$type,
level = autism_data$lev,
k = 2) # ad most pairwise interacitons
# Weighted adjacency matrix
fit_k2$pairwise$wadj
# Visualize using qgraph()
library(qgraph)
qgraph(fit_k2$pairwise$wadj,
edge.color = fit_k2$pairwise$edgecolor,
layout = "spring",
labels = autism_data$colnames)
# 2) Fit MGM with pairwise & three-way interactions
fit_k3 <- mgm(data = autism_data$data,
type = autism_data$type,
level = autism_data$lev,
k = 3) # include all interactions up to including order 3
# List of estimated interactions
fit_k3$interactions$indicator
# 3) Plot Factor Graph
FactorGraph(object = fit_k3,
PairwiseAsEdge = FALSE,
labels = autism_data$colnames)
# 4) Predict values
pred_obj <- predict(fit_k3, autism_data$data)
head(pred_obj$predicted) # first six rows of predicted values
pred_obj$errors # Nodewise errors
## Here we illustrate why we need to overparameterize the design matrix to
## recover higher order interactions including categorical variables
# 1) Define Graph (one 3-way interaction between 3 binary variables)
# a) General Graph Info
type = c("c", "c", "c")
level = c(2, 2, 2)
# b) Define Interaction
factors <- list()
factors[[1]] <- NULL # no pairwise interactions
factors[[2]] <- matrix(c(1,2,3), ncol=3, byrow = T) # one 3-way interaction
interactions <- list()
interactions[[1]] <- NULL
interactions[[2]] <- vector("list", length = 1)
# threeway interaction no1
interactions[[2]][[1]] <- array(0, dim = c(level[1], level[2], level[3]))
theta <- .7
interactions[[2]][[1]][1, 1, 1] <- theta #weight theta for conf (1,1,1), weight 0 for all others
# c) Define Thresholds
thresholds <- list()
thresholds[[1]] <- c(0, 0)
thresholds[[2]] <- c(0, 0)
thresholds[[3]] <- c(0, 0)
# 2) Sample from Graph
iter <- 1
set.seed(iter)
N <- 2000
d_iter <- mgmsampler(factors = factors,
interactions = interactions,
thresholds = thresholds,
type = type,
level = level,
N = N,
nIter = 50,
pbar = TRUE)
# 3.1) Estimate order 3 MGM using standard parameterization
d_est_stand <- mgm(data = d_iter$data,
type = type,
level = level,
k = 3,
lambdaSel = "CV",
ruleReg = "AND",
pbar = TRUE,
overparameterize = FALSE,
signInfo = FALSE)
# We look at the nodewise regression for node 1 (same for all)
coefs_stand <- d_est_stand$nodemodels[[1]]$model
coefs_stand
# We see that nonzero-zero pattern of parameter vector does not allow us to infer whether
# interactions are present or not
# 3.2) Estimate order 3 MGM using overparameterization
d_est_over <- mgm(data = d_iter$data,
type = type,
level = level,
k = 3,
lambdaSel = "CV",
ruleReg = "AND",
pbar = TRUE,
overparameterize = TRUE,
signInfo = FALSE)
# We look at the nodewise regression for node 1 (same for all)
coefs_over <- d_est_over$nodemodels[[1]]$model
coefs_over # recovers exactly the 3-way interaction
# For more examples see https://github.com/jmbh/mgmDocumentation
## End(Not run)