bgm {bgms}R Documentation

Bayesian structure learning in Markov Random Fields of mixed binary and ordinal variables using MCMC.

Description

The function bgm explores the joint pseudoposterior distribution of structures and parameters in a Markov Random Field for mixed binary and ordinal variables.

Usage

bgm(
  x,
  iter = 10000,
  burnin = 1000,
  interaction_prior = c("Cauchy", "UnitInfo"),
  cauchy_scale = 2.5,
  edge_prior = c("Bernoulli", "Beta-Bernoulli"),
  inclusion_probability = 0.5,
  beta_bernoulli_alpha = 1,
  beta_bernoulli_beta = 1,
  threshold_alpha = 0.5,
  threshold_beta = 0.5,
  adaptive = FALSE,
  na.action = c("listwise", "impute"),
  save = FALSE,
  display_progress = TRUE
)

Arguments

x

A data frame or matrix with n rows and p columns containing binary and ordinal variables for n independent observations and p variables in the network. Variables are recoded as non-negative integers (0, 1, ..., m) if not already done. Unobserved categories are collapsed into other categories after recoding (i.e., if category 1 is unobserved, the data will be recoded from (0, 2) to (0, 1)).

iter

The number of iterations of the Gibbs sampler. The default of 1e4 is for illustrative purposes. For stable estimates, it is recommended to run the Gibbs sampler for at least 1e5 iterations.

burnin

The number of iterations of the Gibbs sampler before its output is saved. Since it may take some time for the Gibbs sampler to converge to the posterior distribution, it is recommended not to set this number too low.

interaction_prior

The type of prior distribution that is used for the interaction effects. Currently, two prior densities are implemented: The Cauchy prior (interaction_prior = "Cauchy") and the Unit Information prior (interaction_prior = "UnitInfo").

cauchy_scale

The scale of the Cauchy prior for interactions. Defaults to 2.5.

edge_prior

The prior distribution for the edges or structure of the network. Two prior distributions are currently implemented: The Bernoulli model edge_prior = "Bernoulli" assumes that the probability that an edge between two variables is included is equal to inclusion_probability and independent of other edges or variables. When inclusion_probability = 0.5, this implies that each network structure receives the same prior weight. The Beta-Bernoulli model edge_prior = "Beta-Bernoulli" assumes a beta prior for the unknown inclusion probability with shape parameters beta_bernoulli_alpha and beta_bernoulli_beta. If beta_bernoulli_alpha = 1 and beta_bernoulli_beta = 1, this means that networks with the same complexity (number of edges) receive the same prior weight. Defaults to edge_prior = "Bernoulli".

inclusion_probability

The prior edge inclusion probability for the Bernoulli model. Can be a single probability, or a matrix of p rows and p columns specifying an inclusion probability for each edge pair. Defaults to inclusion_probability = 0.5.

beta_bernoulli_alpha, beta_bernoulli_beta

The two shape parameters of the Beta prior density for the Bernoulli inclusion probability. Must be positive numbers. Defaults to beta_bernoulli_alpha = 1 and beta_bernoulli_beta = 1.

threshold_alpha, threshold_beta

The shape parameters of the beta-prime prior density for the threshold parameters. Must be positive values. If the two values are equal, the prior density is symmetric about zero. If threshold_beta is greater than threshold_alpha, the distribution is skewed to the left, and if threshold_beta is less than threshold_alpha, it is skewed to the right. Smaller values tend to lead to more diffuse prior distributions.

adaptive

Should the function use an adaptive Metropolis algorithm to update interaction parameters within the model? If adaptive = TRUE, bgm adjusts the proposal variance to match the acceptance probability of the random walk Metropolis algorithm to be close to the optimum of .234 using a Robbins-Monro type algorithm. If adaptive = FALSE, it sets the proposal variance to the inverse of the observed Fisher information matrix (the second derivative at the posterior mode). Defaults to FALSE.

na.action

How do you want the function to handle missing data? If na.action = "listwise", listwise deletion is used. If na.action = "impute", missing data are imputed iteratively during the MCMC procedure. Since imputation of missing data can have a negative impact on the convergence speed of the MCMC procedure, it is recommended to run the MCMC for more iterations. Also, since the numerical routines that search for the mode of the posterior do not have an imputation option, the bgm function will automatically switch to interaction_prior = "Cauchy" and adaptive = TRUE.

save

Should the function collect and return all samples from the Gibbs sampler (save = TRUE)? Or should it only return the (model-averaged) posterior means (save = FALSE)? Defaults to FALSE.

display_progress

Should the function show a progress bar (display_progress = TRUE)? Or not (display_progress = FALSE)? Defaults to TRUE.

Details

A discrete spike and slab prior distribution is stipulated on the pairwise interactions. By formulating it as a mixture of mutually singular distributions, the function can use a combination of Metropolis-Hastings and Gibbs sampling to create a Markov chain that has the joint posterior distribution as invariant. Current options for the slab distribution are the unit-information prior or a Cauchy with an optional scaling parameter. A Beta-prime distribution is used for the exponent of the category parameters. A uniform prior is used for edge inclusion variables (i.e., the prior probability that the edge is included is 0.5).

Value

If save = FALSE (the default), the result is a list of class “bgms” containing the following matrices:

If save = TRUE, the result is a list of class “bgms” containing:

Column averages of these matrices provide the model-averaged posterior means.

In addition to the analysis results, the bgm output lists some of the arguments of its call. This is useful for post-processing the results.

Examples


 #Store user par() settings
 op <- par(no.readonly = TRUE)

 ##Analyse the Wenchuan dataset

 # Here, we use 1e4 iterations, for an actual analysis please use at least
 # 1e5 iterations.
 fit = bgm(x = Wenchuan)


 #------------------------------------------------------------------------------|
 # INCLUSION - EDGE WEIGHT PLOT
 #------------------------------------------------------------------------------|

 par(mar = c(6, 5, 1, 1))
 plot(x = fit$interactions[lower.tri(fit$interactions)],
      y = fit$gamma[lower.tri(fit$gamma)], ylim = c(0, 1),
      xlab = "", ylab = "", axes = FALSE, pch = 21, bg = "gray", cex = 1.3)
 abline(h = 0, lty = 2, col = "gray")
 abline(h = 1, lty = 2, col = "gray")
 abline(h = .5, lty = 2, col = "gray")
 mtext("Posterior Mode Edge Weight", side = 1, line = 3, cex = 1.7)
 mtext("Posterior Inclusion Probability", side = 2, line = 3, cex = 1.7)
 axis(1)
 axis(2, las = 1)


 #------------------------------------------------------------------------------|
 # EVIDENCE - EDGE WEIGHT PLOT
 #------------------------------------------------------------------------------|

 #The bgms package currently assumes that the prior odds are 1:
 prior.odds = 1
 posterior.inclusion = fit$gamma[lower.tri(fit$gamma)]
 posterior.odds = posterior.inclusion / (1 - posterior.inclusion)
 log.bayesfactor = log(posterior.odds / prior.odds)
 log.bayesfactor[log.bayesfactor > 5] = 5

 par(mar = c(5, 5, 1, 1) + 0.1)
 plot(fit$interactions[lower.tri(fit$interactions)], log.bayesfactor, pch = 21, bg = "#bfbfbf",
      cex = 1.3, axes = FALSE, xlab = "", ylab = "", ylim = c(-5, 5.5),
      xlim = c(-0.5, 1.5))
 axis(1)
 axis(2, las = 1)
 abline(h = log(1/10), lwd = 2, col = "#bfbfbf")
 abline(h = log(10), lwd = 2, col = "#bfbfbf")

 text(x = 1, y = log(1 / 10), labels = "Evidence for Exclusion", pos = 1,
      cex = 1.7)
 text(x = 1, y = log(10), labels = "Evidence for Inclusion", pos = 3, cex = 1.7)
 text(x = 1, y = 0, labels = "Absence of Evidence", cex = 1.7)
 mtext("Log-Inclusion Bayes Factor", side = 2, line = 3, cex = 1.5, las = 0)
 mtext("Posterior Mean Interactions ", side = 1, line = 3.7, cex = 1.5, las = 0)


 #------------------------------------------------------------------------------|
 # THE LOCAL MEDIAN PROBABILITY NETWORK
 #------------------------------------------------------------------------------|

 tmp = fit$interactions[lower.tri(fit$interactions)]
 tmp[posterior.inclusion < 0.5] = 0

 median.prob.model = matrix(0, nrow = ncol(Wenchuan), ncol = ncol(Wenchuan))
 median.prob.model[lower.tri(median.prob.model)] = tmp
 median.prob.model = median.prob.model + t(median.prob.model)

 rownames(median.prob.model) = colnames(Wenchuan)
 colnames(median.prob.model) = colnames(Wenchuan)

 library(qgraph)
 qgraph(median.prob.model,
        theme = "TeamFortress",
        maximum = .5,
        fade = FALSE,
        color = c("#f0ae0e"), vsize = 10, repulsion = .9,
        label.cex = 1.1, label.scale = "FALSE",
        labels = colnames(Wenchuan))

 #Restore user par() settings
 par(op)


[Package bgms version 0.1.1 Index]