bgm {bgms} | R Documentation |
Bayesian edge selection or Bayesian estimation for Markov Random Fields of mixed binary and ordinal variables using MCMC.
Description
The function bgm
explores the joint pseudoposterior distribution of
parameters and possibly edge indicators for a Markov Random Field model for
mixed binary and ordinal variables.
Usage
bgm(
x,
variable_type = "ordinal",
reference_category,
iter = 10000,
burnin = 1000,
interaction_scale = 2.5,
threshold_alpha = 0.5,
threshold_beta = 0.5,
edge_selection = TRUE,
edge_prior = c("Bernoulli", "Beta-Bernoulli"),
inclusion_probability = 0.5,
beta_bernoulli_alpha = 1,
beta_bernoulli_beta = 1,
na.action = c("listwise", "impute"),
save = FALSE,
display_progress = TRUE
)
Arguments
x |
A data frame or matrix with |
variable_type |
What kind of variables are there in |
reference_category |
The reference category in the Blume-Capel model.
Should be an integer within the range of integer scores observed for the
“blume-capel” variable. Can be a single number specifying the reference
category for all Blume-Capel variables at once, or a vector of length
|
iter |
How many iterations should the Gibbs sampler run? The default of
|
burnin |
The number of iterations of the Gibbs sampler before saving its output. Since it may take some time for the Gibbs sampler to converge to the posterior distribution, it is recommended not to set this number too low. |
interaction_scale |
The scale of the Cauchy distribution that is used as a
prior for the pairwise interaction parameters. Defaults to |
threshold_alpha , threshold_beta |
The shape parameters of the beta-prime
prior density for the threshold parameters. Must be positive values. If the
two values are equal, the prior density is symmetric about zero. If
|
edge_selection |
Should the function perform Bayesian edge selection on
the edges of the MRF in addition to estimating its parameters
( |
edge_prior |
The inclusion or exclusion of individual edges in the
network is modeled with binary indicator variables that capture the structure
of the network. The argument |
inclusion_probability |
The prior edge inclusion probability for the
Bernoulli model. Can be a single probability, or a matrix of |
beta_bernoulli_alpha , beta_bernoulli_beta |
The two shape parameters of
the Beta prior density for the Bernoulli inclusion probability. Must be
positive numbers. Defaults to |
na.action |
How do you want the function to handle missing data? If
|
save |
Should the function collect and return all samples from the Gibbs
sampler ( |
display_progress |
Should the function show a progress bar
( |
Details
Currently, bgm supports two types of ordinal variables. The regular, default, ordinal variable type has no restrictions on its distribution. Every response category except the first receives its own threshold parameter. The Blume-Capel ordinal variable assumes that there is a specific reference category, such as the “neutral” in a Likert scale, and responses are scored in terms of their distance to this reference category. Specifically, the Blume-Capel model specifies the following quadratic model for the threshold parameters:
\mu_{\text{c}} = \alpha \times \text{c} + \beta \times (\text{c} - \text{r})^2,
where \mu_{\text{c}}
is the threshold for category c.
The parameter \alpha
models a linear trend across categories,
such that \alpha > 0
leads to an increasing number of
observations in higher response categories and \alpha <0
leads to a decreasing number of observations in higher response categories.
The parameter \beta
models the response style in terms of an
offset with respect to the reference category r
; if \beta<0
there is a preference to respond in the reference category (i.e., the model
introduces a penalty for responding in a category further away from the
reference_category category r
), while if \beta > 0
there is preference to score in the extreme categories further away from the
reference_category category.
The Bayesian estimation procedure (edge_selection = FALSE
) simply
estimates the threshold and pairwise interaction parameters of the ordinal
MRF, while the Bayesian edge selection procedure
(edge_selection = TRUE
) also models the probability that individual
edges should be included or excluded from the model. Bayesian edge selection
imposes a discrete spike and slab prior distribution on the pairwise
interactions. By formulating it as a mixture of mutually singular
distributions, the function can use a combination of Metropolis-Hastings and
Gibbs sampling to create a Markov chain that has the joint posterior
distribution as an invariant. The current option for the slab distribution is
a Cauchy with an optional scaling parameter. The slab distribution is also used
as the prior for the interaction parameters for Bayesian estimation. A
beta-prime distribution is used for the exponent of the category parameters.
For Bayesian edge selection, two prior distributions are implemented for the
edge inclusion variables (i.e., the prior probability that an edge is
included); the Bernoulli prior and the Beta-Bernoulli prior.
Value
If save = FALSE
(the default), the result is a list of class
“bgms” containing the following matrices:
-
gamma
: A matrix withp
rows andp
columns, containing posterior inclusion probabilities of individual edges. -
interactions
: A matrix withp
rows andp
columns, containing model-averaged posterior means of the pairwise associations. -
thresholds
: A matrix withp
rows andmax(m)
columns, containing model-averaged category thresholds. In the case of “blume-capel” variables, the first entry is the parameter for the linear effect and the second entry is the parameter for the quadratic effect, which models the offset to the reference category.
If save = TRUE
, the result is a list of class “bgms” containing:
-
gamma
: A matrix withiter
rows andp * (p - 1) / 2
columns, containing the edge inclusion indicators from every iteration of the Gibbs sampler. -
interactions
: A matrix withiter
rows andp * (p - 1) / 2
columns, containing parameter states from every iteration of the Gibbs sampler for the pairwise associations. -
thresholds
: A matrix withiter
rows andsum(m)
columns, containing parameter states from every iteration of the Gibbs sampler for the category thresholds.
Column averages of these matrices provide the model-averaged posterior means.
In addition to the analysis results, the bgm output lists some of the arguments of its call. This is useful for post-processing the results.
Examples
#Store user par() settings
op <- par(no.readonly = TRUE)
##Analyse the Wenchuan dataset
# Here, we use 1e4 iterations, for an actual analysis please use at least
# 1e5 iterations.
fit = bgm(x = Wenchuan)
#------------------------------------------------------------------------------|
# INCLUSION - EDGE WEIGHT PLOT
#------------------------------------------------------------------------------|
par(mar = c(6, 5, 1, 1))
plot(x = fit$interactions[lower.tri(fit$interactions)],
y = fit$gamma[lower.tri(fit$gamma)], ylim = c(0, 1),
xlab = "", ylab = "", axes = FALSE, pch = 21, bg = "gray", cex = 1.3)
abline(h = 0, lty = 2, col = "gray")
abline(h = 1, lty = 2, col = "gray")
abline(h = .5, lty = 2, col = "gray")
mtext("Posterior Mode Edge Weight", side = 1, line = 3, cex = 1.7)
mtext("Posterior Inclusion Probability", side = 2, line = 3, cex = 1.7)
axis(1)
axis(2, las = 1)
#------------------------------------------------------------------------------|
# EVIDENCE - EDGE WEIGHT PLOT
#------------------------------------------------------------------------------|
#For the default choice of the structure prior, the prior odds equal one:
prior.odds = 1
posterior.inclusion = fit$gamma[lower.tri(fit$gamma)]
posterior.odds = posterior.inclusion / (1 - posterior.inclusion)
log.bayesfactor = log(posterior.odds / prior.odds)
log.bayesfactor[log.bayesfactor > 5] = 5
par(mar = c(5, 5, 1, 1) + 0.1)
plot(fit$interactions[lower.tri(fit$interactions)], log.bayesfactor, pch = 21, bg = "#bfbfbf",
cex = 1.3, axes = FALSE, xlab = "", ylab = "", ylim = c(-5, 5.5),
xlim = c(-0.5, 1.5))
axis(1)
axis(2, las = 1)
abline(h = log(1/10), lwd = 2, col = "#bfbfbf")
abline(h = log(10), lwd = 2, col = "#bfbfbf")
text(x = 1, y = log(1 / 10), labels = "Evidence for Exclusion", pos = 1,
cex = 1.7)
text(x = 1, y = log(10), labels = "Evidence for Inclusion", pos = 3, cex = 1.7)
text(x = 1, y = 0, labels = "Absence of Evidence", cex = 1.7)
mtext("Log-Inclusion Bayes Factor", side = 2, line = 3, cex = 1.5, las = 0)
mtext("Posterior Mean Interactions ", side = 1, line = 3.7, cex = 1.5, las = 0)
#------------------------------------------------------------------------------|
# THE MEDIAN PROBABILITY NETWORK
#------------------------------------------------------------------------------|
tmp = fit$interactions[lower.tri(fit$interactions)]
tmp[posterior.inclusion < 0.5] = 0
median.prob.model = matrix(0, nrow = ncol(Wenchuan), ncol = ncol(Wenchuan))
median.prob.model[lower.tri(median.prob.model)] = tmp
median.prob.model = median.prob.model + t(median.prob.model)
rownames(median.prob.model) = colnames(Wenchuan)
colnames(median.prob.model) = colnames(Wenchuan)
library(qgraph)
qgraph(median.prob.model,
theme = "TeamFortress",
maximum = .5,
fade = FALSE,
color = c("#f0ae0e"), vsize = 10, repulsion = .9,
label.cex = 1.1, label.scale = "FALSE",
labels = colnames(Wenchuan))
#Restore user par() settings
par(op)