VariationalBayes {LaplacesDemon} | R Documentation |
Variational Bayes
Description
The VariationalBayes
function is a numerical approximation
method for deterministically estimating the marginal posterior
distributions, target distributions, in a Bayesian model with
approximated distributions by minimizing the Kullback-Leibler
Divergence (KLD
) between the target and its
approximation.
Usage
VariationalBayes(Model, parm, Data, Covar=NULL, Interval=1.0E-6,
Iterations=1000, Method="Salimans2", Samples=1000, sir=TRUE,
Stop.Tolerance=1.0E-5, CPUs=1, Type="PSOCK")
Arguments
Model |
This required argument receives the model from a
user-defined function. The user-defined function is where the model
is specified. |
parm |
This argument requires a vector of initial values equal in
length to the number of parameters. |
Data |
This required argument accepts a list of data. The list of
data must include |
Covar |
This argument defaults to |
Interval |
This argument receives an interval for estimating approximate gradients. The logarithm of the unnormalized joint posterior density of the Bayesian model is evaluated at the current parameter value, and again at the current parameter value plus this interval. |
Iterations |
This argument accepts an integer that determines the
number of iterations that |
Method |
This optional argument currently accepts only
|
Samples |
This argument indicates the number of posterior samples
to be taken with sampling importance resampling via the
|
sir |
This logical argument indicates whether or not Sampling
Importance Resampling (SIR) is conducted via the |
Stop.Tolerance |
This argument accepts any positive number and
defaults to 1.0E-3. Tolerance is calculated each iteration, and the
criteria varies by algorithm. The algorithm is considered to have
converged to the user-specified |
CPUs |
This argument accepts an integer that specifies the number
of central processing units (CPUs) of the multicore computer or
computer cluster. This argument defaults to |
Type |
This argument specifies the type of parallel processing to
perform, accepting either |
Details
Variational Bayes (VB) is a family of numerical approximation algorithms that is a subset of variational inference algorithms, or variational methods. Some examples of variational methods include the mean-field approximation, loopy belief propagation, tree-reweighted belief propagation, and expectation propagation (EP).
Variational inference for probabilistic models was introduced in the field of machine learning, influenced by statistical physics literature (Saul et al., 1996; Saul and Jordan, 1996; Jaakkola, 1997). The mean-field methods in Neal and Hinton (1999) led to variational algorithms.
Variational inference algorithms were later generalized for conjugate exponential-family models (Attias, 1999, 2000; Wiegerinck, 2000; Ghahramani and Beal, 2001; Xing et al., 2003). These algorithms still require different designs for different model forms. Salimans and Knowles (2013) introduced general-purpose VB algorithms for Gaussian posteriors.
A VB algorithm deterministically estimates the marginal posterior
distributions (target distributions) in a Bayesian model with
approximated distributions by minimizing the Kullback-Leibler
Divergence (KLD
) between the target and its
approximation. The complicated posterior distribution is approximated
with a simpler distribution. The simpler, approximated distribution is
called the variational approximation, or approximation distribution,
of the posterior. The term variational is derived from the calculus of
variations, and regards optimization algorithms that select the best
function (which is a distribution in VB), rather than merely selecting
the best parameters.
VB algorithms often use Gaussian distributions as approximating distributions. In this case, both the mean and variance of the parameters are estimated.
Usually, a VB algorithm is slower to convergence than a Laplace Approximation algorithm, and faster to convergence than a Monte Carlo algorithm such as Markov chain Monte Carlo (MCMC). VB often provides solutions with comparable accuracy to MCMC in less time. Though Monte Carlo algorithms provide a numerical approximation to the exact posterior using a set of samples, VB provides a locally-optimal, exact analytical solution to an approximation of the posterior. VB is often more applicable than MCMC to big data or large-dimensional models.
Since VB is deterministic, it is asymptotic and subject to the same limitations with respect to sample size as Laplace Approximation. However, VB estimates more parameters than Laplace Approximation, such as when Laplace Approximation optimizes the posterior mode of a Gaussian distribution, while VB optimizes both the Gaussian mean and variance.
Traditionally, VB algorithms required customized equations. The
VariationalBayes
function uses general-purpose algorithms. A
general-purpose VB algorithm is less efficient than an algorithm
custom designed for the model form. However, a general-purpose
algorithm is applied consistently and easily to numerous model forms.
When Method="Salimans2"
, the second algorithm of Salimans and
Knowles (2013) is used. This requires the gradient and Hessian, which
is more efficient with a small number of parameters as long as the
posterior is twice differentiable. The step size is constant. This
algorithm is suitable for marginal posterior distributions that are
Gaussian and unimodal. A stochastic approximation algorithm is used
in the context of fixed-form VB, inspired by considering fixed-form VB
to be equivalent to performing a linear regression with the sufficient
statistics of the approximation as independent variables and the
unnormalized logarithm of the joint posterior density as the dependent
variable. The number of requested iterations should be large, since the
step-size decreases for larger requested iterations, and a small
step-size will eventually converge. A large number of requested
iterations results in a smaller step-size and better convergence
properties, so hope for early convergence. However convergence is
checked only in the last half of the iterations after the algorithm
begins to average the mean and variance from the samples of the
stochastic approximation. The history of stochastic samples is
returned.
Value
VariationalBayes
returns an object of class vb
that is a list with the following components:
Call |
This is the matched call of |
Converged |
This is a logical indicator of whether or not
|
Covar |
This is the estimated covariance matrix. The
|
Deviance |
This is a vector of the iterative history of the
deviance in the |
History |
This is an array of the iterative history of the
parameters in the |
Initial.Values |
This is the vector of initial values that was
originally given to |
LML |
This is an approximation of the logarithm of the marginal
likelihood of the data (see the |
LP.Final |
This reports the final scalar value for the logarithm of the unnormalized joint posterior density. |
LP.Initial |
This reports the initial scalar value for the logarithm of the unnormalized joint posterior density. |
Minutes |
This is the number of minutes that
|
Monitor |
When |
Posterior |
When |
Step.Size.Final |
This is the final, scalar |
Step.Size.Initial |
This is the initial, scalar |
Summary1 |
This is a summary matrix that summarizes the point-estimated posterior means and variances. Uncertainty around the posterior means is estimated from the estimated covariance matrix. Rows are parameters. The following columns are included: Mean, SD (Standard Deviation), LB (Lower Bound), and UB (Upper Bound). The bounds constitute a 95% probability interval. |
Summary2 |
This is a summary matrix that summarizes the
posterior samples drawn with sampling importance resampling
( |
Tolerance.Final |
This is the last |
Tolerance.Stop |
This is the |
Author(s)
Statisticat, LLC software@bayesian-inference.com
References
Attias, H. (1999). "Inferring Parameters and Structure of Latent Variable Models by Variational Bayes". In Uncertainty in Artificial Intelligence.
Attias, H. (2000). "A Variational Bayesian Framework for Graphical Models". In Neural Information Processing Systems.
Ghahramani, Z. and Beal, M. (2001). "Propagation Algorithms for Variational Bayesian Learning". In Neural Information Processing Systems, p. 507–513.
Jaakkola, T. (1997). "Variational Methods for Inference and Estimation in Graphical Models". PhD thesis, Massachusetts Institute of Technology.
Salimans, T. and Knowles, D.A. (2013). "Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression". Bayesian Analysis, 8(4), p. 837–882.
Neal, R. and Hinton, G. (1999). "A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants". In Learning in Graphical Models, p. 355–368. MIT Press, 1999.
Saul, L. and Jordan, M. (1996). "Exploiting Tractable Substructures in Intractable Networks". Neural Information Processing Systems.
Saul, L., Jaakkola, T., and Jordan, M. (1996). "Mean Field Theory for Sigmoid Belief Networks". Journal of Artificial Intelligence Research, 4, p. 61–76.
Wiegerinck, W. (2000). "Variational Approximations Between Mean Field Theory and the Junction Tree Algorithm". In Uncertainty in Artificial Intelligence.
Xing, E., Jordan, M., and Russell, S. (2003). "A Generalized Mean Field Algorithm for Variational Inference in Exponential Families". In Uncertainty in Artificial Intelligence.
See Also
BayesFactor
,
IterativeQuadrature
,
LaplaceApproximation
,
LaplacesDemon
,
GIV
,
LML
,
PMC
, and
SIR
.
Examples
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,10]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "mu[1]"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
beta <- rnorm(Data$J)
sigma <- runif(1)
return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
### Parameters
beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
### Log-Prior
beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
### Log-Likelihood
mu <- tcrossprod(Data$X, t(beta))
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=mu[1],
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)
}
############################ Initial Values #############################
#Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Initial.Values <- rep(0,J+1)
#Fit <- VariationalBayes(Model, Initial.Values, Data=MyData, Covar=NULL,
# Iterations=1000, Method="Salimans2", Stop.Tolerance=1e-3, CPUs=1)
#Fit
#print(Fit)
#PosteriorChecks(Fit)
#caterpillar.plot(Fit, Parms="beta")
#plot(Fit, MyData, PDF=FALSE)
#Pred <- predict(Fit, Model, MyData, CPUs=1)
#summary(Pred, Discrep="Chi-Square")
#plot(Pred, Style="Covariates", Data=MyData)
#plot(Pred, Style="Density", Rows=1:9)
#plot(Pred, Style="Fitted")
#plot(Pred, Style="Jarque-Bera")
#plot(Pred, Style="Predictive Quantiles")
#plot(Pred, Style="Residual Density")
#plot(Pred, Style="Residuals")
#Levene.Test(Pred)
#Importance(Fit, Model, MyData, Discrep="Chi-Square")
#Fit$Covar is scaled (2.38^2/d) and submitted to LaplacesDemon as Covar.
#Fit$Summary[,1] is submitted to LaplacesDemon as Initial.Values.
#End