IterativeQuadrature {LaplacesDemon} | R Documentation |
Iterative Quadrature
Description
The IterativeQuadrature
function iteratively approximates the
first two moments of marginal posterior distributions of a Bayesian
model with deterministic integration.
Usage
IterativeQuadrature(Model, parm, Data, Covar=NULL, Iterations=100,
Algorithm="CAGH", Specs=NULL, Samples=1000, sir=TRUE,
Stop.Tolerance=c(1e-5,1e-15), CPUs=1, Type="PSOCK")
Arguments
Model |
This required argument receives the model from a
user-defined function. The user-defined function is where the model
is specified. |
parm |
This argument requires a vector of initial values equal in
length to the number of parameters. |
Data |
This required argument accepts a list of data. The list of
data must include |
Covar |
This argument accepts a |
Iterations |
This argument accepts an integer that determines the
number of iterations that |
Algorithm |
This optional argument accepts a quoted string that
specifies the iterative quadrature algorithm. The default
method is |
Specs |
This argument accepts a list of specifications for an algorithm. |
Samples |
This argument indicates the number of posterior samples
to be taken with sampling importance resampling via the
|
sir |
This logical argument indicates whether or not Sampling
Importance Resampling (SIR) is conducted via the |
Stop.Tolerance |
This argument accepts a vector of two positive
numbers, and defaults to |
CPUs |
This argument accepts an integer that specifies the number
of central processing units (CPUs) of the multicore computer or
computer cluster. This argument defaults to |
Type |
This argument specifies the type of parallel processing to
perform, accepting either |
Details
Quadrature is a historical term in mathematics that means determining area. Mathematicians of ancient Greece, according to the Pythagorean doctrine, understood determination of area of a figure as the process of geometrically constructing a square having the same area (squaring). Thus the name quadrature for this process.
In medieval Europe, quadrature meant the calculation of area by any method. With the invention of integral calculus, quadrature has been applied to the computation of a univariate definite integral. Numerical integration is a broad family of algorithms for calculating the numerical value of a definite integral. Numerical quadrature is a synonym for quadrature applied to one-dimensional integrals. Multivariate quadrature, also called cubature, is the application of quadrature to multidimensional integrals.
A quadrature rule is an approximation of the definite integral of a function, usually stated as a weighted sum of function values at specified points within the domain of integration. The specified points are referred to as abscissae, abscissas, integration points, or nodes, and have associated weights. The calculation of the nodes and weights of the quadrature rule differs by the type of quadrature. There are numerous types of quadrature algorithms. Bayesian forms of quadrature usually use Gauss-Hermite quadrature (Naylor and Smith, 1982), and placing a Gaussian Process on the function is a common extension (O'Hagan, 1991; Rasmussen and Ghahramani, 2003) that is called ‘Bayesian Quadrature’. Often, these and other forms of quadrature are also referred to as model-based integration.
Gauss-Hermite quadrature uses Hermite polynomials to calculate the
rule. However, there are two versions of Hermite polynomials, which
result in different kernels in different fields. In physics, the
kernel is exp(-x^2)
, while in probability the kernel is
exp(-x^2/2)
. The weights are a normal density. If the
parameters of the normal distribution, \mu
and
\sigma^2
, are estimated from data, then it is referred
to as adaptive Gauss-Hermite quadrature, and the parameters are the
conditional mean and conditional variance. Outside of Gauss-Hermite
quadrature, adaptive quadrature implies that a difficult range in the
integrand is subdivided with more points until it is
well-approximated. Gauss-Hermite quadrature performs well when the
integrand is smooth, and assumes normality or multivariate normality.
Adaptive Gauss-Hermite quadrature has been demonstrated to outperform
Gauss-Hermite quadrature in speed and accuracy.
A goal in quadrature is to minimize integration error, which is the error between the evaluations and the weights of the rule. Therefore, a goal in Bayesian Gauss-Hermite quadrature is to minimize integration error while approximating a marginal posterior distribution that is assumed to be smooth and normally-distributed. This minimization often occurs by increasing the number of nodes until a change in mean integration error is below a tolerance, rather than minimizing integration error itself, since the target may be only approximately normally distributed, or minimizing the sum of integration error, which would change with the number of nodes.
To approximate integrals in multiple dimensions, one approach applies
N
nodes of a univariate quadrature rule to multiple dimensions
(using the GaussHermiteCubeRule
function for example)
via the product rule, which results in many more multivariate nodes.
This requires the number of function evaluations to grow exponentially
as dimension increases. Multidimensional quadrature is usually limited
to less than ten dimensions, both due to the number of nodes required,
and because the accuracy of multidimensional quadrature algorithms
decreases as the dimension increases. Three methods may overcome this
curse of dimensionality in varying degrees: componentwise quadrature,
sparse grids, and Monte Carlo.
Componentwise quadrature is the iterative application of univariate quadrature to each parameter. It is applicable with high-dimensional models, but sacrifices the ability to calculate the conditional covariance matrix, and calculates only the variance of each parameter.
Sparse grids were originally developed by Smolyak for multidimensional quadrature. A sparse grid is based on a one-dimensional quadrature rule. Only a subset of the nodes from the product rule is included, and the weights are appropriately rescaled. Although a sparse grid is more efficient because it reduces the number of nodes to achieve the same accuracy, the user must contend with increasing the accuracy of the grid, and it remains inapplicable to high-dimensional integrals.
Monte Carlo is a large family of sampling-based algorithms. O'Hagan
(1987) asserts that Monte Carlo is frequentist, inefficient, regards
irrelevant information, and disregards relevant information.
Quadrature, he maintains (O'Hagan, 1992), is the most Bayesian
approach, and also the most efficient. In high dimensions, he
concedes, a popular subset of Monte Carlo algorithms is currently the
best for cheap model function evaluations. These algorithms are called
Markov chain Monte Carlo (MCMC). High-dimensional models with
expensive model evaluation functions, however, are not well-suited to
MCMC. A large number of MCMC algorithms is available in the
LaplacesDemon
function.
Following are some reasons to consider iterative quadrature rather than MCMC. Once an MCMC sampler finds equilibrium, it must then draw enough samples to represent all targets. Iterative quadrature does not need to continue drawing samples. Multivariate quadrature is consistently reported as more efficient than MCMC when its assumptions hold, though multivariate quadrature is limited to small dimensions. High-dimensional models therefore default to MCMC, between the two. Componentwise quadrature algorithms like CAGH, however, may also be more efficient with cloc-time than MCMC in high dimensions, especially against componentwise MCMC algorithms. Another reason to consider iterative quadrature are that assessing convergence in MCMC is a difficult topic, but not for iterative quadrature. A user of iterative quadrature does not have to contend with effective sample size and autocorrelation, assessing stationarity, acceptance rates, diminishing adaptation, etc. Stochastic sampling in MCMC is less efficient when samples occur in close proximity (such as when highly autocorrelated), whereas in quadrature the nodes are spread out by design.
In general, the conditional means and conditional variances progress smoothly to the target in multidimensional quadrature. For componentwise quadrature, movement to the target is not smooth, and often resembles a Markov chain or optimization algorithm.
Iterative quadrature is often applied after
LaplaceApproximation
to obtain a more reliable
estimate of parameter variance or covariance than the negative inverse
of the Hessian
matrix of second derivatives, which is
suitable only when the contours of the logarithm of the unnormalized
joint posterior density are approximately ellipsoidal (Naylor and
Smith, 1982, p. 224).
When Algorithm="AGH"
, the Naylor and Smith (1982) algorithm
is used. The AGH algorithm uses multivariate quadrature with the
physicist's (not the probabilist's) kernel.
There are four algorithm specifications: N
is the number of
univariate nodes, Nmax
is the maximum number of univariate
nodes, Packages
accepts any package required for the model
function when parallelized, and Dyn.libs
accepts dynamic
libraries for parallelization, if required. The number of univariate
nodes begins at N
and increases by one each iteration. The
number of multivariate nodes grows quickly with N
. Naylor and
Smith (1982) recommend beginning with as few nodes as N=3
. Any
of the following events will cause N
to increase by 1 when
N
is less than Nmax
:
All LP weights are zero (and non-finite weights are set to zero)
-
\mu
does not result in an increase in LP All elements in
\Sigma
are not finiteThe square root of the sum of the squared changes in
\mu
is less than or equal to theStop.Tolerance
Tolerance includes two metrics: change in mean integration error and change in parameters. Including the change in parameters for tolerance was not mentioned in Naylor and Smith (1982).
Naylor and Smith (1982) consider a transformation due to correlation. This is not included here.
The AGH algorithm does not currently handle constrained parameters,
such as with the interval
function. If a parameter is
constrained and changes during a model evaluation, this changes the
node and the multivariate weight. This is currently not corrected.
An advantage of AGH over componentwise adaptive quadrature is that AGH estimates covariance, where a componentwise algorithm ignores it. A disadvantage of AGH over a componentwise algorithm is that the number of nodes increases so quickly with dimension, that AGH is limited to small-dimensional models.
When Algorithm="AGHSG"
, the Naylor and Smith (1982) algorithm
is applied to a sparse grid, rather than a traditional multivariate
quadrature rule. This is identical to the AGH algorithm above, except
that a sparse grid replaces the multivariate quadrature rule.
The sparse grid reduces the number of nodes. The cost of reducing the
number of nodes is that the user must consider the accuracy, K
.
There are four algorithm specifications: K
is the accuracy (as a
positive integer), Kmax
is the maximum accuracy,
Packages
accepts any package required for the model function
when parallelized, and Dyn.libs
accepts dynamic libraries for
parallelization, if required. These arguments represent accuracy
rather than the number of univariate nodes, but otherwise are similar
to the AGH algorithm.
When Algorithm="CAGH"
, a componentwise version of the adaptive
Gauss-Hermite quadrature of Naylor and Smith (1982) is used. Each
iteration, each marginal posterior distribution is approximated
sequentially, in a random order, with univariate quadrature. The
conditional mean and conditional variance are also approximated each
iteration, making it an adaptive algorithm.
There are four algorithm specifications: N
is the number of
nodes, Nmax
is the maximum number of nodes, Packages
accepts any package required for the model function when parallelized,
and Dyn.libs
accepts dynamic libraries for parallelization, if
required. The number of nodes begins at N
. All parameters have
the same number of nodes. Any of the following events will cause
N
to increase by 1 when N
is less than Nmax
, and
these conditions refer to all parameters (not individually):
Any LP weights are not finite
All LP weights are zero
-
\mu
does not result in an increase in LP The square root of the sum of the squared changes in
\mu
is less than or equal to theStop.Tolerance
It is recommended to begin with N=3
and set Nmax
between
10 and 100. As long as CAGH does not experience problematic weights,
and as long as CAGH is improving LP with \mu
, the number of nodes
does not increase. When CAGH becomes either universally problematic or
universally stable, then N
slowly increases until the sum of
both the mean integration error and the sum of the squared changes in
\mu
is less than the Stop.Tolerance
for two consecutive
iterations.
If the highest LP occurs at the lowest or highest node, then the value at that node becomes the conditional mean, rather than calculating it from all weighted samples; this facilitates movement when the current integral is poorly centered toward a well-centered integral. If all weights are zero, then a random proposal is generated with a small variance.
Tolerance includes two metrics: change in mean integration error and change in parameters, as the square root of the sum of the squared differences.
When a parameter constraint is encountered, the node and weight of the quadrature rule is recalculated.
An advantage of CAGH over multidimensional adaptive quadrature is that CAGH may be applied in large dimensions. Disadvantages of CAGH are that only variance, not covariance, is estimated, and ignoring covariance may be problematic.
Value
IterativeQuadrature
returns an object of class iterquad
that is a list with the following components:
Algorithm |
This is the name of the iterative quadrature algorithm. |
Call |
This is the matched call of |
Converged |
This is a logical indicator of whether or not
|
Covar |
This is the estimated covariance matrix. The |
Deviance |
This is a vector of the iterative history of the
deviance in the |
History |
This is a matrix of the iterative history of the
parameters in the |
Initial.Values |
This is the vector of initial values that was
originally given to |
LML |
This is an approximation of the logarithm of the marginal
likelihood of the data (see the |
LP.Final |
This reports the final scalar value for the logarithm of the unnormalized joint posterior density. |
LP.Initial |
This reports the initial scalar value for the logarithm of the unnormalized joint posterior density. |
LPw |
This is the latest matrix of the logarithm of the unnormalized joint posterior density. It is weighted and normalized so that each column sums to one. |
M |
This is the final |
Minutes |
This is the number of minutes that
|
Monitor |
When |
N |
This is the final number of nodes. |
Posterior |
When |
Summary1 |
This is a summary matrix that summarizes the point-estimated posterior means. Uncertainty around the posterior means is estimated from the covariance matrix. Rows are parameters. The following columns are included: Mean, SD (Standard Deviation), LB (Lower Bound), and UB (Upper Bound). The bounds constitute a 95% probability interval. |
Summary2 |
This is a summary matrix that summarizes the
posterior samples drawn with sampling importance resampling
( |
Tolerance.Final |
This is the last |
Tolerance.Stop |
This is the |
Z |
This is the final |
Author(s)
Statisticat, LLC software@bayesian-inference.com
References
Naylor, J.C. and Smith, A.F.M. (1982). "Applications of a Method for the Efficient Computation of Posterior Distributions". Applied Statistics, 31(3), p. 214–225.
O'Hagan, A. (1987). "Monte Carlo is Fundamentally Unsound". The Statistician, 36, p. 247–249.
O'Hagan, A. (1991). "Bayes-Hermite Quadrature". Journal of Statistical Planning and Inference, 29, p. 245–260.
O'Hagan, A. (1992). "Some Bayesian Numerical Analysis". In Bernardo, J.M., Berger, J.O., David, A.P., and Smith, A.F.M., editors, Bayesian Statistics, 4, p. 356–363, Oxford University Press.
Rasmussen, C.E. and Ghahramani, Z. (2003). "Bayesian Monte Carlo". In Becker, S. and Obermayer, K., editors, Advances in Neural Information Processing Systems, 15, MIT Press, Cambridge, MA.
See Also
GaussHermiteCubeRule
,
GaussHermiteQuadRule
,
GIV
,
Hermite
,
Hessian
,
LaplaceApproximation
,
LaplacesDemon
,
LML
,
PMC
,
SIR
, and
SparseGrid
.
Examples
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,10]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "mu[1]"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
beta <- rnorm(Data$J)
sigma <- runif(1)
return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
### Parameters
beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
### Log-Prior
beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
### Log-Likelihood
mu <- tcrossprod(Data$X, t(beta))
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=mu[1],
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)
}
############################ Initial Values #############################
#Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Initial.Values <- rep(0,J+1)
######################### Adaptive Gauss-Hermite ########################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="AGH",
# Specs=list(N=5, Nmax=7, Packages=NULL, Dyn.libs=NULL), CPUs=1)
################## Adaptive Gauss-Hermite Sparse Grid ###################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="AGHSG",
# Specs=list(K=5, Kmax=7, Packages=NULL, Dyn.libs=NULL), CPUs=1)
################# Componentwise Adaptive Gauss-Hermite ##################
#Fit <- IterativeQuadrature(Model, Initial.Values, MyData, Covar=NULL,
# Iterations=100, Algorithm="CAGH",
# Specs=list(N=3, Nmax=10, Packages=NULL, Dyn.libs=NULL), CPUs=1)
#Fit
#print(Fit)
#PosteriorChecks(Fit)
#caterpillar.plot(Fit, Parms="beta")
#plot(Fit, MyData, PDF=FALSE)
#Pred <- predict(Fit, Model, MyData, CPUs=1)
#summary(Pred, Discrep="Chi-Square")
#plot(Pred, Style="Covariates", Data=MyData)
#plot(Pred, Style="Density", Rows=1:9)
#plot(Pred, Style="Fitted")
#plot(Pred, Style="Jarque-Bera")
#plot(Pred, Style="Predictive Quantiles")
#plot(Pred, Style="Residual Density")
#plot(Pred, Style="Residuals")
#Levene.Test(Pred)
#Importance(Fit, Model, MyData, Discrep="Chi-Square")
#End