buildLaplace {nimble}    R Documentation

Laplace approximation and adaptive Gauss-Hermite quadrature

Description

Build a Laplace or AGHQ approximation algorithm for a given NIMBLE model.

Usage

buildLaplace(
  model,
  paramNodes,
  randomEffectsNodes,
  calcNodes,
  calcNodesOther,
  control = list()
)

buildAGHQ(
  model,
  nQuad = 1,
  paramNodes,
  randomEffectsNodes,
  calcNodes,
  calcNodesOther,
  control = list()
)

Arguments

model

a NIMBLE model object, such as returned by nimbleModel. The model must have automatic derivatives (AD) turned on, e.g. by using buildDerivs=TRUE in nimbleModel.

paramNodes

a character vector of names of parameter nodes in the model; defaults are provided by setupMargNodes. Alternatively, paramNodes can be a list in the format returned by setupMargNodes, in which case randomEffectsNodes, calcNodes, and calcNodesOther are not needed (and will be ignored).

randomEffectsNodes

a character vector of names of continuous unobserved (latent) nodes to marginalize (integrate) over using Laplace approximation; defaults are provided by setupMargNodes.

calcNodes

a character vector of names of nodes for calculating the integrand for Laplace approximation; defaults are provided by setupMargNodes. There may be deterministic nodes between paramNodes and calcNodes. These will be included in calculations automatically and thus do not need to be included in calcNodes (but there is no problem if they are).

calcNodesOther

a character vector of names of nodes for calculating terms in the log-likelihood that do not depend on any randomEffectsNodes, and thus are not part of the marginalization, but should be included for purposes of finding the MLE. This defaults to stochastic nodes that depend on paramNodes but are not part of and do not depend on randomEffectsNodes. There may be deterministic nodes between paramNodes and calcNodesOther. These will be included in calculations automatically and thus do not need to be included in calcNodesOther (but there is no problem if they are).

control

a named list for providing additional settings used in Laplace approximation. See control section below. Most of these can be updated later with the 'updateSettings' method.

nQuad

number of quadrature points for AGHQ (in one dimension). Laplace approximation is AGHQ with 'nQuad=1'. Only odd numbers of quadrature points really make sense, and often one or a few points can achieve high accuracy. A maximum of 35 points is supported. Note that for multivariate quadratures, the number of quadrature points will be nQuad^(number of dimensions).

buildLaplace

buildLaplace creates an object that can run Laplace approximation for a given model or part of a model. buildAGHQ creates an object that can run adaptive Gauss-Hermite quadrature (AGHQ, sometimes called "adaptive Gaussian quadrature") for a given model or part of a model. Laplace approximation is AGHQ with one quadrature point, so 'buildLaplace' simply calls 'buildAGHQ' with 'nQuad=1'. These methods approximate the integration over continuous random effects in a hierarchical model in order to calculate the (marginal) likelihood.

buildAGHQ and buildLaplace will by default (unless changed manually via 'control$split') determine from the model which random effects can be integrated over (marginalized) independently. For example, in a GLMM with a grouping factor and an independent random effect intercept for each group, the random effects can be marginalized as a set of univariate approximations rather than one multivariate approximation. On the other hand, correlated or nested random effects would require multivariate marginalization.
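As a sketch of the first case, the following GLMM (with hypothetical names 'nGroups', 'nPerGroup', 'u', and 'y') has one independent random intercept per group, so its marginalization can be split into a set of univariate integrations:

```r
library(nimble)
# Hypothetical GLMM sketch: independent random intercepts u[g], one per
# group. buildLaplace can marginalize each u[g] with a separate univariate
# approximation rather than one multivariate approximation.
glmmCode <- nimbleCode({
  for (g in 1:nGroups) {
    u[g] ~ dnorm(0, sd = sigma)          # independent random effects
    for (i in 1:nPerGroup) {
      logit(p[g, i]) <- beta0 + u[g]
      y[g, i] ~ dbern(p[g, i])
    }
  }
  beta0 ~ dflat()       # unconstrained parameter
  sigma ~ dhalfflat()   # constrained to sigma > 0
})
```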

Maximum likelihood estimation is available for Laplace approximation ('nQuad=1') with univariate or multivariate integrations. With 'nQuad > 1', maximum likelihood estimation is available only if all integrations are univariate (e.g., a set of univariate random effects). If there are multivariate integrations, these can be calculated at chosen input parameters but not maximized over parameters. For example, one can find the MLE based on Laplace approximation and then increase 'nQuad' (using the 'updateSettings' method below) to check on accuracy of the marginal log likelihood at the MLE.
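A sketch of that accuracy check, assuming a compiled Laplace object such as 'CpumpLaplace' from the Examples below:

```r
# Find the MLE using Laplace approximation (nQuad = 1)...
MLEres <- CpumpLaplace$findMLE()
# ...then increase the number of quadrature points and re-evaluate the
# marginal log likelihood at the MLE to check its accuracy.
CpumpLaplace$updateSettings(nQuad = 3)
CpumpLaplace$calcLogLik(MLEres$par)
```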

Beware that quadrature will use 'nQuad^k' quadrature points, where 'k' is the dimension of each integration. Therefore quadrature for 'k' greater than 2 or 3 can be slow. As just noted, 'buildAGHQ' will determine independent dimensions of quadrature, so it is fine to have a set of univariate random effects, as these will each have k=1. Multivariate quadrature (k>1) is only necessary for nested, correlated, or otherwise dependent random effects.

The recommended way to find the maximum likelihood estimate and associated outputs is by calling runLaplace or runAGHQ. The input should be the compiled Laplace or AGHQ algorithm object. This would be produced by running compileNimble with input that is the result of buildLaplace or buildAGHQ.

For more granular control, see below for methods findMLE and summary. See function summaryLaplace for an easier way to call the summary method and obtain results that include node names. These steps are all done within runLaplace and runAGHQ.

The NIMBLE User Manual at r-nimble.org also contains an example of Laplace approximation.

How input nodes are processed

buildLaplace and buildAGHQ attempt to make sensible decisions about the input model and any of the (optional) node arguments. However, random effects (over which approximate integration will be done) can be written in models in multiple equivalent ways, and customized use cases may call for integrating over chosen parts of a model. Hence, one can take full charge of how different parts of the model will be used.

Any of the input node vectors, when provided, will be processed using nodes <- model$expandNodeNames(nodes), where nodes may be paramNodes, randomEffectsNodes, and so on. This step allows any of the inputs to include node-name-like syntax that might contain multiple nodes. For example, paramNodes = 'beta[1:10]' can be provided if there are actually 10 scalar parameters, 'beta[1]' through 'beta[10]'. The actual node names in the model will be determined by the expandNodeNames step.
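For example, a sketch assuming a model object with scalar nodes 'beta[1]' through 'beta[10]':

```r
# expandNodeNames turns node-name-like syntax into actual node names,
# here yielding the ten scalar node names 'beta[1]', ..., 'beta[10]'.
model$expandNodeNames('beta[1:10]')
```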

In many (but not all) cases, one only needs to provide a NIMBLE model object and then the function will construct reasonable defaults necessary for Laplace approximation to marginalize over all continuous latent states (aka random effects) in a model. The default values for the four groups of nodes are obtained by calling setupMargNodes, whose arguments match those here (except for a few arguments which are taken from control list elements here).

setupMargNodes tries to give sensible defaults from any combination of paramNodes, randomEffectsNodes, calcNodes, and calcNodesOther that are provided. For example, if you provide only randomEffectsNodes (perhaps you want to marginalize over only some of the random effects in your model), setupMargNodes will try to determine appropriate choices for the others.

setupMargNodes also determines which integration dimensions are conditionally independent, i.e., which can be done separately from each other. For example, when possible, 10 univariate random effects will be split into 10 univariate integration problems rather than one 10-dimensional integration problem.

The defaults make general assumptions such as that randomEffectsNodes have paramNodes as parents. However, the steps for determining defaults are not simple, and it is possible that they will be refined in the future. It is also possible that they simply don't give what you want for a particular model. One example where they will not give desired results can occur when random effects have no prior parameters, such as 'N(0,1)' nodes that will be multiplied by a scale factor (e.g. sigma) and added to other explanatory terms in a model. Such nodes look like top-level parameters in terms of model structure, so you must provide a randomEffectsNodes argument to indicate which they are.

It can be helpful to call setupMargNodes directly to see exactly how nodes will be arranged for Laplace approximation. For example, you may want to verify the choice of randomEffectsNodes or get the order of parameters it has established to use for making sense of the MLE and results from the summary method. One can also call setupMargNodes, customize the returned list, and then provide that to buildLaplace as paramNodes. In that case, setupMargNodes will not be called (again) by buildLaplace.
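A sketch of this workflow, using the 'pump' model from the Examples below:

```r
# Inspect how nodes will be arranged for Laplace approximation.
margNodes <- setupMargNodes(pump)
margNodes$paramNodes          # order of parameters used for the MLE
margNodes$randomEffectsNodes  # nodes that will be marginalized over
# After customizing the returned list if needed, pass it via paramNodes,
# in which case buildLaplace will not call setupMargNodes again.
pumpLaplace <- buildLaplace(pump, paramNodes = margNodes)
```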

If setupMargNodes is emitting an unnecessary warning, simply use control=list(check=FALSE).

Managing parameter transformations that may be used internally

If any paramNodes (parameters) or randomEffectsNodes (random effects / latent states) have constraints on the range of valid values (because of the distribution they follow), they will be used on a transformed scale determined by parameterTransform. This means the Laplace approximation itself will be done on the transformed scale for random effects and finding the MLE will be done on the transformed scale for parameters. For parameters, prior distributions are not included in calculations, but they are used to determine valid parameter ranges and hence to set up any transformations. For example, if sigma is a standard deviation, you can declare it with a prior such as sigma ~ dhalfflat() to indicate that it must be greater than 0.

For default determination of when transformations are needed, all parameters must have a prior distribution simply to indicate the range of valid values. For a param p that has no constraint, a simple choice is p ~ dflat().
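As a sketch (with hypothetical nodes 'p' and 'y'), priors can be declared purely to indicate valid ranges:

```r
library(nimble)
# Priors here serve only to declare valid parameter ranges for automatic
# transformation; for parameters they are not included in calculations.
code <- nimbleCode({
  sigma ~ dhalfflat()         # sigma > 0: handled on a log scale internally
  p ~ dflat()                 # unconstrained: no transformation needed
  y ~ dnorm(p, sd = sigma)
})
```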

Understanding inner and outer optimizations

Note that there are two numerical optimizations when finding maximum likelihood estimates with a Laplace or (1D) AGHQ algorithm: (1) maximizing the joint log-likelihood of random effects and data given a parameter value to construct the approximation to the marginal log-likelihood at the given parameter value; (2) maximizing the approximation to the marginal log-likelihood over the parameters. In what follows, the prefix 'inner' refers to optimization (1) and 'outer' refers to optimization (2). Currently both optimizations default to using method "BFGS". However, one can use other optimizers or simply run optimization (2) manually from R; see the example below. In some problems, choice of inner and/or outer optimizer can make a big difference for obtaining accurate results, especially for standard errors. Hence it is worth experimenting if one is in doubt.

control list arguments

The control list allows additional settings to be made using named elements of the list. Most (or all) of these can be updated later using the 'updateSettings' method.


Available methods

The object returned by buildLaplace is a nimbleFunction object with numerous methods (functions). Here these are described in three tiers of user relevance.

Most useful methods

The most relevant methods to a user are 'calcLogLik' (calculate the approximation to the marginal log likelihood at given parameter values), 'findMLE' (maximize the approximation over the parameters), and 'summary' (obtain estimates and standard errors from the result of 'findMLE').

Methods for more advanced uses

Additional methods to access or control more details of the Laplace approximation include 'updateSettings' (change settings such as the inner and outer optimization methods or 'nQuad') and gradient methods such as 'gr_Laplace'.

Internal or development methods

Some methods are included for calculating the (approximate) marginal log posterior density by including the prior distribution of the parameters. This is useful for finding the maximum a posteriori probability (MAP) estimate. Currently these are provided for point calculations without estimation methods.

Finally, the object also contains methods that are primarily for internal use by the other methods.

Author(s)

Wei Zhang, Perry de Valpine, Paul van Dam-Bates

References

Kass, R. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association, 84(407), 717-726.

Liu, Q. and Pierce, D. A. (1994). A Note on Gauss-Hermite Quadrature. Biometrika, 81(3), 624-629.

Jäckel, P. (2005). A note on multivariate Gauss-Hermite quadrature. Working paper, ABN-Amro, London.

Skaug, H. and Fournier, D. (2006). Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis, 51, 699-709.

Examples

pumpCode <- nimbleCode({ 
  for (i in 1:N){
    theta[i] ~ dgamma(alpha, beta)
    lambda[i] <- theta[i] * t[i]
    x[i] ~ dpois(lambda[i])
  }
  alpha ~ dexp(1.0)
  beta ~ dgamma(0.1, 1.0)
})
pumpConsts <- list(N = 10, t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5))
pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpInits <- list(alpha = 0.1, beta = 0.1, theta = rep(0.1, pumpConsts$N))
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts, 
                    data = pumpData, inits = pumpInits, buildDerivs = TRUE)
                    
# Build Laplace approximation
pumpLaplace <- buildLaplace(pump)

## Not run: 
# Compile the model and the Laplace algorithm
Cpump <- compileNimble(pump)
CpumpLaplace <- compileNimble(pumpLaplace, project = pump)
# Calculate MLEs of parameters
MLEres <- CpumpLaplace$findMLE()
# Calculate estimates and standard errors for parameters and random effects on original scale
allres <- CpumpLaplace$summary(MLEres, randomEffectsStdError = TRUE)

# Change the settings and also illustrate runLaplace
CpumpLaplace$updateSettings(innerOptimMethod = "nlminb", outerOptimMethod = "nlminb")
newres <- runLaplace(CpumpLaplace)

# Illustrate use of the component log likelihood and gradient functions to
# run an optimizer manually from R.
# Use nlminb to find MLEs
MLEres.manual <- nlminb(c(0.1, 0.1),
                        function(x) -CpumpLaplace$calcLogLik(x),
                        function(x) -CpumpLaplace$gr_Laplace(x))

## End(Not run)


[Package nimble version 1.2.1 Index]