buildMCEM {nimble}                                                R Documentation

Builds an MCEM algorithm for a given NIMBLE model

Description

Takes a NIMBLE model (with some missing data, aka random effects or latent state nodes) and builds a Monte Carlo Expectation Maximization (MCEM) algorithm for maximum likelihood estimation. The user can specify which latent nodes are to be integrated out in the E-Step, or default choices will be made based on model structure. All other stochastic non-data nodes will be maximized over. The E-step is done with a sample from a nimble MCMC algorithm. The M-step is done by a call to optim.

Usage

buildMCEM(
  model,
  paramNodes,
  latentNodes,
  calcNodes,
  calcNodesOther,
  control = list(),
  ...
)

Arguments

model

a NIMBLE model object, either compiled or uncompiled.

paramNodes

a character vector of names of parameter nodes in the model; defaults are provided by setupMargNodes. Alternatively, paramNodes can be a list in the format returned by setupMargNodes, in which case latentNodes, calcNodes, and calcNodesOther are not needed (and will be ignored).

latentNodes

a character vector of names of unobserved (latent) nodes to marginalize (sum or integrate) over; defaults are provided by setupMargNodes (as the randomEffectsNodes in its return list).

calcNodes

a character vector of names of nodes for calculating components of the full-data likelihood that involve latentNodes; defaults are provided by setupMargNodes. There may be deterministic nodes between paramNodes and calcNodes. These will be included in calculations automatically and thus do not need to be included in calcNodes (but there is no problem if they are).

calcNodesOther

a character vector of names of nodes for calculating terms in the log-likelihood that do not depend on any latentNodes, and thus are not part of the marginalization, but should be included for purposes of finding the MLE. This defaults to stochastic nodes that depend on paramNodes but are not part of and do not depend on latentNodes. There may be deterministic nodes between paramNodes and calcNodesOther. These will be included in calculations automatically and thus do not need to be included in calcNodesOther (but there is no problem if they are).

control

a named list for providing additional settings used in MCEM. See the 'Control list details' section below.

...

provided only as a means of checking if a user is using the deprecated interface to 'buildMCEM' in nimble versions < 1.2.0.

Details

buildMCEM is a nimbleFunction that creates an MCEM algorithm for a model and choices (perhaps default) of nodes in different roles in the model. The MCEM can then be compiled for fast execution with a compiled model.

Note that buildMCEM was re-written for nimble version 1.2.0 and is not backward-compatible with previous versions. The new version is considered to be in beta testing.

Denote data by Y, latent states (or missing data) by X, and parameters by T. MCEM works by the following steps, starting from some T:

  1. Draw a sample of size M from P(X | Y, T) using MCMC.

  2. Update T to be the maximizer of E[log P(X, Y | T)], where the expectation is approximated as a Monte Carlo average over the sample from step 1.

  3. Repeat until converged.
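
For intuition, step 2 can be written as a small R sketch of the Monte Carlo approximation (schematic only, not the internal implementation; 'Xsamp' and 'logLik_full' are hypothetical placeholders, and 'theta' plays the role of T above):

    ## Monte Carlo E-step objective: average the full-data log-likelihood
    ## over the MCMC sample of latent states X drawn in step 1.
    Qhat <- function(theta, Xsamp, logLik_full) {
        ## Xsamp: matrix with one sampled value of X per row (M rows)
        ## logLik_full(theta, x): returns log P(x, Y | theta) for one sampled x
        mean(apply(Xsamp, 1, function(x) logLik_full(theta, x)))
    }
    ## Step 2 then amounts to maximizing Qhat over theta, e.g. with optim().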

The default version of MCEM is the ascent-based MCEM of Caffo et al. (2005). This attempts to update M when necessary to ensure that step 2 really moves uphill, given that it is maximizing a Monte Carlo approximation and could accidentally move downhill on the real surface of interest due to Monte Carlo error. The main tuning parameters include alpha, beta, gamma, Mfactor, C, and tol (tolerance).
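
As a hedged illustration (the values below are arbitrary placeholders, not recommendations), these tuning parameters are supplied through the control argument; for example, with the pump model from the Examples below:

    pumpMCEM2 <- buildMCEM(model = pumpModel,
                           control = list(alpha = 0.25, beta = 0.25, gamma = 0.05,
                                          Mfactor = 0.33, C = 2, tol = 1e-3))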

If the model supports derivatives via nimble's automatic differentiation (AD) (and buildDerivs=TRUE in nimbleModel), the maximization step can use gradients from AD. You must manually set useDerivs=FALSE in the control list if derivatives aren't supported or if you don't want to use them.
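
For example (a sketch, again using the pump model from the Examples below), gradient-based maximization can be turned off via the control list:

    pumpMCEM_noAD <- buildMCEM(model = pumpModel,
                               control = list(useDerivs = FALSE))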

In the ascent-based method, after the maximization in step 2, the Monte Carlo standard error of the uphill movement is estimated. If the standardized uphill step is significantly greater than 0 at Type I error rate alpha, the iteration is accepted and the algorithm continues. Otherwise, it is not certain that step 2 really moved uphill due to Monte Carlo error, so the MCMC sample size M is increased by a fixed factor (e.g. 0.33 or 0.5, called Mfactor in the control list), the additional samples are added by continuing step 1, and step 2 is tried again. If the Monte Carlo noise still overwhelms the magnitude of uphill movement, the sample size is increased again, and so on. alpha should be between 0 and 0.5. A value larger than is usually used for inference is recommended so that uphill movement is easy to establish, which avoids increasing M prematurely. M will never be increased above maxM.

Convergence is determined in a similar way. After a definite move uphill, we determine if the uphill increment is less than tol, with Type I error rate gamma. (But if M hits a maximum value, the convergence criterion changes. See below.)

beta is used to help set M to a minimal level based on previous iterations. This is a desired Type II error rate, assuming an uphill move and standard error based on the previous iteration. Set adjustM=FALSE in the control list if you don't want this behavior.

There are some additional controls on convergence for practical purposes. Set C in the control list to the number of times the convergence criterion must be satisfied before the algorithm actually stops. E.g., setting C=2 means there will always be a restart after the first convergence.

One problem that can occur with ascent-based MCEM is that the final iteration can be very slow if M must become very large to satisfy the convergence criterion; indeed, this can occur if the algorithm starts near the MLE. Set maxM in the control list to cap the MCMC sample size at a value that will never be exceeded.

If M==maxM, a softer convergence criterion is used. This second convergence criterion is to stop if we can't be sure we moved uphill using Type I error rate delta. This is a soft criterion because for small delta, Type II errors will be common (e.g. if we really did move uphill but can't be sure from the Monte Carlo sample), allowing the algorithm to terminate. One can continue the algorithm from where it stopped, so it is helpful to not have it get stuck when having a very hard time with the first (stricter) convergence criterion.

All of alpha, beta, delta, and gamma are utilized based on asymptotic arguments but in practice must be chosen heuristically. In other words, their theoretical basis does not exactly yield practical advice on good choices for efficiency and accuracy, so some trial and error will be needed.

It can also be helpful to set a minimum and maximum of allowed iterations (of steps 1 and 2 above). Setting minIter>1 in the control list can sometimes help avoid a false convergence on the first iteration by forcing at least one more iteration. Setting maxIter provides a failsafe on a stuck run.

If you don't want the ascent-based method at all and simply want to run a set of iterations, set ascent=FALSE in the control list. This will use the second (softer) convergence criterion.
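
As a consolidated, hedged sketch of the convergence-related controls discussed above (all values are illustrative only):

    pumpMCEM3 <- buildMCEM(model = pumpModel,
                           control = list(adjustM = FALSE, # do not adjust M based on beta
                                          C = 2,           # require convergence twice
                                          maxM = 20000,    # cap on the MCMC sample size
                                          minIter = 2,     # force at least two iterations
                                          maxIter = 50,    # failsafe on a stuck run
                                          ascent = TRUE))  # ascent-based MCEM (the default)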

Parameters to be maximized will by default be handled in an unconstrained parameter space, transformed if necessary by a parameterTransform object. In that case, the default optim method will be "BFGS" and can be changed by setting optimMethod in the control list. Set useTransform=FALSE in the control list if you don't want the parameters transformed. In that case the default optimMethod will be "L-BFGS-B" if there are any actual constraints, and you can provide a list of boxConstraints in the control list. (Constraints may be determined by priors written in the model for parameters, even though those priors play no other role in MLE. E.g. sigma ~ halfflat() indicates sigma > 0.)
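
For example (a sketch; the boxConstraints list itself is omitted here because its exact format is given in the control list documentation):

    pumpMCEM4 <- buildMCEM(model = pumpModel,
                           control = list(useTransform = FALSE,
                                          optimMethod = "L-BFGS-B"))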

Most of the control list elements can be overridden when calling the findMLE method. The findMLE argument continue=TRUE results in attempting to continue the algorithm where the previous call finished, including whatever settings were in use.
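
For example (a sketch; the call-time argument names are assumed to match the control element names):

    fit1 <- CpumpMCEM$findMLE()                 # first run
    fit2 <- CpumpMCEM$findMLE(continue = TRUE)  # continue from where fit1 stopped
    fit3 <- CpumpMCEM$findMLE(tol = 1e-4)       # override one setting (assumed name)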

See setupMargNodes (which is called with the given arguments for paramNodes, calcNodes, and calcNodesOther; and with allowDiscreteLatent=TRUE, randomEffectsNodes=latentNodes, and check=check) for more about how the different groups of nodes are determined. In general, you can provide none, one, or more of the different kinds of nodes and setupMargNodes will try to determine the others in a sensible way. However, note that this cannot work for all ways of writing a model. One key example is that if random (latent) nodes are written as top-level nodes (e.g. following N(0,1)), they appear structurally to be parameters and you must tell buildMCEM that they are latentNodes. The various "Nodes" arguments will all be passed through model$expandNodeNames, allowing for example simply "x" to be provided when there are many nodes within "x".
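
For example, a hedged sketch with hypothetical model and node names, in which latent values written as top-level nodes must be named explicitly:

    myMCEM <- buildMCEM(model = myModel,              # 'myModel' is hypothetical
                        paramNodes = c("mu", "sigma"),
                        latentNodes = "ranef")        # expands to all nodes within 'ranef'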

Estimating the Monte Carlo standard error of the uphill step is not trivial because the sample was obtained by MCMC and so likely is autocorrelated. This is done by calling whatever function in R's global environment is called "MCEM_mcse", which is required to take two arguments: samples (which will be a vector of the differences in log(P(Y, X | T)) between the new and old values of T, across the sample of X) and m, the sample size. It must return an estimate of the standard error of the mean of the sample. NIMBLE provides a default version (exported from the package namespace), which calls mcmcse::mcse with method "obm". Simply provide a different function with this name in your R session to override NIMBLE's default.
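
As a sketch of providing an alternative, define a function with this name and signature in the global environment (this naive version ignores autocorrelation, so it is for illustration only):

    MCEM_mcse <- function(samples, m) {
        ## samples: differences in log P(Y, X | T) across the MCMC sample of X
        ## m: the MCMC sample size
        sd(samples) / sqrt(m)  # i.i.d. standard error; NIMBLE's default uses mcmcse::mcse
    }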

Control list details

The control list accepts named elements that tune the algorithm. The main elements, described in the Details section above, include alpha, beta, gamma, delta, C, tol, Mfactor, maxM, adjustM, ascent, minIter, maxIter, useDerivs, useTransform, optimMethod, and boxConstraints.

Methods in the returned algorithm

The object returned by buildMCEM is a nimbleFunction object whose methods include findMLE (run the MCEM iterations and return the maximum likelihood estimate) and vcov (estimate the variance-covariance matrix of the estimated parameters), as illustrated in the Examples below.

Author(s)

Perry de Valpine, Clifford Anderson-Bergman and Nicholas Michaud

References

Caffo, Brian S., Wolfgang Jank, and Galin L. Jones (2005). Ascent-based Monte Carlo expectation-maximization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 235-251.

Louis, Thomas A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 44(2), 226-233.

Examples

## Not run: 
pumpCode <- nimbleCode({
 for (i in 1:N){
     theta[i] ~ dgamma(alpha,beta);
     lambda[i] <- theta[i]*t[i];
     x[i] ~ dpois(lambda[i])
 }
 alpha ~ dexp(1.0);
 beta ~ dgamma(0.1,1.0);
})

pumpConsts <- list(N = 10,
              t = c(94.3, 15.7, 62.9, 126, 5.24,
                31.4, 1.05, 1.05, 2.1, 10.5))

pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))

pumpInits <- list(alpha = 1, beta = 1,
             theta = rep(0.1, pumpConsts$N))
## buildDerivs = TRUE enables AD, so the M-step can use gradients.
pumpModel <- nimbleModel(code = pumpCode, name = 'pump', constants = pumpConsts,
                         data = pumpData, inits = pumpInits,
                         buildDerivs = TRUE)

## Build the MCEM algorithm. By default, setupMargNodes identifies theta[1:N]
## as the latent nodes and alpha and beta as the parameters for this model.
pumpMCEM <- buildMCEM(model = pumpModel)

CpumpModel <- compileNimble(pumpModel)

CpumpMCEM <- compileNimble(pumpMCEM, project = pumpModel)

## Run MCEM to find the MLE, then estimate its variance-covariance matrix.
MLE <- CpumpMCEM$findMLE()
vcov <- CpumpMCEM$vcov(MLE$par)


## End(Not run)
