PMC {LaplacesDemon}    R Documentation
Population Monte Carlo
Description
The PMC
function updates a model with Population Monte Carlo.
Given a model specification, data, and initial values, PMC
maximizes the logarithm of the unnormalized joint
posterior density and provides samples of the marginal
posterior distributions, deviance, and other monitored variables.
Usage
PMC(Model, Data, Initial.Values, Covar=NULL, Iterations=10, Thinning=1,
alpha=NULL, M=1, N=1000, nu=9, CPUs=1, Type="PSOCK")
Arguments
Model
This is a model specification function. For more information, see
LaplacesDemon.
Initial.Values
This is either a vector of initial values, one for each of the K
parameters, or, in the case of a mixture of M components, a K x M
matrix of initial values.
Data
This is a list of data. For more information, see LaplacesDemon.
Covar
This is a K x K covariance matrix for the K parameters or, when there
are multiple mixture components, a K x K x M array of covariance
matrices. It defaults to NULL.
Iterations
This is the number of iterations during which PMC will update the model. Updating the model for only one iteration is the same as applying non-adaptive importance sampling.
Thinning
This is the number by which the posterior is thinned. For example, if
N=10000 samples are drawn and Thinning=10, then every tenth sample is
retained, yielding 1,000 thinned samples.
alpha
This is a vector of length M, the number of mixture components,
containing the probability of each mixture component. It defaults to
NULL, in which case each component receives equal probability.
M
This is the number M of mixture components. It defaults to M=1.
N
This is the number N of un-thinned samples drawn per mixture component
at each iteration. It defaults to N=1000.
nu
This is the degrees of freedom parameter \nu of the multivariate t
distribution used for each mixture component. It defaults to nu=9.
When \nu > 1e4, a multivariate normal distribution is used instead of
a multivariate t distribution.
CPUs
This argument is required for parallel processing, and indicates the
number of central processing units (CPUs) of the computer or cluster.
For example, when a user has a quad-core computer, CPUs=4.
Type
This argument defaults to "PSOCK" and uses the Simple Network of
Workstations (SNOW) for parallelization. Alternatively, Type="MPI" may
be specified to use Message Passing Interface (MPI) for
parallelization.
Details
The PMC
function uses the adaptive importance sampling
algorithm of Wraith et al. (2009), also called Mixture PMC or M-PMC
(Cappe et al., 2008). Iterative adaptive importance sampling was
introduced in the 1980s. Modern PMC was introduced (Cappe et al.,
2004), and extended to multivariate Gaussian or t-distributed mixtures
(Cappe et al., 2008). This version uses a multivariate t distribution
for each mixture component, and also allows a multivariate normal
distribution when the degrees of freedom \nu > 1e4. At each
iteration, a mixture distribution is sampled with
importance sampling, and the samples (or populations) are adapted to
improve the importance sampling. Adaptation is a variant of EM
(Expectation-Maximization). The sample is self-normalized, and is an
example of self-normalized importance sampling (SNIS), or
self-importance sampling. The vector \alpha contains the probability
of each mixture component. These, as well as multivariate t
distribution mixture parameters (except \nu), are adapted each
iteration.
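As a minimal illustration of one such iteration (a sketch only, not the internal code of PMC: the toy target, component means, covariances, and helper names are assumptions for this example, and the dmvt and rmvt functions provided by LaplacesDemon are assumed), importance samples are drawn from a mixture of multivariate t distributions and then self-normalized:
library(LaplacesDemon)
set.seed(1)
### Toy un-normalized log-posterior of two parameters (illustrative only)
log.target <- function(theta) sum(dnorm(theta, c(1,-1), 1, log=TRUE))
M <- 2; N <- 1000; nu <- 9
alpha <- c(0.5, 0.5)                  ### Mixture probabilities
Mu <- rbind(c(0,0), c(2,-2))          ### One row of means per component
S <- list(diag(2), diag(2))           ### One covariance matrix per component
### Draw N importance samples from the multivariate t mixture
comp <- sample(1:M, N, replace=TRUE, prob=alpha)
theta <- t(sapply(1:N, function(i)
     rmvt(1, mu=Mu[comp[i],], S=S[[comp[i]]], df=nu)))
### Mixture (importance) density of each draw
q <- sapply(1:N, function(i)
     sum(sapply(1:M, function(m)
          alpha[m] * dmvt(theta[i,], mu=Mu[m,], S=S[[m]], df=nu))))
### Self-normalized importance weights and a posterior-mean estimate
log.w <- apply(theta, 1, log.target) - log(q)
w <- exp(log.w - max(log.w)); w <- w / sum(w)
colSums(w * theta)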
Advantages of PMC over MCMC include:
It is difficult to assess convergence of MCMC chains, and this is not necessary in PMC (Wraith et al., 2009).
MCMC chains have autocorrelation that effectively reduces posterior samples. PMC produces independent samples that are not reduced with autocorrelation.
PMC has been reported to produce samples with less variance than MCMC.
It is difficult to parallelize MCMC. Posterior samples from parallel chains can be pooled when all chains have converged, but until this occurs, parallelization is unhelpful. PMC, on the other hand, can parallelize the independent, Monte Carlo samples during each iteration and reduce run-time as the number of processors increases. Currently, PMC is not parallelized here.
The multivariate mixture in PMC can represent a multimodal posterior, where MCMC with parallel chains may be used to identify a multimodal posterior, but probably will not yield combined samples that proportionally represent it.
Disadvantages of PMC, compared to MCMC, include:
In PMC, the required number of samples at each iteration increases quickly with respect to an increase in parameters. MCMC is more suitable for models with large numbers of parameters, and therefore, MCMC is more generalizable.
PMC is more sensitive to initial values than MCMC, especially as the number of parameters increases.
PMC is more sensitive to the initial covariance matrix (or matrices for mixture components) than adaptive MCMC. PMC requires more information about the target distributions before updating. The covariance matrix from a converged iterative quadrature algorithm, Laplace Approximation, or Variational Bayes may be required (see IterativeQuadrature, LaplaceApproximation, or VariationalBayes for more information).
Since PMC requires better initial information than iterative quadrature, Laplace Approximation, MCMC, and Variational Bayes, it is not recommended to begin updating a model that has little prior information with PMC, especially when the model has more than a few parameters. Instead, iterative quadrature, Laplace Approximation, MCMC, or Variational Bayes should be used. However, once convergence is found or assumed, it is recommended to attempt to update the model with PMC, given the latest parameters and covariance matrix from iterative quadrature, Laplace Approximation, MCMC, or Variational Bayes. Used in this way, PMC may improve the model fit obtained with MCMC and should reduce the variance of the marginal posterior distributions, which is desirable for predictive modeling.
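As a sketch of this workflow (assuming the Model, MyData, and Initial.Values objects defined in the Examples section below, and the as.initial.values and LaplaceApproximation functions documented elsewhere in this package):
### Update with Laplace Approximation first, then pass its solution and
### covariance matrix to PMC.
FitLA <- LaplaceApproximation(Model, Initial.Values, Data=MyData,
     Iterations=1000)
FitPMC <- PMC(Model, MyData, Initial.Values=as.initial.values(FitLA),
     Covar=FitLA$Covar, Iterations=10, Thinning=1, M=1, N=1000)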
Convergence is assessed by observing two outputs: normalized effective
sample size (ESSN) and normalized perplexity (Perplexity). These are
described below. PMC is considered to have converged when these
diagnostics stabilize (Wraith et al., 2009), or when the normalized
perplexity becomes sufficiently close to 1 (Cappe et al., 2008). If
they do not stabilize, then it is suggested to begin PMC again with a
larger number N of samples, and possibly with different initial values
and covariance matrix or matrices. IterativeQuadrature,
LaplaceApproximation, or VariationalBayes may be helpful to provide
better starting values for PMC.
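As a sketch, both diagnostics may be computed from a vector w of self-normalized importance weights (the function names here are hypothetical; the formulas follow Cappe et al., 2008), and the per-iteration values returned by PMC may then be inspected directly:
### Normalized effective sample size and normalized perplexity for one
### iteration, given N self-normalized importance weights w (sum(w) == 1).
ESSN.fun <- function(w) 1 / (length(w) * sum(w^2))
Perplexity.fun <- function(w) {
     H <- -sum(ifelse(w > 0, w * log(w), 0))   ### Shannon entropy of w
     exp(H) / length(w)
}
### Both lie in (0, 1]; values near 1 suggest the importance distribution
### matches the target well. Per-iteration values from a pmc object:
### plot(Fit$ESSN, type="b"); plot(Fit$Perplexity, type="b")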
If a message appears that warns about ‘bad weights’, then PMC is
attempting to work with an iteration in which importance weights are
problematic. If this occurs in the first iteration, then all importance
weights are set to 1/N. If this occurs in other iterations, then the
information from the previous iteration is used instead and different
draws are made from that importance distribution. This may allow PMC
to eventually adapt successfully to the target. If not, the user is
advised to begin again with a larger number N of samples, and possibly
different initial values and covariance matrix or matrices, as above.
PMC can experience difficulty when it begins with poor initial
conditions.
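The first-iteration fallback can be sketched as follows (a hypothetical helper, not the internal code of PMC):
### If the un-normalized importance weights are degenerate (all zero, NA,
### or non-finite), fall back to uniform weights 1/N; otherwise normalize.
safe.normalize <- function(w) {
     N <- length(w)
     w[!is.finite(w)] <- 0
     if (sum(w) <= 0) return(rep(1/N, N))
     w / sum(w)
}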
The user may combine samples from previous iterations with samples from the latest iteration for inference, if the algorithm converged before the last iteration. Currently, a function is not provided for combining previous samples.
Value
The returned object is an object of class pmc
with the
following components:
alpha
This is a matrix of the probabilities \alpha of the mixture components,
recorded by iteration as they are adapted.
Call
This is the matched call of PMC.
Covar
This stores the covariance matrix, or array of covariance matrices when
there are multiple mixture components, of the adapted importance
distribution. It may be supplied as the Covar argument when the model
is updated again with PMC.
Deviance
This is a vector of the deviance of the model, with a length equal to the number of thinned samples that were retained. Deviance is useful for considering model fit, and is equal to the sum of the log-likelihood for all rows in the data set, which is then multiplied by negative two.
DIC
This is a vector of three values: Dbar, pD, and DIC. Dbar is the mean
deviance, pD is a measure of model complexity indicating the effective
number of parameters, and DIC is the Deviance Information Criterion,
which is a model fit statistic that is the sum of Dbar and pD (a
computational sketch follows this list of components).
ESSN
This is a vector of the normalized effective sample size at each
iteration, with a length equal to the number of iterations. It is one
of the convergence diagnostics described in Details.
Initial.Values
This is the vector or matrix of initial values supplied by the user.
Iterations
This reports the number of iterations during which PMC updated the
model.
LML
This is an approximation of the logarithm of the marginal likelihood
of the data (see the LML function for more information).
M
This reports the number of mixture components.
Minutes
This indicates the number of minutes that PMC was running.
Model
This contains the model specification function, Model.
N
This is the number of un-thinned samples per mixture component.
nu
This is the degrees of freedom parameter \nu for each multivariate t
distribution in each mixture component.
Mu
This is an array of the means of the importance (mixture) distribution,
organized by iteration, parameter, and mixture component.
Monitor
This is a matrix of thinned samples of the monitored variables, with a
column for each monitored variable.
Parameters
This reports the number of parameters.
Perplexity
This is a vector of the normalized perplexity at each iteration, with a
length equal to the number of iterations. It is one of the convergence
diagnostics described in Details.
Posterior1
This is an array of the un-thinned posterior samples, across samples,
parameters, iterations, and mixture components.
Posterior2
This is a matrix of thinned posterior samples, with a row for each
thinned sample and a column for each parameter.
Summary
This is a matrix that summarizes the marginal posterior distributions
of the parameters, deviance, and monitored variables from thinned
samples. The following summary statistics are included: mean, standard
deviation, MCSE (Monte Carlo Standard Error), ESS (the effective sample
size due to autocorrelation), and the 2.5%, 50%, and 97.5% quantiles.
MCSE is essentially a standard deviation around the marginal posterior
mean that is due to uncertainty associated with using Monte Carlo
sampling. The acceptable size of the MCSE depends on the acceptable
uncertainty around the marginal posterior mean (a simple sketch of
MCSE follows this list of components).
Thinned.Samples
This is the number of thinned samples in Posterior2.
Thinning
This is the amount of thinning requested by the user.
W
This is a matrix of the normalized importance weights, with a column
for each iteration.
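As referenced in the DIC and Summary entries above, here is a sketch of how those summaries relate to the returned components. It assumes Posterior2 has one column per parameter, estimates pD as half the variance of the deviance (following Gelman et al., 2004), and the MCSE formula assumes approximately independent samples:
### Reconstruct DIC from the Deviance vector of a pmc object, Fit
Dbar <- mean(Fit$Deviance)             ### Mean deviance
pD   <- var(Fit$Deviance) / 2          ### Effective number of parameters
DIC  <- Dbar + pD
### A simple MCSE for the first parameter from the thinned samples
theta1 <- Fit$Posterior2[,1]
MCSE1  <- sd(theta1) / sqrt(length(theta1))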
Author(s)
Statisticat, LLC. software@bayesian-inference.com
References
Cappe, O., Douc, R., Guillin, A., Marin, J.M., and Robert, C. (2008). "Adaptive Importance Sampling in General Mixture Classes". Statistics and Computing, 18, p. 587–600.
Cappe, O., Guillin, A., Marin, J.M., and Robert, C. (2004). "Population Monte Carlo". Journal of Computational and Graphical Statistics, 13, p. 907–929.
Gelman, A., Carlin, J., Stern, H., and Rubin, D. (2004). "Bayesian Data Analysis, Texts in Statistical Science, 2nd ed.". Chapman and Hall, London.
Wraith, D., Kilbinger, M., Benabed, K., Cappe, O., Cardoso, J.F., Fort, G., Prunet, S., and Robert, C.P. (2009). "Estimation of Cosmological Parameters Using Adaptive Importance Sampling". Physical Review D, 80(2), p. 023507.
See Also
BayesFactor, IterativeQuadrature, LaplaceApproximation, LML, PMC.RAM,
Thin, and VariationalBayes.
Examples
# The accompanying Examples vignette is a compendium of examples.
#################### Load the LaplacesDemon Library #####################
library(LaplacesDemon)
############################## Demon Data ###############################
data(demonsnacks)
y <- log(demonsnacks$Calories)
X <- cbind(1, as.matrix(log(demonsnacks[,c(1,4,10)]+1)))
J <- ncol(X)
for (j in 2:J) X[,j] <- CenterScale(X[,j])
######################### Data List Preparation #########################
mon.names <- "LP"
parm.names <- as.parm.names(list(beta=rep(0,J), sigma=0))
pos.beta <- grep("beta", parm.names)
pos.sigma <- grep("sigma", parm.names)
PGF <- function(Data) {
beta <- rnorm(Data$J)
sigma <- runif(1)
return(c(beta, sigma))
}
MyData <- list(J=J, PGF=PGF, X=X, mon.names=mon.names,
parm.names=parm.names, pos.beta=pos.beta, pos.sigma=pos.sigma, y=y)
########################## Model Specification ##########################
Model <- function(parm, Data)
{
### Parameters
beta <- parm[Data$pos.beta]
sigma <- interval(parm[Data$pos.sigma], 1e-100, Inf)
parm[Data$pos.sigma] <- sigma
### Log-Prior
beta.prior <- sum(dnormv(beta, 0, 1000, log=TRUE))
sigma.prior <- dhalfcauchy(sigma, 25, log=TRUE)
### Log-Likelihood
mu <- tcrossprod(Data$X, t(beta))
LL <- sum(dnorm(Data$y, mu, sigma, log=TRUE))
### Log-Posterior
LP <- LL + beta.prior + sigma.prior
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=rnorm(length(mu), mu, sigma), parm=parm)
return(Modelout)
}
set.seed(666)
############################ Initial Values #############################
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
######################## Population Monte Carlo #########################
Fit <- PMC(Model, MyData, Initial.Values, Covar=NULL, Iterations=5,
Thinning=1, alpha=NULL, M=1, N=100, CPUs=1)
Fit
print(Fit)
PosteriorChecks(Fit)
caterpillar.plot(Fit, Parms="beta")
plot(Fit, BurnIn=0, MyData, PDF=FALSE)
Pred <- predict(Fit, Model, MyData, CPUs=1)
summary(Pred, Discrep="Chi-Square")
plot(Pred, Style="Covariates", Data=MyData)
plot(Pred, Style="Density", Rows=1:9)
plot(Pred, Style="ECDF")
plot(Pred, Style="Fitted")
plot(Pred, Style="Jarque-Bera")
plot(Pred, Style="Predictive Quantiles")
plot(Pred, Style="Residual Density")
plot(Pred, Style="Residuals")
Levene.Test(Pred)
Importance(Fit, Model, MyData, Discrep="Chi-Square")
#End