modMCMC {FME} | R Documentation |
Constrained Markov Chain Monte Carlo
Description
Performs a Markov Chain Monte Carlo simulation, using an adaptive Metropolis (AM) algorithm and including a delayed rejection (DR) procedure.
Usage
modMCMC(f, p, ..., jump = NULL, lower = -Inf, upper = +Inf,
prior = NULL, var0 = NULL, wvar0 = NULL, n0 = NULL,
niter = 1000, outputlength = niter, burninlength = 0,
updatecov = niter, covscale = 2.4^2/length(p),
ntrydr = 1, drscale = NULL, verbose = TRUE)
## S3 method for class 'modMCMC'
summary(object, remove = NULL, ...)
## S3 method for class 'modMCMC'
pairs(x, Full = FALSE, which = 1:ncol(x$pars),
remove = NULL, nsample = NULL, ...)
## S3 method for class 'modMCMC'
hist(x, Full = FALSE, which = 1:ncol(x$pars),
remove = NULL, ask = NULL, ...)
## S3 method for class 'modMCMC'
plot(x, Full = FALSE, which = 1:ncol(x$pars),
trace = TRUE, remove = NULL, ask = NULL, ...)
Arguments
f |
the function to be evaluated, with first argument the vector
of parameters which should be varied. It should return either the
model residuals, an element of class modCost (as returned by
a call to |
p |
initial values for the parameters to be optimized over. |
... |
additional arguments passed to function |
jump |
jump length, either a number, a vector with length equal to the total number of parameters, a covariance matrix, or a function that takes as input the current values of the parameters and produces as output the perturbed parameters. See details. |
prior |
-2*log(parameter prior probability), either a function
that is called as |
var0 |
initial model variance; if |
wvar0 |
"weight" for the initial model variance – see details. |
n0 |
parameter used for weighing the initial model variance -
if |
lower |
lower bounds on the parameters; for unbounded parameters
set equal to |
upper |
upper bounds on the parameters; for unbounded parameters
set equal to |
niter |
number of iterations for the MCMC. |
outputlength |
number of iterations kept in the output; should
be smaller or equal to |
updatecov |
number of iterations after which the parameter covariance matrix is (re)evaluated based on the parameters kept thus far, and used to update the MCMC jumps. |
covscale |
scale factor for the parameter covariance matrix, used to perform the MCMC jumps. |
burninlength |
number of initial iterations to be removed from output. |
ntrydr |
maximal number of tries for the delayed rejection procedure. It is generally not a good idea to set this to a too large value. |
drscale |
for each try during delayed rejection, the cholesky
decomposition of the proposal matrix is scaled with this amount; if
|
verbose |
if |
object |
an object of class |
x |
an object of class |
Full |
If TRUE then not only the parameters will be plotted, but also the function value and (if appropriate) the model variance(s). |
which |
the name or the index to the parameters that should be
plotted. Default = all parameters. If |
trace |
if |
remove |
a list with indices of the runs that should be removed (e.g. to remove runs during burnin). |
nsample |
the number of xy pairs to be plotted on the upper
panel in the pairs plot. When |
ask |
logical; if |
Details
Note that arguments after ... must be matched exactly.
R-function f
is called as f(p, ...)
. It should return
either -2 times the log likelihood of the model (one value), the
residuals between model and data or an item of class modFit
(as
created by function modFit
.
In the latter two cases, it is assumed that the prior distribution for
\theta
is either non-informative or gaussian. If gaussian, it
can be treated as a sum of squares (SS). If the measurement function
is defined as:
y=f(\theta) + \xi\\
\xi ~ N(0,\sigma^2)
where \xi
is the measurement error, assumed normally
distribution, then the posterior for the parameters will be estimated
as:
p(\theta | y,\sigma^2)\propto exp(-0.5 \cdot (\frac{SS(\theta)}{\sigma^2}
+SS_{pri}(\theta))
and where \sigma^2
is the error variance, SS is the sum of squares
function SS(\theta)=\sum(y_i-f(\theta))^2
.
If non-informative priors are used, then SS_{pri}(\theta)=0
.
The error variance \sigma^2
is considered a nuisance parameter.
A prior distribution of it should be specified and a posterior distribution
is estimated.
If wvar0
or n0
is >0, then the variances are sampled as
conjugate priors from the inverse gamma distribution with parameters
var0
and n0=wvar0*n
. Larger values of wvar0
keep
these samples closer to var0
.
Thus, at each step, 1/ the error variance (\sigma^{-2}
) is sampled
from a gamma distribution:
p(\sigma^{-2}|y,\theta) \sim \Gamma(\frac{(n_0+n)}{2},
\frac{(n_0 \cdot var0+SS(\theta))}{2})
where n
is the number of data points and where
n0=n \cdot wvar0
, and where the second argument to the gamma function
is the shape parameter.
The prior parameters (var0
and wvar0
) are the prior mean
for \sigma^2
and the prior accuracy.
By setting wvar0
equal to 1, equal weight is given to the
prior and the current value.
If wvar0
is 0 then the prior is ignored.
If wvar0
is NULL
(the default) then the error variances are
assumed to be fixed.
var0
estimates the variance of the measured components. In case
independent estimates are not available, these variances can be
obtained from the mean squares of fitted residuals. (e.g. as reported
in modFit
). See the examples. (but note that this is not truly
independent information)
var0
is either one value, or a value for each observed
variable, or a value for each observed data point.
When var0
is not NULL
, then f
is assumed to
return the model residuals OR an instance of class modCost
.
When var0=NULL
, then f
should return either
-2*log(probability of the model), or an instance of class
modCost
.
modMCMC
implements the Metropolis-Hastings method. The proposal
distribution, which is used to generate new parameter values is the
(multidimensional) Gaussian density distribution, with standard
deviation given by jump
.
jump
can be either one value, a vector of length = number of
parameters or a parameter covariance matrix (nrow = ncol = number
parameters).
The jump parameter, jump
thus determines how much the new
parameter set will deviate from the old one.
If jump
is one value, or a vector, then the new parameter
values are generated by sampling a normal distribution with standard
deviation equal to jump
. A larger value will lead to larger
jumps in the parameter space, but acceptance of new points can get
very low. Smaller jump lengths increase the acceptance rate, but the
algorithm may move too slowly, and too many runs may be needed to scan
the parameter space.
If jump
is NULL
, then the jump length is taken as 10%
of the parameter value as given in p
.
jump
can also be a proposal covariance matrix. In this case,
the new parameter values are generated by sampling a multidimensional
normal distribution. It can be efficient to initialise jump
using the parameter covariance as resulting from fitting the model
(e.g. using modFit
) – see examples.
Finally, jump
can also be an R-function that takes as input the
current values of the parameters and returns the new parameter values.
Two methods are implemented to increase the number of accepted runs.
In the "adaptive Metropolis" method, new parameters are generated with a covariance matrix that is estimated from the parameters generated (and saved) thus far. The idea behind this is that the MCMC method is more efficient if the proposal covariance (to generate new parameter values) is somehow tuned to the shape and size of the target distribution.
Setting
updatecov
smaller thanniter
will trigger this functionality. In this case, everyupdatecov
iterations, the jump covariance matrix will be estimated from the covariance matrix of the saved parameter values. The covariance matrix is scaled with(2.4^2/npar)
where npar is the number of parameters, unlesscovscale
has been given a different value. Thus,Jump= ( cov(\theta_1,\theta_2,....\theta_n) \cdot diag(np,+1e^{-16})\cdot(2.4^2/npar)
where the small number1e^{-16}
is added on the diagonal of the covariance matrix to prevent it from becoming singular.Note that a problem of adapting the proposal distribution using the MCMC results so far is that standard convergence results do not apply. One solution is to use adaptation only for the burn-in period and discard the part of the chain where adaptation has been used.
Thus, when using
updatecov
with a positive value ofburninlength
, the proposal distribution is only updated during burnin. Ifburninlength
= 0 though, the updates occur throughout the entire simulation.When using the adaptive Metropolis method, it is best to start with a small value of the jump length.
In the "delayed rejection" method, new parameter values are tried upon rejection. The process of delaying rejection can be iterated for at most
ntrydr
trials. Settingntrydr
equal to 1 (the default) toggles off delayed rejection.During the delayed rejection procedure, new parameters are generated from the last accepted value by scaling the jump covariance matrix with a factor as specified in
drscale
. The acceptance probability of this new set depends on the candidates so far proposed and rejected, in such a way that reversibility of the Markov chain is preserved. See Haario et al. (2005, 2006) for more details.
Convergence of the MCMC chain can be checked via plot
, which
plots for each iteration the values of all parameters, and if
Full
is TRUE
, of the function value (SS) and (if
appropriate) the modeled variance. If converged, there should be no
visible drift.
In addition, the methods from package coda
become available by
making the object returned by modMCMC
of class mcmc
, as
used in the methods of coda
. For instance, if object
MCMCres
is returned by modMCMC
then
as.mcmc(MCMCres$pars)
will make an instance of class
mcmc
, usable by coda
.
The burninlength
is the number of initial steps that is not
included in the output. It can be useful if the initial value of the
parameters is far from the optimal value. Starting the MCMC with the
best fit parameter set will alleviate the need for using
burninlength
.
Value
a list of class modMCMC containing the results as returned from the Markov chain.
This includes the following:
pars |
an array with dimension ( |
SS |
vector with the sum of squares function, one for each row
in |
naccepted |
the number of accepted runs. |
sig |
the sampled error variance |
bestpar |
the parameter set that gave the highest probability. |
bestfunp |
the function value corresponding to |
prior |
the parameter prior, one value for each row in
|
count |
information about the MCMC chain: number of delayed
rejection steps ( |
settings |
the settings for error covariance calculation,
i.e. arguments |
The list returned by modMCMC
has methods for the generic
functions summary
, plot
,
pairs
– see note.
Note
The following S3 methods are provided:
summary, produces summary statistics of the MCMC results
plot, plots the MCMC results, for all parameters. Use it to check convergence.
pairs, produces a pairs plot of the MCMC results; overrides the default
gap = 0
,upper.panel = NA
, anddiag.panel
.
It is also possible to use the methods from the coda
package,
e.g. densplot
.
To do that, first the modMCMC
object has to be converted to an
mcmc object. See the examples for an application.
Author(s)
Karline Soetaert <karline.soetaert@nioz.nl>
Marko Laine <Marko.Laine@fmi.fi>
References
Laine, M., 2008. Adaptive MCMC Methods With Applications in Environmental and Geophysical Models. Finnish Meteorological Institute contributions 69, ISBN 978-951-697-662-7, Finnish Meteorological Institute, Helsinki.
Haario, H., Saksman, E. and Tamminen, J., 2001. An Adaptive Metropolis Algorithm. Bernoulli 7, pp. 223–242. doi:10.2307/3318737
Haario, H., Laine, M., Mira, A. and Saksman, E., 2006. DRAM: Efficient Adaptive MCMC. Statistics and Computing, 16(4), 339–354. doi:10.1007/s11222-006-9438-0
Haario, H., Saksman, E. and Tamminen, J., 2005. Componentwise Adaptation for High Dimensional MCMC. Computational Statistics 20(2), 265–274. doi:10.1007/BF02789703
Gelman, A. Varlin, J. B., Stern, H. S. and Rubin, D. B., 2004. Bayesian Data Analysis. Second edition. Chapman and Hall, Boca Raton.
Soetaert, K. and Petzoldt, T. 2010. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software 33(3) 1–28. doi:10.18637/jss.v033.i03
See Also
modFit
for constrained model fitting
Examples
## =======================================================================
## Sampling a 3-dimensional normal distribution,
## =======================================================================
# mean = 1:3, sd = 0.1
# f returns -2*log(probability) of the parameter values
NN <- function(p) {
mu <- c(1,2,3)
-2*sum(log(dnorm(p, mean = mu, sd = 0.1))) #-2*log(probability)
}
# simple Metropolis-Hastings
MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000,
outputlength = 1000, jump = 0.5)
# More accepted values by updating the jump covariance matrix...
MCMC <- modMCMC(f = NN, p = 0:2, niter = 5000, updatecov = 100,
outputlength = 1000, jump = 0.5)
summary(MCMC)
plot(MCMC) # check convergence
pairs(MCMC)
## =======================================================================
## test 2: sampling a 3-D normal distribution, larger standard deviation...
## noninformative priors, lower and upper bounds imposed on parameters
## =======================================================================
NN <- function(p) {
mu <- c(1,2,2.5)
-2*sum(log(dnorm(p, mean = mu, sd = 0.5))) #-2*log(probability)
}
MCMC2 <- modMCMC(f = NN, p = 1:3, niter = 2000, burninlength = 500,
updatecov = 10, jump = 0.5, lower = c(0, 2, 1), upper = c(1, 3, 3))
plot(MCMC2)
hist(MCMC2, breaks = 20)
## Compare output of p3 with theoretical distribution
hist(MCMC2, which = "p3", breaks = 20)
lines(seq(1, 3, 0.1), dnorm(seq(1, 3, 0.1), mean = 2.5,
sd = 0.5)/pnorm(3, 2.5, 0.5))
summary(MCMC2)
# functions from package coda...
cumuplot(as.mcmc(MCMC2$pars))
summary(as.mcmc(MCMC2$pars))
raftery.diag(MCMC2$pars)
## =======================================================================
## test 3: sampling a log-normal distribution, log mean=1:4, log sd = 1
## =======================================================================
NL <- function(p) {
mu <- 1:4
-2*sum(log(dlnorm(p, mean = mu, sd = 1))) #-2*log(probability)
}
MCMCl <- modMCMC(f = NL, p = log(1:4), niter = 3000,
outputlength = 1000, jump = 5)
plot(MCMCl) # bad convergence
cumuplot(as.mcmc(MCMCl$pars))
MCMCl <- modMCMC (f = NL, p = log(1:4), niter = 3000,
outputlength = 1000, jump = 2^(2:5))
plot(MCMCl) # better convergence but CHECK it!
pairs(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)
MCMCl <- modMCMC (f = NL, p = rep(1, 4), niter = 3000,
outputlength = 1000, jump = 5, updatecov = 100)
plot(MCMCl)
colMeans(log(MCMCl$pars))
apply(log(MCMCl$pars), 2, sd)
## =======================================================================
## Fitting a Monod (Michaelis-Menten) function to data
## =======================================================================
# the observations
#---------------------
Obs <- data.frame(x=c( 28, 55, 83, 110, 138, 225, 375), # mg COD/l
y=c(0.053,0.06,0.112,0.105,0.099,0.122,0.125)) # 1/hour
plot(Obs, pch = 16, cex = 2, xlim = c(0, 400), ylim = c(0, 0.15),
xlab = "mg COD/l", ylab = "1/hr", main = "Monod")
# the Monod model
#---------------------
Model <- function(p, x) data.frame(x = x, N = p[1]*x/(x+p[2]))
# Fitting the model to the data
#---------------------
# define the residual function
Residuals <- function(p) (Obs$y - Model(p, Obs$x)$N)
# use modFit to find parameters
P <- modFit(f = Residuals, p = c(0.1, 1))
# plot best-fit model
x <-0:375
lines(Model(P$par, x))
# summary of fit
sP <- summary(P)
sP[]
print(sP)
# Running an MCMC
#---------------------
# estimate parameter covariances
# (to efficiently generate new parameter values)
Covar <- sP$cov.scaled * 2.4^2/2
# the model variance
s2prior <- sP$modVariance
# set nprior = 0 to avoid updating model variance
MCMC <- modMCMC(f = Residuals, p = P$par,jump = Covar, niter = 1000,
var0 = s2prior, wvar0 = NULL, updatecov = 100)
plot(MCMC, Full = TRUE)
pairs(MCMC)
# function from the coda package.
raftery.diag(as.mcmc(MCMC$pars))
cor(MCMC$pars)
cov(MCMC$pars) # covariances by MCMC
sP$cov.scaled # covariances by Hessian of fit
x <- 1:400
SR <- summary(sensRange(parInput = MCMC$pars, func = Model, x = x))
plot(SR, xlab="mg COD/l", ylab = "1/hr", main = "Monod")
points(Obs, pch = 16, cex = 1.5)