abc {abc}    R Documentation
Parameter estimation with Approximate Bayesian Computation (ABC)
Description
This function performs multivariate parameter estimation based on summary statistics using an ABC algorithm. The algorithms implemented are rejection sampling and local linear or non-linear (neural network) regression. A conditional heteroscedastic model is available for the latter two algorithms.
Usage
abc(target, param, sumstat, tol, method, hcorr = TRUE, transf = "none",
    logit.bounds, subset = NULL, kernel = "epanechnikov", numnet = 10,
    sizenet = 5, lambda = c(0.0001, 0.001, 0.01), trace = FALSE,
    maxit = 500, ...)
Arguments
target: a vector of the observed summary statistics.

param: a vector, matrix or data frame of the simulated parameter values, i.e. the dependent variable(s) when method is "loclinear", "neuralnet" or "ridge".

sumstat: a vector, matrix or data frame of the simulated summary statistics, i.e. the independent variables when method is "loclinear", "neuralnet" or "ridge".

tol: tolerance, the required proportion of points accepted nearest the target values.

method: a character string indicating the type of ABC algorithm to be applied. Possible values are "rejection", "loclinear", "neuralnet" and "ridge". See Details.

hcorr: logical, the conditional heteroscedastic model is applied if TRUE (the default).

transf: a vector of character strings indicating the kind of transformation to be applied to the parameter values. The possible values are "log", "logit" and "none" (the default), when no transformation is applied.

logit.bounds: a matrix of bounds if transf is "logit". The matrix has as many rows as parameters (including those that are not "logit" transformed) and 2 columns: the first column gives the minimum bound, the second column the maximum bound.

subset: a logical expression indicating elements or rows to keep. Missing values in param and/or sumstat are taken as FALSE.

kernel: a character string specifying the kernel to be used when method is "loclinear", "neuralnet" or "ridge". Defaults to "epanechnikov". See density for details.

numnet: the number of neural networks when method is "neuralnet". Defaults to 10. It indicates the number of times the function nnet is called.

sizenet: the number of units in the hidden layer. Defaults to 5. Can be zero if there are no skip-layer units. See nnet for more details.

lambda: a numeric vector or a single value indicating the weight decay when method is "neuralnet". By default, one of 0.0001, 0.001 or 0.01 is randomly chosen for each of the networks. See nnet for more details.

trace: logical, if TRUE, tracing of the optimization in nnet is switched on. Applies only when method is "neuralnet".

maxit: numeric, the maximum number of iterations. Defaults to 500. Applies only when method is "neuralnet". See also nnet.

...: other arguments passed to nnet.
Details
These ABC algorithms generate random samples from the posterior distributions of one or more parameters of interest, \theta_1, \theta_2, ..., \theta_n. To apply any of these algorithms, (i) data sets have to be simulated based on random draws from the prior distributions of the \theta_i's, (ii) from these data sets, a set of summary statistics has to be calculated, S(y), (iii) the same summary statistics have to be calculated from the observed data, S(y_0), and (iv) a tolerance rate must be chosen (tol). See cv4abc for a cross-validation tool that may help in choosing the tolerance rate.
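Steps (i)-(iv) can be sketched in a few lines of base R. The toy model below (estimating the mean of a normal sample, with the sample mean as the single summary statistic) is an illustrative assumption, not part of the package:

```r
## Toy rejection-ABC sketch: posterior of a normal mean
set.seed(1)
y0 <- rnorm(50, mean = 2)     # "observed" data
S0 <- mean(y0)                # (iii) summary statistic of the observed data

n.sim <- 10000
theta <- rnorm(n.sim, mean = 0, sd = 5)   # (i) draws from the prior of theta
S <- vapply(theta, function(t) mean(rnorm(50, mean = t)), numeric(1))  # (ii)

tol <- 0.05                   # (iv) tolerance rate
d <- abs(S - S0)              # distance between simulated and observed statistics
accepted <- theta[d <= quantile(d, probs = tol)]  # keep the nearest tol fraction
mean(accepted)                # approximate posterior mean, close to mean(y0)
```

With a single summary statistic the Euclidean distance reduces to an absolute difference; with several statistics, each dimension contributes to the distance, which is why abc rescales the statistics internally.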
When method is "rejection", the simple rejection algorithm is used. Parameter values are accepted if the Euclidean distance between S(y) and S(y_0) is sufficiently small. The percentage of accepted simulations is determined by tol. When method is "loclinear", a local linear regression method corrects for the imperfect match between S(y) and S(y_0). The accepted parameter values are weighted by a smooth function (kernel) of the distance between S(y) and S(y_0), and corrected according to a linear transform: \theta^{*} = \theta - b(S(y) - S(y_0)). The \theta^{*}'s represent samples from the posterior distribution. This method calls the function lsfit from the stats library. When using the "loclinear" method, a warning about collinearity of the design matrix of the regression might be issued. In that situation, we recommend the related "ridge" method, which performs local-linear ridge regression and deals with the collinearity issue. The non-linear regression correction method ("neuralnet") uses a non-linear regression to capture the possibly non-linear relationship between the summary statistics and the parameters, using the function nnet.

The posterior samples of parameters based on the rejection algorithm are returned as well, even when one of the regression algorithms is used.
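The "loclinear" correction can be illustrated on simulated data. The sketch below (all names and the linear toy model are made up for illustration, and it simplifies the package's actual internals) fits the weighted regression with lsfit using Epanechnikov weights, then applies \theta^{*} = \theta - b(S(y) - S(y_0)):

```r
## Illustrative local-linear adjustment (simplified sketch, not abc's exact code)
set.seed(2)
theta <- runif(1000, 0, 10)              # accepted parameter draws
S     <- theta + rnorm(1000, sd = 0.5)   # their simulated summary statistics
S0    <- 5                               # observed summary statistic

d     <- abs(S - S0)                     # distances to the observed statistic
delta <- max(d)                          # bandwidth: the largest accepted distance
w     <- 1 - (d / delta)^2               # Epanechnikov kernel weights

fit <- lsfit(S - S0, theta, wt = w)      # weighted regression of theta on S - S0
b   <- unname(fit$coefficients[2])       # local slope of the regression
theta.star <- theta - b * (S - S0)       # regression-corrected posterior sample
```

The corrected sample theta.star concentrates around the regression prediction at S(y_0); its spread reflects the residual variation rather than the raw scatter of the accepted draws.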
Several additional arguments can be specified when method is "neuralnet". The method is based on the function nnet from the library nnet, which fits single-hidden-layer neural networks. numnet defines the number of neural networks, thus the function nnet is called numnet times. Predictions from different neural networks can be rather different, so the median of the predictions from all neural networks is used to provide a global prediction. The choice of the number of neural networks is a trade-off between speed and accuracy; the default is 10 networks. The number of units in the hidden layer can be specified via sizenet. Selecting the number of hidden units is similar to selecting the independent variables in a linear or non-linear regression: it determines the complexity of the network. There are several rules of thumb for choosing the number of hidden units, but they are often unreliable. Generally speaking, the optimal choice of sizenet depends on the dimensionality, thus on the number of statistics in sumstat. It can be zero when there are no skip-layer units. See also nnet for more details. The method "neuralnet" is recommended when dealing with a large number of summary statistics.
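As a hedged illustration (assuming the abc.data package and its musigma2 data set, as in the Examples below), the neural-network tuning arguments might be combined as follows; the particular values of numnet, sizenet and maxit are arbitrary choices for demonstration:

```r
## Sketch: tuning the "neuralnet" method on the musigma2 data from abc.data
library(abc)
require(abc.data)
data(musigma2)

nn <- abc(target = stat.obs, param = par.sim, sumstat = stat.sim,
          tol = 0.1, method = "neuralnet",
          numnet = 20,      # median prediction over 20 networks (default 10)
          sizenet = 4,      # 4 units in the hidden layer
          lambda = c(0.0001, 0.001, 0.01),  # weight decay, drawn per network
          maxit = 1000,     # allow nnet more optimization iterations
          trace = FALSE)
summary(nn)
```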
If method is "loclinear", "neuralnet" or "ridge", a correction for heteroscedasticity is applied by default (hcorr = TRUE).
Parameters may be transformed prior to estimation. The type of transformation is defined by transf. The length of transf is normally the same as the number of parameters. If only one value is given, that same transformation is applied to all parameters and the user is warned. When a parameter transformation is used, the parameters are back-transformed to their original scale after the regression estimation. No transformations can be applied when method is "rejection".
Using names for the parameters and summary statistics is strongly recommended. Names can be supplied as names or colnames to param and sumstat (and target). If no names are supplied, P1, P2, ... is assigned to parameters and S1, S2, ... to summary statistics, and the user is warned.
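A trivial illustration of supplying names (myp, mys and myt are made-up objects, in the spirit of the artificial example at the end of the Examples section):

```r
## Named parameters and statistics avoid the automatic P1, P2, .../S1, S2, ... labels
myp <- matrix(rnorm(2000), ncol = 2)        # two simulated parameters
colnames(myp) <- c("mu", "sigma2")          # parameter names
mys <- myp + rnorm(2000, sd = 0.1)          # two matching summary statistics
colnames(mys) <- c("S.mean", "S.var")       # summary statistic names
myt <- c(S.mean = 0, S.var = 1)             # observed statistics, same names
```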
Value
The returned value is an object of class "abc"
, containing the
following components:
adj.values: The regression-adjusted parameter values, when method is "loclinear", "neuralnet" or "ridge".

unadj.values: The unadjusted parameter values that correspond to the "rejection" method.

ss: The summary statistics for the accepted simulations.

weights: The regression weights, when method is "loclinear", "neuralnet" or "ridge".

residuals: The residuals from the regression, when a regression-based method is used.

dist: The Euclidean distances for the accepted simulations.

call: The original call.

na.action: A logical vector indicating the elements or rows that were excluded, including both NA/NaN values and elements or rows excluded by subset.

region: A logical expression indicating the elements or rows that were accepted.

transf: The parameter transformations that have been used.

logit.bounds: The bounds, if the transformation was "logit".

kernel: The kernel used.

method: A character string indicating the method used, i.e. "rejection", "loclinear", "neuralnet" or "ridge".

lambda: A numeric vector of length numnet, the actual values of the weight decay used in each of the neural networks.

numparam: Number of parameters used.

numstat: Number of summary statistics used.

aic: The sum of the AIC of the numnet neural networks.

bic: The same but with the BIC.

names: A list with two elements: parameter.names and statistics.names.
Author(s)
Katalin Csillery, Olivier Francois and Michael Blum with some initial code from Mark Beaumont.
References
Pritchard, J.K., M.T. Seielstad, A. Perez-Lezaun and M.W. Feldman (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution, 16, 1791-1798.
Beaumont, M.A., Zhang, W., and Balding, D.J. (2002) Approximate Bayesian Computation in Population Genetics, Genetics, 162, 2025-2035.
Blum, M.G.B. and Francois, O. (2010) Non-linear regression models for Approximate Bayesian Computation. Statistics and Computing 20, 63-73.
Csillery, K., M.G.B. Blum, O.E. Gaggiotti and O. Francois (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology and Evolution, 25, 410-418.
See Also
summary.abc, hist.abc, plot.abc, lsfit, nnet, cv4abc
Examples
require(abc.data)
data(musigma2)
?musigma2
## The rejection algorithm
##
rej <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method =
"rejection")
## ABC with local linear regression correction without/with correction
## for heteroscedasticity
##
lin <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, hcorr =
FALSE, method = "loclinear", transf=c("none","log"))
linhc <- abc(target=stat.obs, param=par.sim, sumstat=stat.sim, tol=.1, method =
"loclinear", transf=c("none","log"))
## posterior summaries
##
linsum <- summary(linhc, intvl = .9)
linsum
## compare with the rejection sampling
summary(linhc, unadj = TRUE, intvl = .9)
## posterior histograms
##
hist(linhc, breaks=30, caption=c(expression(mu),
expression(sigma^2)))
## or send histograms to a pdf file
## Not run:
hist(linhc, file="linhc", breaks=30, caption=c(expression(mu),
expression(sigma^2)))
## End(Not run)
## diagnostic plots: compare the 2 'abc' objects: "loclinear",
## "loclinear" with correction for heteroscedasticity
##
plot(lin, param=par.sim)
plot(linhc, param=par.sim)
## this example illustrates how to add "true" parameter values to a plot
##
postmod <- c(post.mu[match(max(post.mu[,2]), post.mu[,2]),1],
post.sigma2[match(max(post.sigma2[,2]), post.sigma2[,2]),1])
plot(linhc, param=par.sim, true=postmod)
## artificial example to show how to use the logit transformations
##
myp <- data.frame(par1=runif(1000,-1,1),par2=rnorm(1000),par3=runif(1000,0,2))
mys <- myp+rnorm(1000,sd=.1)
myt <- c(0,0,1.5)
lin2 <- abc(target=myt, param=myp, sumstat=mys, tol=.1, method =
"loclinear", transf=c("logit","none","logit"),logit.bounds = rbind(c(-1,
1), c(NA, NA), c(0, 2)))
summary(lin2)