R: stage2

stage2 {abctools}

R Documentation

stage2

Description

Summary statistics selection for ABC inference using estimated posterior error.

Usage

stage2(obs, param, sumstats, obspar = NULL, init.best, dsets = 100, 
sumsubs = 1:ncol(sumstats), limit = length(sumsubs), do.only=NULL, 
do.err = FALSE, final.dens = FALSE,  ...)

Arguments

`obs`	observed summary statistics.
`param`	matrix of simulated model parameter values.
`sumstats`	matrix of simulated summary statistics.
`obspar`	optional observed parameters (for use to assess simulation performance).
`init.best`	an initial estimate of the best summary statistics subset. Can be either an index into the summaries combination table (see `combmat`) or a vector of indices into `1:nstats`. See details.
`dsets`	the number of simulated datasets to treat as observed when estimating the posterior error. See details.
`sumsubs`	an optional index into the summary statistics to limit summary selection to a specific subset of summaries.
`limit`	an optional integer giving the maximum size of the summary statistics subsets to consider.
`do.only`	an optional index into the summary statistics combination table. Can be used to limit the error calculations to certain summary statistics subsets only.
`do.err`	a boolean value indicating whether the simulation error should be returned. Note: if `do.err=TRUE`, `obspar` must be supplied.
`final.dens`	a boolean value indicating whether the posterior sample should be returned.
`...`	any other optional arguments to the ABC inference procedure (e.g. arguments to the `abc` function).
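
To make the roles of `sumsubs` and `limit` concrete, the following base R sketch enumerates the candidate subsets that such a restriction would leave. The helper `candidate_subsets` is hypothetical and not part of abctools; the ordering it produces is an assumption and need not match the package's combination table (see `combmat`).

```r
# Hypothetical helper (not part of abctools): list every non-empty subset of
# the permitted summaries, up to a maximum subset size.
candidate_subsets <- function(sumsubs, limit = length(sumsubs)) {
  sizes <- seq_len(min(limit, length(sumsubs)))
  subsets <- lapply(sizes, function(k) {
    # Choose positions within sumsubs, then map them back to the actual
    # summary indices. Working on positions avoids combn() treating a
    # single number n as the sequence 1:n.
    combn(length(sumsubs), k,
          FUN = function(pos) sumsubs[pos], simplify = FALSE)
  })
  # Flatten the per-size lists into one list of index vectors.
  unlist(subsets, recursive = FALSE)
}

# Restrict selection to summaries 1, 3 and 4, with at most two per subset.
subsets <- candidate_subsets(sumsubs = c(1, 3, 4), limit = 2)
str(subsets)
```

Under these assumptions, a value passed as `do.only` would pick out some of the entries of such a list rather than all of them.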

Details

The function uses the `init.best` set of summaries to determine the `dsets` simulated datasets which are closest (in Euclidean norm) to the observed dataset. Since the model parameters generating the summary statistics are known for these simulated datasets, for each candidate subset of summary statistics, we can compute the error under ABC inference for each of these datasets. The best subset of summary statistics is that which minimizes this (average) error over all `dsets` datasets.
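
The procedure described above can be sketched in base R as follows. This is an illustration only and not the package's implementation: the toy data, the plain rejection step with a fixed acceptance proportion, and the mean squared error measure in the hypothetical helper `abc_error` are all assumptions. `stage2` itself delegates inference to the ABC procedure configured through `...`.

```r
set.seed(1)

# Toy data (assumed for illustration): one parameter and three summaries,
# of which only the first carries information about the parameter.
nsim     <- 2000
param    <- runif(nsim, 0, 10)
sumstats <- cbind(param + rnorm(nsim, sd = 0.5),  # informative
                  rnorm(nsim),                    # pure noise
                  rnorm(nsim))                    # pure noise
obs      <- c(5, 0, 0)

# Step 1: use the init.best summaries to find the dsets simulated datasets
# closest (in Euclidean norm) to the observed dataset.
init.best <- c(1, 2)
dsets     <- 20
diffs     <- sweep(sumstats[, init.best, drop = FALSE], 2, obs[init.best])
closest   <- order(sqrt(rowSums(diffs^2)))[seq_len(dsets)]

# Hypothetical error measure: treat simulation j as observed, accept the
# nearest 1% of the remaining simulations under the given subset, and
# return the mean squared distance of the accepted parameters from the
# known true parameter.
abc_error <- function(j, subset, tol = 0.01) {
  others   <- setdiff(seq_len(nsim), j)
  s        <- sumstats[others, subset, drop = FALSE]
  d        <- sqrt(rowSums(sweep(s, 2, sumstats[j, subset])^2))
  accepted <- others[order(d)[seq_len(ceiling(tol * length(others)))]]
  mean((param[accepted] - param[j])^2)
}

# Step 2: average the error over the dsets datasets for each candidate
# subset, then keep the subset with the smallest average error.
candidates <- list(1, 2, 3, c(1, 2), c(1, 3), c(2, 3), c(1, 2, 3))
avg_err <- sapply(candidates, function(sub) {
  mean(sapply(closest, abc_error, subset = sub))
})
best <- candidates[[which.min(avg_err)]]
```

In this toy setup, subsets that omit the informative first summary give a much larger average error, so the selected `best` includes it. The nested loop over candidate subsets and pseudo-observed datasets also shows why the cost grows quickly with `dsets` and the number of subsets.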

Value

A list with the following components:

`best`	the best subset of statistics.
`closest`	the indices of the `dsets` simulated datasets closest to the observed dataset as measured by the `init.best` subset of summaries.
`err`	simulation error (if `obspar` is supplied and `do.err=TRUE`).
`order`	the subsets considered during the algorithm (same as the input `do.only`).
`post.sample`	an array of dimension `nacc x npar x ndatasets` giving the posterior sample for each observed dataset. Not returned if `final.dens=FALSE`.
`sumsubs`	an index into the subsets considered during the algorithm.

Warning

This function is computationally intensive due to its cyclic ABC inference procedure.

Author(s)

Matt Nunes

References

Blum, M. G. B., Nunes, M. A., Prangle, D. and Sisson, S. A. (2013) A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28, Issue 2, 189–208.

Nunes, M. A. and Balding, D. J. (2010) On Optimal Selection of Summary Statistics for Approximate Bayesian Computation. Stat. Appl. Gen. Mol. Biol. 9, Iss. 1, Art. 34.

Nunes, M. A. and Prangle, D. (2016) abctools: an R package for tuning approximate Bayesian computation analyses. The R Journal 7, Issue 2, 189–205.

Examples


# load example data:

data(coal)
data(coalobs)

param<-coal[,2]
simstats<-coal[,5:8]

# use matrix below just in case to preserve dimensions.

obsstats<-matrix(coalobs[1,5:8],nrow=1)
obsparam<-matrix(coalobs[1,1])

## Not run: 
tmp<-stage2(obsstats, param, simstats, init.best=c(1,3), dsets = 10) 
tmp$best

## End(Not run)

[Package abctools version 1.1.7 Index]