ABC {poolABC} | R Documentation |
Parameter estimation with Approximate Bayesian Computation with several targets
Description
Performs multivariate parameter estimation based on summary statistics using an Approximate Bayesian Computation (ABC) algorithm. A rejection sampling step is always performed; a local linear regression adjustment is additionally applied when method is "regression".
Usage
ABC(
nPops,
ntrials,
freqs,
positions,
range,
rMajor,
rMinor,
coverage,
window,
nLoci,
limits,
params,
sumstats,
tol,
method,
parallel = FALSE,
ncores = NA
)
Arguments
nPops |
is an integer indicating how many different populations are present in the dataset you are analysing. |
ntrials |
indicates how many different trials should be performed. Each trial corresponds to a different target for the parameter estimation. |
freqs |
is a list containing the allelic frequencies. Each entry of that list should represent a different contig and be a matrix where each row corresponds to a different site and each column to a different population. |
positions |
is a list containing the position of the SNPs. Each entry should represent a different contig and be a vector containing the position of each SNP present in the contig. |
range |
is a list containing the range of the contig. Each entry should represent a different contig and be a vector with two entries: the first detailing the minimum position of the contig and the second the maximum position of the contig. |
rMajor |
a list containing the number of major allele reads. Each entry should represent a different contig. For each contig (matrix), each row should be a different site and each column a different population. |
rMinor |
a list containing the number of minor allele reads. Each entry should represent a different contig. For each contig (matrix), each row should be a different site and each column a different population. |
coverage |
is a list containing the depth of coverage. Each entry should represent a different contig and be a matrix with the sites as rows and the different populations as columns. |
window |
is a non-negative integer indicating the size, in base pairs, of the block of the contig to keep. |
nLoci |
is a non-negative integer indicating how many different contigs should be kept in the output. If each randomly selected |
limits |
is a matrix with two columns and as many rows as there are parameters. Each row should contain the minimum value of the prior for a given parameter in the first column and the maximum value in the second column. |
params |
is a vector or matrix of simulated parameter values i.e. numbers from the simulations. Each row or vector entry should be a different simulation and each column of a matrix should be a different parameter. This is the dependent variable for the regression, if a regression step is performed. |
sumstats |
is a vector or matrix of simulated summary statistics. Each row or vector entry should be a different simulation and each column of a matrix should be a different statistic. These act as the independent variables if a regression step is performed. |
tol |
is the tolerance rate, indicating the required proportion of points accepted nearest the target values. |
method |
either "rejection" or "regression" indicating whether a regression step should be performed during ABC parameter estimation. |
parallel |
logical, indicating whether this function should be run using parallel execution. The default setting is FALSE, meaning that this function will utilize a single core. |
ncores |
a non-negative integer that is required when |
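The input shapes described in the arguments above can be illustrated with a small hand-made dataset. This is only a sketch: every value is invented, the parameter names (N1, N2) and statistic names (S1 to S3) are hypothetical, computing the frequencies as minor reads divided by coverage is an assumption made for the toy data, and the row and column names are there for readability only.

```r
# Toy inputs with the shapes described in the Arguments section.
# All numbers are made up for illustration only.
set.seed(1)
nPops <- 2
nSites <- c(5, 3)  # number of SNPs in each of two contigs

# one matrix per contig: rows are sites, columns are populations
rMinor <- lapply(nSites, function(n) matrix(rpois(n * nPops, 5), nrow = n))
rMajor <- lapply(nSites, function(n) matrix(40 + rpois(n * nPops, 5), nrow = n))
coverage <- Map(`+`, rMajor, rMinor)
# assumption: frequency of the minor allele at each site and population
freqs <- Map(`/`, rMinor, coverage)

# one vector of SNP positions and one c(min, max) pair per contig
positions <- lapply(nSites, function(n) sort(sample.int(2000, n)))
ranges <- lapply(nSites, function(n) c(1, 2000))

# prior limits: one row per parameter, minimum then maximum
limits <- rbind(N1 = c(100, 10000), N2 = c(100, 10000))

# simulations: one row per simulation, one column per parameter or statistic
nSims <- 1000
params <- cbind(N1 = runif(nSims, limits["N1", 1], limits["N1", 2]),
                N2 = runif(nSims, limits["N2", 1], limits["N2", 2]))
sumstats <- matrix(rnorm(nSims * 3), nrow = nSims,
                   dimnames = list(NULL, c("S1", "S2", "S3")))
```

The key constraints are that every per-contig list has the same length, that limits has one row per column of params, and that params and sumstats have one row per simulation.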
Details
To use this function, the usual steps of ABC parameter estimation have to be
performed. Briefly, data should have been simulated based on random draws
from the prior distributions of the parameters of interest and a set of
summary statistics should have been calculated from that data. This function
requires as input the observed data and computes the same set of summary
statistics from that observed data. Multiple sets of observed summary
statistics are computed from ntrials sets of nLoci blocks of size window.
Parameter estimation is performed for each one of those sets of
observed summary statistics, i.e. each set corresponds to a different target.
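The relationship between ntrials, nLoci and window can be pictured with the following sketch. It is not the package's internal code: the helper pickBlocks() is hypothetical, and it assumes that contigs are drawn at random without replacement, that each block starts at a uniformly drawn position, and that every contig is at least window base pairs long.

```r
# Hypothetical helper: draw nLoci blocks of `window` base pairs for one trial.
# positions and ranges are lists with one entry per contig.
pickBlocks <- function(positions, ranges, nLoci, window) {
  contigs <- sample.int(length(positions), size = nLoci)
  lapply(contigs, function(i) {
    first <- ranges[[i]][1]
    # latest start that still fits a full window inside the contig
    lastStart <- ranges[[i]][2] - window + 1
    start <- first + sample.int(lastStart - first + 1, size = 1) - 1
    end <- start + window - 1
    # keep only the SNPs that fall inside the block
    keep <- positions[[i]] >= start & positions[[i]] <= end
    list(contig = i, start = start, end = end, snps = positions[[i]][keep])
  })
}

set.seed(42)
positions <- list(c(10, 250, 900, 1500), c(5, 700, 1200), c(300, 400, 1900))
ranges <- list(c(1, 2000), c(1, 1500), c(1, 2500))

# one set of blocks per trial, i.e. one target per trial
trials <- replicate(3, pickBlocks(positions, ranges, nLoci = 2, window = 1000),
                    simplify = FALSE)
length(trials)  # number of targets
```

Each element of trials would then yield its own vector of observed summary statistics, so three trials give three targets for the estimation.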
After computing this set of observed summary statistics, a simple rejection
is performed by calling the rejABC() function. In this step, parameter
values are accepted if the Euclidean distance between the set of summary
statistics computed from the simulated data and the set of summary statistics
computed from the observed data is sufficiently small. The proportion of
accepted simulations is determined by tol.
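A minimal sketch of the rejection step, written in base R and independent of poolABC. The function rejectionSketch() is hypothetical and is not the implementation of rejABC(); it uses the plain Euclidean distance on unscaled statistics, and the toy model generating the simulations is invented for illustration.

```r
# Hypothetical rejection step: keep the proportion `tol` of simulations
# whose summary statistics are closest to the target.
rejectionSketch <- function(target, params, sumstats, tol) {
  # Euclidean distance between each simulation and the target
  dst <- sqrt(rowSums(sweep(sumstats, 2, target)^2))
  nAccept <- ceiling(tol * nrow(sumstats))
  accepted <- order(dst)[seq_len(nAccept)]
  list(unadjusted = params[accepted, , drop = FALSE],
       ss = sumstats[accepted, , drop = FALSE],
       dst = dst[accepted])
}

set.seed(1)
nSims <- 5000
# draw the parameter from a uniform prior
params <- cbind(theta = runif(nSims, 0, 10))
# toy model: two noisy statistics that depend on theta
sumstats <- cbind(s1 = params[, "theta"] + rnorm(nSims),
                  s2 = 2 * params[, "theta"] + rnorm(nSims))
target <- c(s1 = 4, s2 = 8)

rej <- rejectionSketch(target, params, sumstats, tol = 0.01)
nrow(rej$unadjusted)  # number of accepted simulations
mean(rej$unadjusted)  # rough point estimate of theta
```

Because the distance is unscaled here, a statistic measured on a much larger scale than the others would dominate the acceptance decision; standardising the statistics first is a common remedy.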
When method is "regression", a local linear regression method is used to
correct for the imperfect match between the summary statistics computed from
the simulated data and the summary statistics computed from the observed
data. The output of the rejABC() function is used as the input of the
regABC() function to apply this correction. The parameter values accepted
in the rejection step are weighted by a smooth function (kernel) of the
distance between the simulated and observed summary statistics and corrected
according to a linear transformation.
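The regression adjustment can be sketched as follows for a single parameter. This is an illustration in base R, not the code of regABC(): the function name regressionSketch() is hypothetical, the toy simulations are invented, and the Epanechnikov kernel is an assumption, since the text above only mentions a smooth kernel.

```r
# Hypothetical rejection + local linear regression for one parameter.
# params is a numeric vector, sumstats a matrix with one row per simulation.
regressionSketch <- function(target, params, sumstats, tol) {
  dst <- sqrt(rowSums(sweep(sumstats, 2, target)^2))
  nAccept <- ceiling(tol * nrow(sumstats))
  accepted <- order(dst)[seq_len(nAccept)]
  # weights proportional to an Epanechnikov kernel (assumed choice):
  # they fall to zero at the largest accepted distance
  delta <- max(dst[accepted])
  w <- 1 - (dst[accepted] / delta)^2
  # centre the accepted statistics on the target, so that the intercept is
  # the fitted parameter value at the observed statistics
  centred <- sweep(sumstats[accepted, , drop = FALSE], 2, target)
  theta <- params[accepted]
  fit <- lm.wfit(x = cbind(1, centred), y = theta, w = w)
  beta <- fit$coefficients
  # linear correction: remove the part of theta explained by the mismatch
  adjusted <- theta - drop(centred %*% beta[-1])
  list(unadjusted = theta, adjusted = adjusted,
       predmean = unname(beta[1]), weights = w)
}

set.seed(1)
nSims <- 5000
theta <- runif(nSims, 0, 10)
sumstats <- cbind(s1 = theta + rnorm(nSims), s2 = 2 * theta + rnorm(nSims))
target <- c(s1 = 4, s2 = 8)

reg <- regressionSketch(target, theta, sumstats, tol = 0.01)
reg$predmean            # regression estimate at the target
summary(reg$adjusted)   # adjusted parameter values
```

Centring the statistics on the target is a design choice that makes the intercept directly interpretable as the estimate at the observed values, while the slopes carry the correction applied to each accepted simulation.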
Value
a list with seven different entries.
target |
observed summary statistics. |
ss |
set of accepted summary statistics from the simulations. |
unadjusted |
parameter estimates obtained with the rejection sampling. |
adjusted |
regression adjusted parameter values. |
predmean |
estimates of the posterior mean for each parameter. |
weights |
regression weights. |
position |
position of each SNP used for calculating the observed summary statistics. |
See Also
For more details see the poolABC vignette:
vignette("poolABC", package = "poolABC")
Examples
# Note that this example is limited to a few of the options available
# you should check the poolABC vignette for more details
# this creates a variable with the path for the toy example data
mypath <- system.file('extdata', package = 'poolABC')
# import data for two populations from all files
mydata <- importContigs(path = mypath, pops = c(8, 10))
# to perform parameter inference for two populations using the rejection method
# and with a tolerance of 0.01
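# limits, params and sumstats must already exist in the session: the prior limits,
# the simulated parameter values and the simulated summary statistics
myabc <- ABC(nPops = 2, ntrials = 10, freqs = mydata$freqs, positions = mydata$positions,
             range = mydata$range, rMajor = mydata$rMajor, rMinor = mydata$rMinor,
             coverage = mydata$coverage, window = 1000, nLoci = 4, limits = limits,
             params = params, sumstats = sumstats, tol = 0.01, method = "rejection")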
# the previous will perform parameter inference for 10 different targets (ntrials = 10)
# each of those trials will be comprised of 4 loci, each with 1000 base pairs
# to perform parameter inference for two populations using the regression method
# and with a tolerance of 0.01
myabc <- ABC(nPops = 2, ntrials = 10, freqs = mydata$freqs, positions = mydata$positions,
             range = mydata$range, rMajor = mydata$rMajor, rMinor = mydata$rMinor,
             coverage = mydata$coverage, window = 1000, nLoci = 4, limits = limits,
             params = params, sumstats = sumstats, tol = 0.01, method = "regression")