ABC {poolABC}R Documentation

Parameter estimation with Approximate Bayesian Computation with several targets

Description

Perform multivariate parameter estimation based on summary statistics using an Approximate Bayesian Computation (ABC) algorithm. This function always uses a rejection sampling algorithm while a local linear regression algorithm might or might not be used.

Usage

ABC(
  nPops,
  ntrials,
  freqs,
  positions,
  range,
  rMajor,
  rMinor,
  coverage,
  window,
  nLoci,
  limits,
  params,
  sumstats,
  tol,
  method,
  parallel = FALSE,
  ncores = NA
)

Arguments

nPops

is an integer indicating how many different populations are present in the dataset you are analysing.

ntrials

indicates how many different trials should be performed. Each trial corresponds to a different target for the parameter estimation.

freqs

is a list containing the allelic frequencies. Each entry of that list should represent a different contig and be a matrix where each row corresponds to a different site and each column to a different population.

positions

is a list containing the position of the SNPs. Each entry should represent a different contig and be a vector containing the position of each SNP present in the contig.

range

is a list containing the range of the contig. Each entry should represent a different contig and be a vector with two entries: the first detailing the minimum position of the contig and the second the maximum position of the contig.

rMajor

a list containing the number of major allele reads. Each entry should represent a different contig. For each contig (matrix), each row should be a different site and each column a different population.

rMinor

a list containing the number of minor allele reads. Each entry should represent a different contig. For each contig (matrix), each row should be a different site and each column a different population.

coverage

is a list containing the depth of coverage. Each entry should represent a different contig and be a matrix with the sites as rows and the different populations as columns.

window

is a non-negative integer indicating the size, in base pairs, of the block of the contig to keep.

nLoci

is a non-negative integer indicating how many different contigs should be kept in the output. If each randomly selected window is a different loci, then how many different window should be selected?

limits

is a matrix with two columns and as many rows as there are parameters. Each row should contain the minimum value of the prior for a given parameter in the first column and the maximum value in the second column.

params

is a vector or matrix of simulated parameter values i.e. numbers from the simulations. Each row or vector entry should be a different simulation and each column of a matrix should be a different parameter. This is the dependent variable for the regression, if a regression step is performed.

sumstats

is a vector or matrix of simulated summary statistics. Each row or vector entry should be a different simulation and each column of a matrix should be a different statistic. These act as the independent variables if a regression step is performed.

tol

is the tolerance rate, indicating the required proportion of points accepted nearest the target values.

method

either "rejection" or "regression" indicating whether a regression step should be performed during ABC parameter estimation.

parallel

logical, indicating whether this function should be run using parallel execution. The default setting is FALSE, meaning that this function will utilize a single core.

ncores

a non-negative integer that is required when parallel is TRUE. It specifies the number of cores to use for parallel execution.

Details

To use this function, the usual steps of ABC parameter estimation have to be performed. Briefly, data should have been simulated based on random draws from the prior distributions of the parameters of interest and a set of summary statistics should have been calculated from that data. This function requires as input the observed data and computes the same set of summary statistics from that observed data. Multiple sets of observed summary statistics are computed from ntrials sets of nLoci blocks of size window. Parameter estimation is performed for each one of those sets of observed summary statistics i.e. each set corresponds to a different target.

After computing this set of observed summary statistics, a simple rejection is performed by calling the rejABC() function. In this step, parameter values are accepted if the Euclidean distance between the set of summary statistics computed from the simulated data and the set of summary statistics computed from the observed data is sufficiently small. The percentage of accepted simulations is determined by tol.

When method is "regression", a local linear regression method is used to correct for the imperfect match between the summary statistics computed from the simulated data and the summary statistics computed from the observed data. The output of the rejABC() function is used as the input of the regABC() function to apply this correction. The parameter values accepted in the rejection step are weighted by a smooth function (kernel) of the distance between the simulated and observed summary statistics and corrected according to a linear transformation.

Value

a list with seven different entries.

target

observed summary statistics.

ss

set of accepted summary statistics from the simulations.

unadjusted

parameter estimates obtained with the rejection sampling.

adjusted

regression adjusted parameter values.

predmean

estimates of the posterior mean for each parameter.

weights

regression weights.

position

position of each SNP used for calculating the observed summary statistics.

See Also

For more details see the poolABC vignette: vignette("poolABC", package = "poolABC")

Examples

# Note that this example is limited to a few of the options available
# you should check the poolABC vignette for more details

# this creates a variable with the path for the toy example data
mypath <- system.file('extdata', package = 'poolABC')

# import data for two populations from all files
mydata <- importContigs(path = mypath, pops = c(8, 10))

# to perform parameter inference for two populations using the rejection method
# and with a tolerance of 0.01
myabc <- ABC(nPops = 2, ntrials = 10, freqs = mydata$freqs, positions = mydata$positions,
range = mydata$range, rMajor = mydata$rMajor, rMinor = mydata$rMinor, coverage = mydata$coverage,
window = 1000, nLoci = 4, limits, params, sumstats, tol = 0.01, method = "rejection")

# the previous will perform parameter inference for 10 different targets (ntrials = 100)
# each of those trials will be comprised of 4 loci, each with 1000 base pairs

# to perform parameter inference for two populations using the regression method
# and with a tolerance of 0.01
myabc <- ABC(nPops = 2, ntrials = 10, freqs = mydata$freqs, positions = mydata$positions,
range = mydata$range, rMajor = mydata$rMajor, rMinor = mydata$rMinor, coverage = mydata$coverage,
window = 1000, nLoci = 4, limits, params, sumstats, tol = 0.01, method = "regression")


[Package poolABC version 1.0.0 Index]