forceLocus {poolABC} | R Documentation |
Force the simulations to contain the required number of loci
Description
This function attempts to force the required number of loci after the filtering steps are performed.
Usage
forceLocus(
model,
parameters,
nSites,
nLoci,
nDip,
mutrate,
mean,
variance,
minimum,
maximum,
size,
min.minor
)
Arguments
model |
a character, either 2pops", "Single" or "Parallel" indicating which model should be simulated. |
parameters |
a vector of parameters used to create the command line for
the scrm package. Each entry of the vector is a different parameter. Note
that each vector entry should be named with the name of the corresponding
parameter. The output of the |
nSites |
is an integer that specifies how many base pairs should scrm simulate, i.e. how many sites per locus to simulate. |
nLoci |
an integer that represents how many independent loci should be simulated. |
nDip |
an integer representing the total number of diploid individuals to simulate. Note that scrm actually simulates haplotypes, so the number of simulated haplotypes is double of this. Also note that this is the total number of diploid individuals and this function will distribute the individuals equally by the simulated populations. |
mutrate |
an integer representing the mutation rate assumed for the simulations. |
mean |
an integer or a vector defining the mean value of the negative binomial distribution from which different number of reads are drawn. It represents the mean coverage across all sites. If a vector is supplied, the function assumes that each entry of the vector is the mean for a different population. |
variance |
an integer or a vector defining the variance of the negative binomial distribution from which different number of reads are drawn. It represents the variance of the total coverage across all sites. If a vector is supplied, the function assumes that each entry of the vector is the variance for a different population. |
minimum |
an integer representing the minimum coverage allowed. Sites where any population has a depth of coverage below this threshold are removed from the data. |
maximum |
an integer representing the maximum coverage allowed. Sites where any population has a depth of coverage above this threshold are removed from the data. |
size |
a list with one entry per population. Each entry should be a vector containing the size (in number of diploid individuals) of each pool. Thus, if a population was sequenced using a single pool, the vector should contain only one entry. If a population was sequenced using two pools, each with 10 individuals, this vector should contain two entries and both will be 10. |
min.minor |
is an integer representing the minimum allowed number of minor-allele reads. Sites that, across all populations, have less minor-allele reads than this threshold will be removed from the data. |
Details
This is done by simulating extra loci for each of the different types of simulations performed. The possible types of simulations include loci without barriers against migration between divergent ecotypes, loci without migration from the C towards the W ecotype, loci without migration from the W towards the C ecotypes and loci where no migration occurs between divergent ecotypes. Using this function, more loci than required are simulated for each of those types of simulations.
Then, a coverage-based filter is applied to the data, followed by a filter based on a required number of minor-allele reads per site. Those filters remove some loci from the data. The extra simulated loci should allow us to keep the required number of loci per type of simulation even after filtering.
Value
a list with two names entries
pool |
a list with three different entries: major, minor and total.
This list is obtained by running the |
nPoly |
a numeric value indicating the mean number of polymorphic sites across all simulated locus. |
Examples
# create a vector with parameter values for a two populations model
params <- createParams(Nref = c(25000, 25000), ratio = c(0.1, 3), pool = c(5, 250),
seq = c(0.0001, 0.001), split = c(0, 3), CW = c(1e-13, 1e-3), WC = c(1e-13, 1e-3),
bT = c(0, 0.2), model = "2pops")
# simulate exactly 10 loci - using an isolation with migration model with two populations
forceLocus(model = "2pops", parameters = params, nSites = 1000, nLoci = 10, nDip = 100,
mutrate = 2e-8, mean = c(100, 100), variance = c(250, 250), minimum = 10, maximum = 200,
size = list(50, 50), min.minor = 0)