maeFreqs {poolHelper} | R Documentation |
Average absolute difference between allele frequencies computed from genotypes and from Pool-seq data
Description
Calculates the average absolute difference between the allele frequencies computed directly from genotypes and from pooled sequencing data.
Usage
maeFreqs(
nDip,
nloci,
pError,
sError,
mCov,
vCov,
min.minor,
minimum = NA,
maximum = NA,
theta = 10
)
Arguments
nDip |
is an integer or a vector representing the total number of
diploid individuals to simulate. Note that |
nloci |
is an integer that represents how many independent loci should be simulated. |
pError |
an integer or a vector representing the value of the error associated with DNA pooling. This value is related to the unequal contribution of both individuals and pools towards the total number of reads observed for a given population: the higher the value, the more unequal the individual and pool contributions. If it is a vector, then each vector entry will be simulated independently. |
sError |
a numeric value giving the error rate associated with the sequencing and mapping process. This error rate is assumed to be symmetric: error(reference -> alternative) = error(alternative -> reference). This number should be between 0 and 1. |
mCov |
an integer or a vector that defines the mean depth of coverage to simulate. Please note that this represents the mean coverage across all sites. If it is a vector, then each vector entry will be simulated independently. |
vCov |
an integer or a vector that defines the variance of the depth of
coverage across all sites. If the |
min.minor |
is an integer representing the minimum allowed number of minor-allele reads. Sites that, across all populations, have fewer minor-allele reads than this threshold will be removed from the data. |
minimum |
an optional integer representing the minimum coverage allowed. Sites where the population has a depth of coverage below this threshold are removed from the data. |
maximum |
an optional integer representing the maximum coverage allowed. Sites where the population has a depth of coverage above this threshold are removed from the data. |
theta |
a value for the mutation rate assuming theta = 4Nu, where u is the neutral mutation rate per locus. |
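The three filtering arguments described above (min.minor, minimum and maximum) can be illustrated with a small base R sketch. This is only an illustration under stated assumptions, not the package's internal code: the vectors coverage and minor_reads are hypothetical toy data for a single population, and the thresholds are arbitrary.

```r
# Hypothetical toy data: one population, five sites.
# 'coverage' is the total depth per site; 'minor_reads' is the minor-allele read count.
coverage    <- c(12, 85, 100, 240, 150)
minor_reads <- c(1, 0, 7, 30, 2)

# Illustrative thresholds, named after the arguments they mirror.
min.minor <- 1
minimum   <- 20
maximum   <- 200

# Keep sites with enough minor-allele reads ...
keep <- minor_reads >= min.minor
# ... and, when the optional thresholds are supplied, a depth within [minimum, maximum].
if (!is.na(minimum)) keep <- keep & coverage >= minimum
if (!is.na(maximum)) keep <- keep & coverage <= maximum

filtered <- data.frame(coverage = coverage[keep], minor_reads = minor_reads[keep])
```

In this toy example only the third and fifth sites pass all three filters. Leaving minimum or maximum as NA skips the corresponding depth check.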
Details
The average absolute difference is computed with the mae function, taking the frequencies computed directly from the genotypes as the actual input argument and the frequencies from pooled data as the predicted input argument.
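As a minimal sketch of the quantity being reported, the mean absolute error can be written in base R as below. The helper mae_sketch and the two frequency vectors are hypothetical and only stand in for the mae function and the simulated data; they are not taken from the package.

```r
# Minimal base R sketch of a mean absolute error; not the package's own code.
mae_sketch <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(abs(actual - predicted))
}

# Hypothetical allele frequencies at four sites.
geno_freqs <- c(0.10, 0.25, 0.50, 0.05)  # computed from genotypes ("actual")
pool_freqs <- c(0.12, 0.20, 0.55, 0.05)  # computed from Pool-seq reads ("predicted")

mae_value <- mae_sketch(actual = geno_freqs, predicted = pool_freqs)
```

Here the absolute differences are 0.02, 0.05, 0.05 and 0, so mae_value is 0.03 (up to floating-point rounding).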
Note that this function allows for different combinations of parameters, so their effect on the average absolute difference can be tested. For instance, the effect of different coverages can be checked by including more than one value in the mCov input argument. The function computes the average absolute difference for all combinations of the nDip, pError and mCov input arguments. It assumes that a single pool of size nDip was used to sequence the population.
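To get a feel for how many parameter sets such a run covers, the combinations of nDip, pError and mCov can be enumerated with expand.grid. This is an illustrative sketch only: the values are hypothetical, and how the package enumerates the combinations internally is an assumption not confirmed here.

```r
# Hypothetical input vectors, named after the arguments they illustrate.
nDip   <- c(50, 100)
pError <- c(100, 200)
mCov   <- c(100, 200)

# One row per combination of pool size, pool error and mean coverage.
combos <- expand.grid(nDip = nDip, pError = pError, mCov = mCov)
n_combos <- nrow(combos)
```

With two values for each of the three arguments, combos has eight rows, so the number of simulated settings grows multiplicatively with the length of each vector.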
Value
a data.frame with columns detailing the number of diploid individuals, the pool error, the number of pools, the number of individuals per pool, the mean coverage, the variance of the coverage and the average absolute difference between the frequencies computed from genotypes and from pooled data.
Examples
# a simple test with a simple combination of parameters
maeFreqs(nDip = 100, nloci = 10, pError = 100, sError = 0.01, mCov = 100, vCov = 200, min.minor = 1)
# effect of two different pool error values in conjunction with a fixed coverage and pool size
maeFreqs(nDip = 100, nloci = 10, pError = c(100, 200), sError = 0.01,
mCov = 100, vCov = 200, min.minor = 1)
# effect of two different pool error values in conjunction with a fixed pool size
# and two different coverages
maeFreqs(nDip = 100, nloci = 10, pError = c(100, 200), sError = 0.01,
mCov = c(100, 200), vCov = c(200, 500), min.minor = 1)