ExpectedHindHe {polyRAD} | R Documentation |
Simulate Data to Get Expected Distribution of Hind/He
Description
These functions were created to help users determine an appropriate cutoff for
filtering loci based on H_{ind}/H_E
after running
HindHe
and InbreedingFromHindHe
.
ExpectedHindHe
takes allele frequencies, sample size, and read depths from
a RADdata
object, simulates genotypes and allelic read depths from
these assuming Mendelian inheritance, and then estimates
H_{ind}/H_E
for each simulated locus.
ExpectedHindHeMapping
performs similar simulation and estimation, but
in mapping populations based on parental genotypes and expected distribution
of progeny genotypes.
SimGenotypes
, SimGenotypesMapping
, and
SimAlleleDepth
are internal functions used by ExpectedHindHe
and ExpectedHindHeMapping
but are provided at the user level since they may be more broadly useful.
Usage
ExpectedHindHe(object, ploidy = object$possiblePloidies[[1]], inbreeding = 0,
overdispersion = 20, contamRate = 0, errorRate = 0.001,
reps = ceiling(5000/nLoci(object)),
quiet = FALSE, plot = TRUE)
ExpectedHindHeMapping(object, ploidy = object$possiblePloidies[[1]],
n.gen.backcrossing = 0, n.gen.selfing = 0,
overdispersion = 20, contamRate = 0, errorRate = 0.001,
freqAllowedDeviation = 0.05,
minLikelihoodRatio = 10, reps = ceiling(5000/nLoci(object)),
quiet = FALSE, plot = TRUE)
SimGenotypes(alleleFreq, alleles2loc, nsam, inbreeding, ploidy)
SimGenotypesMapping(donorGen, recurGen, alleles2loc, nsam,
ploidy.don, ploidy.rec,
n.gen.backcrossing, n.gen.selfing)
SimAlleleDepth(locDepth, genotypes, alleles2loc, overdispersion = 20,
contamRate = 0, errorRate = 0.001)
Arguments
object |
A |
ploidy |
A single integer indicating the ploidy to use for genotype simulation.
For |
inbreeding |
A number ranging from 0 to 1 indicating the amount of inbreeding ( |
overdispersion |
Overdispersion parameter as described in |
contamRate |
Sample cross-contamination rate to simulate. Although 0 is the default, 0.001 is also reasonable. |
errorRate |
Sequencing error rate to simulate. For Illumina reads, 0.001 is a reasonable value. An error is assumed to have an equal chance of converting an allele to any other allele at the locus, although this is somewhat of an oversimplification. |
reps |
The number of times to simulate the data and estimate |
quiet |
Boolean indicating whether to suppress messages and results printed to console. |
plot |
Boolean indicating whether to plot a histogram of |
n.gen.backcrossing |
An integer indicating the number of generations of backcrossing. |
n.gen.selfing |
An integer indicating the number of generations of self-fertilization. |
freqAllowedDeviation |
The amount by which allele frequencies are allowed to deviate from expected
allele frequencies. See |
minLikelihoodRatio |
Minimum likelihood ratio for determining the most likely parental genotypes.
See |
alleleFreq |
A vector of allele frequencies, as can be found in the |
alleles2loc |
An integer vector assigning alleles to loci, as can be found in the
|
nsam |
An integer indicating the number of samples (number of taxa) to simulate. |
donorGen |
A vector indicating genotypes of the donor parent (which can be either parent if backcrossing was not performed), with one value for each allele in the dataset, and numbers indicating the copy number of each allele. |
recurGen |
A vector indicating genotypes of the recurrent parent, as with |
ploidy.don |
A single integer indicating the ploidy of the donor parent. |
ploidy.rec |
A single integer indicating the ploidy of the recurrent parent. |
locDepth |
An integer matrix indicating read depth at each taxon and locus. Formatted as
the |
genotypes |
A numeric matrix, formatted as the output of |
Details
To prevent highly inflated values in the output, ExpectedHindHe
filters
loci with minor allele frequencies below five times the sequencing error rate.
Value
ExpectedHindHe
and ExpectedHindHeMapping
invisibly return a
matrix, with loci in rows and reps in
columns, containing H_{ind}/H_E
from the simulated loci.
SimGenotypes
and SimGenotypesMapping
return a numeric matrix of
allele copy number, with samples
in rows and alleles in columns, similar to that produced by
GetProbableGenotypes
.
SimAlleleDepth
returns an integer matrix of allelic read depth, with
samples in rows and alleles in columns, similar to the $alleleDepth
slot of a RADdata
object.
Author(s)
Lindsay V. Clark
References
Clark, L. V., Mays, W., Lipka, A. E. and Sacks, E. J. (2022) A population-level statistic for assessing Mendelian behavior of genotyping-by-sequencing data from highly duplicated genomes. BMC Bioinformatics 23, 101, doi:10.1186/s12859-022-04635-9.
Examples
# Load dataset for the example
data(exampleRAD)
exampleRAD <- AddAlleleFreqHWE(exampleRAD)
# Simulate genotypes
simgeno <- SimGenotypes(exampleRAD$alleleFreq, exampleRAD$alleles2loc, 10, 0.2, 2)
# Simulate reads
simreads <- SimAlleleDepth(exampleRAD$locDepth[1:10,], simgeno, exampleRAD$alleles2loc)
# Get expected Hind/He distribution if all loci in exampleRAD were well-behaved
ExpectedHindHe(exampleRAD, reps = 10)
# Mapping population example
data(exampleRAD_mapping)
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")
exampleRAD_mapping <- AddAlleleFreqMapping(exampleRAD_mapping,
expectedFreqs = c(0.25, 0.75),
allowedDeviation = 0.08)
exampleRAD_mapping <- AddGenotypeLikelihood(exampleRAD_mapping)
exampleRAD_mapping <- EstimateParentalGenotypes(exampleRAD_mapping,
n.gen.backcrossing = 1)
simgenomap <- SimGenotypesMapping(exampleRAD_mapping$likelyGeno_donor[1,],
exampleRAD_mapping$likelyGeno_recurrent[1,],
exampleRAD_mapping$alleles2loc,
nsam = 10, ploidy.don = 2, ploidy.rec = 2,
n.gen.backcrossing = 1,
n.gen.selfing = 0)
ExpectedHindHeMapping(exampleRAD_mapping, n.gen.backcrossing = 1, reps = 10)