R: Estimate Genotype Likelihoods in a RADdata object

AddGenotypeLikelihood {polyRAD}

R Documentation

Estimate Genotype Likelihoods in a RADdata object

Description

For each possible allele copy number across each possible ploidy in each taxon, AddGenotypeLikelihood estimates the probability of observing the distribution of read counts that are recorded for that taxon and locus. AddDepthSamplingPermutations is called by AddGenotypeLikelihood the first time it is run, so that part of the likelihood calcluation is stored in the RADdata object and doesn't need to be re-run on each iteration of the pipeline functions.

Usage

AddGenotypeLikelihood(object, ...)

## S3 method for class 'RADdata'
AddGenotypeLikelihood(object, overdispersion = 9, ...)

AddDepthSamplingPermutations(object, ...)

Arguments

`object`	A `"RADdata"` object.
`overdispersion`	An overdispersion parameter. Higher values will cause the expected read depth distribution to more resemble the binomial distribution. Lower values indicate more overdispersion, i.e. sample-to-sample variance in the probability of observing reads from a given allele.
`...`	Other arguments; none are currently used.

Details

If allele frequencies are not already recorded in object, they will be added using AddAlleleFreqHWE. Allele frequencies are then used for estimating the probability of sampling an allele from a genotype due to sample contamination. Given a known genotype with x copies of allele i, ploidy k, allele frequency p_i in the population used for making sequencing libraries, and contamination rate c, the probabiity of sampling a read r_i from that locus corresponding to that allele is

P(r_i | x) = \frac{x}{k} * (1 - c) + p_i * c

To estimate the genotype likelihood, where nr_i is the number of reads corresponding to allele i for a given taxon and locus and nr_j is the number of reads corresponding to all other alleles for that taxon and locus:

P(nr_i, nr_j | x) = {{nr_i + nr_j}\choose{nr_i}} * \frac{B[P(r_i | x) * d + nr_i, [1 - P(r_i | x)] * d + nr_j]]}{B[P(r_i | x) * d, [1 - P(r_i | x)] * d]}

where

{{nr_i + nr_j}\choose{nr_i}} = \frac{(nr_i + nr_j)!}{nr_i! * nr_j!}

B is the beta function, and d is the overdispersion parameter set by overdispersion. {{nr_i + nr_j}\choose{nr_i}} is calculated by AddDepthSamplingPermutations.

Value

A "RADdata" object identical to that passed to the function, but with genotype likelihoods stored in object$genotypeLikelihood. This item is a two dimensional list, with one row for each ploidy listed in object$possiblePloidies, ignoring differences between autopolyploids and allopolyploids, and one column for each ploidy listed in object$taxaPloidy. Each item in the list is a three-dimensional array with number of allele copies in the first dimension, taxa in the second dimension, and alleles in the third dimension.

Author(s)

Lindsay V. Clark

Examples

# load example dataset and add allele frequency
data(exampleRAD)
exampleRAD <- AddAlleleFreqHWE(exampleRAD)

# estimate genotype likelihoods
exampleRAD <- AddGenotypeLikelihood(exampleRAD)

# inspect the results
# the first ten individuals and first two alleles, assuming diploidy
exampleRAD$alleleDepth[1:10,1:2]
exampleRAD$genotypeLikelihood[[1]][,1:10,1:2]
# assuming tetraploidy
exampleRAD$genotypeLikelihood[[2]][,1:10,1:2]

[Package polyRAD version 2.0.0 Index]