R: Genotype Probability

pgen {poppr}

R Documentation

Genotype Probability

Description

Calculate the probability of genotypes based on the product of allele frequencies over all loci.

Usage

pgen(gid, pop = NULL, by_pop = TRUE, log = TRUE, freq = NULL, ...)

Arguments

`gid`	a genind or genclone object.
`pop`	either a formula to set the population factor from the `strata` slot or a vector specifying the population factor for each sample. Defaults to `NULL`.
`by_pop`	When this is `TRUE` (default), the calculation will be done by population.
`log`	a `logical` if `log =TRUE` (default), the values returned will be log(Pgen). If `log = FALSE`, the values returned will be Pgen.
`freq`	a vector or matrix of allele frequencies. This defaults to `NULL`, indicating that the frequencies will be determined via round-robin approach in `rraf`. If this matrix or vector is not provided, zero-value allele frequencies will automatically be corrected. For details, please see the documentation on correcting rare alleles.
`...`	options from correcting rare alleles. The default is to correct allele frequencies to 1/n

Details

Pgen is the probability of a given genotype occuring in a population assuming HWE. Thus, the value for diploids is

P_{gen} = \left(\prod_{i=1}^m p_i\right)2^h

where p_i are the allele frequencies and h is the count of the number of heterozygous sites in the sample (Arnaud-Haond et al. 2007; Parks and Werth, 1993). The allele frequencies, by default, are calculated using a round-robin approach where allele frequencies at a particular locus are calculated on the clone-censored genotypes without that locus.

To avoid issues with numerical precision of small numbers, this function calculates pgen per locus by adding up log-transformed values of allele frequencies. These can easily be transformed to return the true value (see examples).

Value

A vector containing Pgen values per locus for each genotype in the object.

Note

For haploids, Pgen at a particular locus is the allele frequency. This function cannot handle polyploids. Additionally, when the argument pop is not NULL, by_pop is automatically TRUE.

Author(s)

Zhian N. Kamvar, Jonah Brooks, Stacy A. Krueger-Hadfield, Erik Sotka

References

Arnaud-Haond, S., Duarte, C. M., Alberto, F., & Serrão, E. A. 2007. Standardizing methods to address clonality in population studies. Molecular Ecology, 16(24), 5115-5139.

Parks, J. C., & Werth, C. R. 1993. A study of spatial features of clones in a population of bracken fern, Pteridium aquilinum (Dennstaedtiaceae). American Journal of Botany, 537-544.

Examples

data(Pram)
head(pgen(Pram, log = FALSE))

## Not run: 
# You can also supply the observed allele frequencies
pramfreq <- Pram %>% genind2genpop() %>% tab(freq = TRUE)
head(pgen(Pram, log = FALSE, freq = pramfreq))

# You can get the Pgen values over all loci by summing over the logged results:
pgen(Pram, log = TRUE) %>%  # calculate pgen matrix
  rowSums(na.rm = TRUE) %>% # take the sum of each row
  exp()                     # take the exponent of the results

# You can also take the product of the non-logged results:
apply(pgen(Pram, log = FALSE), 1, prod, na.rm = TRUE)

## Rare Allele Correction ---------------------------------------------------
##
# If you don't supply a table of frequencies, they are calculated with rraf 
# with correction = TRUE. This is normally benign when analyzing large 
# populations, but it can have a great effect on small populations. To help 
# control this, you can supply arguments described in 
# help("rare_allele_correction"). 


# Default is to correct by 1/n per population. Since the calculation is 
# performed on a smaller sample size due to round robin clone correction, it
# would be more appropriate to correct by 1/rrmlg at each locus. This is 
# acheived by setting d = "rrmlg". Since this is a diploid, we would want to
# account for the number of chromosomes, and so we set mul = 1/2
head(pgen(Pram, log = FALSE, d = "rrmlg", mul = 1/2)) # compare with the output above

# If you wanted to treat all alleles as equally rare, then you would set a
# specific value (let's say the rare alleles are 1/100):
head(pgen(Pram, log = FALSE, e = 1/100))

## End(Not run)

[Package poppr version 2.9.6 Index]