R: Population genetics from genomic data

popgen {snpReady}

R Documentation

Population genetics from genomic data

Description

Estimates population genetics parameters from genomic data. The same parameters can also be estimated within subpopulations.

Usage

popgen(M, subgroups, plot = FALSE)

Arguments

`M`	Object of class `matrix`. A (non-empty) matrix of molecular markers, coded as the count of reference alleles per locus (0, 1 or 2). Markers must be in columns and individuals in rows. Missing data should be coded as `NA`.
`subgroups`	A `vector` with subgroup or subpopulation labels, one per row of `M`.
`plot`	If `TRUE`, graphical output is produced. See Details.
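As an illustration of the input layout described for `M`, the following minimal sketch builds a small genotype matrix in base R. The object `M_toy`, its dimension names and its values are hypothetical and are not part of the package; the final check only mirrors the argument description and is not necessarily the validation that `popgen` performs.

```r
# Hypothetical toy data: 4 individuals (rows) x 3 markers (columns).
# Each entry counts copies of the reference allele (0, 1 or 2);
# NA marks a missing genotype call.
M_toy <- matrix(
  c(0, 1, 2,
    1, 1, NA,
    2, 0, 1,
    2, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3))
)

# Checks that mirror the argument description (an assumption about
# what a valid input looks like, not the package's own validation)
is_valid <- is.matrix(M_toy) &&
  length(M_toy) > 0 &&
  all(M_toy %in% c(0, 1, 2, NA))
```

A matrix that passes these checks has the shape and coding that the `M` argument expects.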

Details

The number of subgroups is defined by the user, and `subgroups` accepts any data type (character, integer, ...) to distinguish subpopulations. `M` and `subgroups` must follow the same order of genotypes: element i of `subgroups` labels row i of `M`.
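Because the labels are matched to genotypes by position, it can help to check the alignment before calling `popgen`. The sketch below uses base R only; `M_toy`, `subgroups_toy` and the other object names are hypothetical.

```r
# Hypothetical toy inputs: 6 genotypes, 2 markers, two subpopulations
M_toy <- matrix(c(0, 1, 2, 2, 1, 0,
                  2, 2, 1, 0, 0, 1),
                ncol = 2,
                dimnames = list(paste0("g", 1:6), c("snp1", "snp2")))
subgroups_toy <- c("A", "A", "A", "B", "B", "B")  # element i labels row i

# One label per genotype
same_length <- length(subgroups_toy) == nrow(M_toy)

# If the rows of M are reordered, the labels must be reordered
# with the same index so that they stay aligned
ord <- order(rownames(M_toy), decreasing = TRUE)
M_sorted <- M_toy[ord, , drop = FALSE]
subgroups_sorted <- subgroups_toy[ord]

# Number of genotypes per subpopulation
group_sizes <- table(subgroups_toy)
```

Reordering both objects with the same index vector keeps each label attached to its genotype.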

Value

A two-level list is returned with components whole and bygroup: the first holds general information for markers and individuals, the second holds the same information by subgroup (if applicable).

For whole, a list containing parameter estimates in the following components:

$Markers: For each marker, the allele frequencies (p and q), Minor Allele Frequency (MAF), expected heterozygosity (H_e), observed heterozygosity (H_o), Nei's Genetic Diversity (GD), Polymorphism Information Content (PIC), proportion of missing data (Miss), and the \chi^2 statistic for the Hardy-Weinberg equilibrium test with its p-value
$Genotypes: For each genotype, the observed heterozygosity (H_o) and the coefficient of inbreeding (F_i), estimated as the excess of homozygosity relative to its expectation (Keller et al., 2011)
$Population: The same parameters as for markers, except PIC, for the general population, along with their lower and upper bounds
$Variability: Estimates of effective population size (Ne), additive (Va) and dominance (Vd) variance components, and a summary of the number of groups, genotypes and markers
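To make the marker-level quantities concrete, the sketch below computes several of them for a single marker using common textbook definitions in base R. It is an illustration under stated assumptions, not the package's code: the vector `g` is hypothetical, the PIC expression is the usual biallelic form, and `popgen` may differ in details such as rounding or missing-data handling.

```r
# Hypothetical genotype counts for one marker in 10 individuals (one missing)
g <- c(2, 2, 1, 1, 1, 0, 2, 1, NA, 0)

n    <- sum(!is.na(g))                    # number of genotyped individuals
miss <- mean(is.na(g))                    # proportion of missing data
p    <- sum(g, na.rm = TRUE) / (2 * n)    # reference allele frequency
q    <- 1 - p                             # alternative allele frequency
maf  <- min(p, q)                         # minor allele frequency
He   <- 2 * p * q                         # expected heterozygosity under HWE
Ho   <- mean(g == 1, na.rm = TRUE)        # observed heterozygosity
PIC  <- 1 - (p^2 + q^2) - 2 * p^2 * q^2   # biallelic PIC (assumed form)

# Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)
obs  <- c(sum(g == 0, na.rm = TRUE),
          sum(g == 1, na.rm = TRUE),
          sum(g == 2, na.rm = TRUE))
expd <- n * c(q^2, 2 * p * q, p^2)
chi2 <- sum((obs - expd)^2 / expd)
pval <- pchisq(chi2, df = 1, lower.tail = FALSE)
```

Applying the same steps to each column of `M` gives one row of marker-level estimates per marker.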

When subgroups are supplied, the same population parameters are estimated for each subpopulation, together with its exclusive and fixed alleles. In addition, a list with the F-statistics (F_IT, F_IS and F_ST) for genotypes and markers is returned. For genotypes, the statistics are given both across all subpopulations and for each pair of subpopulations; for marker loci, they are given only across all subpopulations.
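The relationship among the three F-statistics can be illustrated with a simple heterozygosity-based calculation for one marker. This sketch is not the Weir and Cockerham (1984) estimator listed in the References and is not claimed to match the values `popgen` returns; it only shows what each statistic compares. The data and object names are hypothetical, and the subpopulations are assumed to be of equal size so that plain means can be used.

```r
# Hypothetical toy data: one marker, two subpopulations of 5 individuals each
g   <- c(2, 2, 1, 2, 2,   0, 1, 0, 0, 1)
pop <- rep(c("A", "B"), each = 5)

# Per-subpopulation reference allele frequency and heterozygosities
p_s  <- tapply(g, pop, function(x) sum(x) / (2 * length(x)))
Ho_s <- tapply(g, pop, function(x) mean(x == 1))
He_s <- 2 * p_s * (1 - p_s)

# Equal subpopulation sizes are assumed, so unweighted means are used
H_I <- mean(Ho_s)             # mean observed heterozygosity
H_S <- mean(He_s)             # mean expected heterozygosity within subpopulations
p_T <- mean(p_s)              # allele frequency in the total population
H_T <- 2 * p_T * (1 - p_T)    # expected heterozygosity in the total population

F_IS <- (H_S - H_I) / H_S     # individuals relative to subpopulations
F_ST <- (H_T - H_S) / H_T     # subpopulations relative to the total
F_IT <- (H_T - H_I) / H_T     # individuals relative to the total
```

Under these definitions the identity (1 - F_IT) = (1 - F_IS)(1 - F_ST) holds exactly, which is a quick consistency check on any set of estimates.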

When plot = TRUE, histograms of the estimates of MAF, GD, PIC and H_e are produced for the whole population and, when subgroups are supplied, for each subpopulation. A heat map of the pairwise F_ST between subpopulations is also produced.

References

Weir, B.S. and Cockerham, C.C. (1984). Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370. doi:10.2307/2408641

Keller, M.C., Visscher, P.M. and Goddard, M.E. (2011). Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics 189: 237-249. doi:10.1534/genetics.111.130922

Examples

# hybrid maize data
data(maize.hyb)
x <- popgen(maize.hyb)

# using subpopulations: one label per row of maize.hyb
PS <- c(rep(1, 25), rep(2, 25))
x <- popgen(maize.hyb, subgroups = PS)

[Package snpReady version 0.9.6 Index]