gl.report.heterozygosity {dartR}

R Documentation

Reports observed, expected and unbiased heterozygosities and FIS (inbreeding coefficient) by population or by individual from SNP data

Description

Calculates the observed, expected and unbiased expected (i.e. corrected for sample size) heterozygosities and FIS (inbreeding coefficient) for each population or the observed heterozygosity for each individual in a genlight object.

Usage

gl.report.heterozygosity(
  x,
  method = "pop",
  n.invariant = 0,
  plot.out = TRUE,
  plot_theme = theme_dartR(),
  plot_colors_pop = discrete_palette,
  plot_colors_ind = two_colors,
  save2tmp = FALSE,
  verbose = NULL
)

Arguments

`x`	Name of the genlight object containing the SNP data [required].
`method`	Calculate heterozygosity by population (method='pop') or by individual (method='ind') [default 'pop'].
`n.invariant`	An estimate of the number of invariant sequence tags used to adjust the heterozygosity rate [default 0].
`plot.out`	Specify if plot is to be produced [default TRUE].
`plot_theme`	Theme for the plot. See Details for options [default theme_dartR()].
`plot_colors_pop`	A color palette for population plots or a list with as many colors as there are populations in the dataset [default discrete_palette].
`plot_colors_ind`	List of two color names for the borders and fill of the plot by individual [default two_colors].
`save2tmp`	If TRUE, saves any ggplots and listings to the session temporary directory (tempdir) [default FALSE].
`verbose`	Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity].

Details

Observed heterozygosity for a population takes the proportion of heterozygous loci for each individual then averages over the individuals in that population. The calculations take into account missing values.

Expected heterozygosity for a population takes the expected proportion of heterozygotes, that is, expected under Hardy-Weinberg equilibrium, for each locus, then averages this across the loci for an average estimate for the population.

Observed heterozygosity for individuals is calculated as the proportion of loci that are heterozygous for that individual.
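
The individual-level calculation can be sketched in base R. This is a minimal illustration, not the package's internal code. It assumes genotypes are coded as in a genlight object (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, NA = missing) and that missing loci are excluded from the denominator. The matrix `geno` and its values are made up for the example.

```r
# Hypothetical genotype matrix: rows = individuals, columns = loci.
geno <- matrix(
  c(0, 1,  2, 1,
    1, 1, NA, 0,
    2, 0,  0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("loc", 1:4))
)

# Proportion of non-missing loci that are heterozygous, per individual.
ho_ind <- rowMeans(geno == 1, na.rm = TRUE)
ho_ind
```

Here ind2 has one missing genotype, so its proportion is taken over three loci rather than four.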

Finally, the number of loci that are invariant across all individuals in the dataset (that is, across populations) is typically unknown. This can make estimates of heterozygosity specific to the analysis, so it is not valid to compare such estimates across species or even across different analyses. Microsatellites face a similar problem. If you have an estimate of the number of invariant sequence tags (loci) in your data, such as the one provided by gl.report.secondaries, you can specify it with the n.invariant parameter to standardize your estimates of heterozygosity.

NOTE: Estimation of adjusted heterozygosity requires that secondaries not be removed from the dataset.
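
The effect of n.invariant can be shown with a small worked example, using the adjustment documented in this section, Ho.adj = Ho * n_Loc / (n_Loc + n.invariant). All numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical values for illustration only.
ho          <- 0.20  # unadjusted observed heterozygosity
n_loc       <- 1000  # loci that do not have all missing data
n_invariant <- 3000  # estimated number of invariant loci

# Ho.adj = Ho * n_Loc / (n_Loc + n.invariant)
ho_adj <- ho * n_loc / (n_loc + n_invariant)
ho_adj
```

With three times as many invariant loci as scored loci, the adjusted estimate is a quarter of the unadjusted one.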

Heterozygosities and FIS (inbreeding coefficient) are calculated by locus within each population using the following equations:

Observed heterozygosity (Ho) = number of heterozygotes / n_Ind, where n_Ind is the number of individuals without missing data.
Observed heterozygosity adjusted (Ho.adj) = Ho * n_Loc / (n_Loc + n.invariant), where n_Loc is the number of loci that do not have all missing data and n.invariant is an estimate of the number of invariant loci to adjust heterozygosity.
Expected heterozygosity (He) = 1 - (p^2 + q^2), where p is the frequency of the reference allele and q is the frequency of the alternative allele.
Expected heterozygosity adjusted (He.adj) = He * n_Loc / (n_Loc + n.invariant)
Unbiased expected heterozygosity (uHe) = He * (2 * n_Ind / (2 * n_Ind - 1))
Inbreeding coefficient (FIS) = 1 - (mean(Ho) / mean(uHe))
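
The per-locus quantities for a single population can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual implementation: genotypes are assumed to be coded 0/1/2 with NA for missing, Ho is taken as the proportion of heterozygous genotypes among genotyped individuals, and the matrix `geno` is invented for the example.

```r
# Hypothetical genotypes for one population: rows = individuals, columns = loci.
geno <- matrix(
  c(0,  1, 2,
    1,  1, 0,
    2, NA, 0,
    1,  0, 0),
  nrow = 4, byrow = TRUE
)

n_ind <- colSums(!is.na(geno))                      # genotyped individuals per locus
ho    <- colSums(geno == 1, na.rm = TRUE) / n_ind   # observed heterozygosity per locus
q     <- colSums(geno, na.rm = TRUE) / (2 * n_ind)  # alternative allele frequency
p     <- 1 - q                                      # reference allele frequency
he    <- 1 - (p^2 + q^2)                            # expected heterozygosity per locus
uhe   <- he * (2 * n_ind / (2 * n_ind - 1))         # sample-size corrected He
fis   <- 1 - mean(ho) / mean(uhe)                   # inbreeding coefficient

round(c(Ho = mean(ho), He = mean(he), uHe = mean(uhe), FIS = fis), 3)
```

Note that n_ind is computed per locus, so the missing genotype at the second locus reduces the sample size for that locus only.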

Function's output

Output for method='pop' is an ordered barchart of observed heterozygosity, unbiased expected heterozygosity and FIS (Inbreeding coefficient) across populations together with a table of mean observed and expected heterozygosities and FIS by population and their respective standard deviations (SD).

The output also reports, by population, the number of loci used to estimate heterozygosity (nLoc), the number of polymorphic loci (polyLoc), the number of monomorphic loci (monoLoc) and the number of loci with all missing data (all_NALoc).

Output for method='ind' is a histogram and a boxplot of heterozygosity across individuals.

If save2tmp = TRUE, plots and the table are saved to the session temporary directory (tempdir).

Examples of other themes that can be used can be consulted in

Value

A dataframe containing population labels, heterozygosities, FIS, their standard deviations and sample sizes

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

Examples

 
require("dartR.data")
df <- gl.report.heterozygosity(platypus.gl)
df <- gl.report.heterozygosity(platypus.gl,method='ind')
n.inv <- gl.report.secondaries(platypus.gl)
gl.report.heterozygosity(platypus.gl, n.invariant = n.inv[7, 2])

[Package dartR version 2.9.7 Index]