gl.report.fstat {dartR.base}    R Documentation

Reports various statistics of genetic differentiation between populations with confidence intervals

Description

This function calculates four statistics of genetic differentiation between populations (see the "Details" section for further information).

Sampling errors arise because allele frequencies in our samples differ from those in the subpopulations from which they were taken (Holsinger, 2012).

Confidence intervals are obtained using bootstrapping.

Usage

gl.report.fstat(
  x,
  nboots = 0,
  conf = 0.95,
  CI.type = "bca",
  ncpus = 1,
  plot.stat = "Fstp",
  plot.display = TRUE,
  palette.divergent = gl.colors("div"),
  font.size = 0.5,
  plot.dir = NULL,
  plot.file = NULL,
  verbose = NULL,
  ...
)

Arguments

x

Name of the genlight object containing the SNP data [required].

nboots

Number of bootstrap replicates used to obtain confidence intervals [default 0].

conf

The confidence level of the required interval [default 0.95].

CI.type

Method to estimate confidence intervals. One of "norm", "basic", "perc" or "bca" [default "bca"].

ncpus

Number of processes to be used in parallel operation. If ncpus > 1, parallel operation is activated; see the "Details" section [default 1].

plot.stat

Statistic to plot. One of "Fst", "Fstp", "Dest" or "Gst_H" [default "Fstp"].

plot.display

If TRUE, a heatmap of the chosen pairwise statistic is displayed in the plot window [default TRUE].

palette.divergent

A color palette function for the heatmap plot [default gl.colors("div")].

font.size

Size of font for the labels of horizontal and vertical axes of the heatmap [default 0.5].

plot.dir

Directory in which to save files [default working directory].

plot.file

Name for the RDS binary file to save (base name only, exclude extension) [default NULL].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity]

...

Parameters passed to function heatmap.2 (package gplots).

Details

Even though Fst and its relatives can predict evolutionary processes (Holsinger & Weir, 2009), they are not true measures of genetic differentiation in the sense that they are dependent on the diversity within populations (Meirmans & Hedrick, 2011), the number of populations analysed (Alcala & Rosenberg, 2017) and are not monotonic (Sherwin et al., 2017). Recent approaches have been developed to accommodate these mathematical restrictions (G'ST; "Gst_H"; Hedrick, 2005, and Jost's D; "Dest"; Jost, 2008). More recently, novel approaches based on information theory (Mutual Information; Sherwin et al., 2017) and allele frequencies (Allele Frequency Difference; Berner, 2019) have distinct properties that make them valuable resources to interpret genetic differentiation between populations.

Note that each measure of genetic differentiation has advantages and drawbacks, and the decision of using a particular measure is usually based on the research question.

Statistics calculated

The equations used to calculate the statistics are shown below.
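The equations themselves are not reproduced on this page. As a rough guide only, the sketch below computes the four statistics from the mean within-population heterozygosity (Hs) and the total heterozygosity (Ht) using the commonly cited definitions of Fst, its corrected form Fstp, Jost's D and Hedrick's G'ST. It is an assumption that gl.report.fstat uses exactly these forms, in particular that "Gst_H" standardises Fstp rather than Fst. The helper name diff_stats and the input values are hypothetical.

```r
# Hypothetical helper: differentiation statistics from heterozygosities.
# Hs: mean expected heterozygosity within populations
# Ht: expected heterozygosity of the pooled populations
# n : number of populations compared (2 for a pairwise comparison)
diff_stats <- function(Hs, Ht, n = 2) {
  Dst  <- Ht - Hs                  # among-population component
  Dstp <- n / (n - 1) * Dst        # corrected for the number of populations
  Htp  <- Hs + Dstp                # corrected total heterozygosity
  Fst  <- Dst / Ht
  Fstp <- Dstp / Htp
  Dest <- Dstp / (1 - Hs)          # Jost's D
  # Largest value the fixation index can take for this Hs (Hedrick, 2005)
  Gst_max <- (n - 1) * (1 - Hs) / (n - 1 + Hs)
  Gst_H   <- Fstp / Gst_max        # assumption: Fstp is the value standardised
  c(Fst = Fst, Fstp = Fstp, Dest = Dest, Gst_H = Gst_H)
}

# Made-up heterozygosities for a single pair of populations
s <- diff_stats(Hs = 0.30, Ht = 0.35)
round(s, 3)
```

Because Gst_max is always below 1 when Hs > 0, the standardised value is never smaller than Fstp, which illustrates the dependence on within-population diversity discussed above.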

Confidence Intervals

The uncertainty of a parameter, in this case the mean of the statistic, can be summarised by a confidence interval (CI) which includes the true parameter value with a specified probability (i.e. confidence level; the parameter "conf" in this function).

In this function, CIs are obtained by bootstrapping, an inference method that resamples the data (i.e. loci) with replacement and recalculates the statistics for each resample.

This function uses the function boot (package boot) to perform the bootstrap replicates and the function boot.ci (package boot) to perform the calculations for the CI.
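The workflow with boot and boot.ci can be sketched as follows. This is a minimal illustration and not the internal code of gl.report.fstat: the per-locus values are simulated stand-ins, and the names per_locus and mean_stat are hypothetical.

```r
library(boot)

set.seed(42)
# Simulated per-locus values of a statistic (assumption: stand-in for real data)
per_locus <- rbeta(200, shape1 = 2, shape2 = 18)

# boot() calls the statistic with the data and a vector of resampled indices,
# so each replicate is the mean over loci drawn with replacement
mean_stat <- function(d, idx) mean(d[idx])

b  <- boot(data = per_locus, statistic = mean_stat, R = 1000)
ci <- boot.ci(b, conf = 0.95, type = "perc")

# For type = "perc", columns 4 and 5 of ci$percent hold the interval limits
ci$percent[4:5]
```

The same boot object can be passed to boot.ci with a different type to compare interval methods without repeating the resampling.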

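Four different types of nonparametric CI can be calculated (parameter "CI.type" in this function):

First order normal approximation interval ("norm").

Basic bootstrap interval ("basic").

Bootstrap percentile interval ("perc").

Adjusted bootstrap percentile interval ("bca").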

The studentized bootstrap interval ("stud") was not included in the CI types because it is computationally intensive, it may produce estimates outside the range of plausible values and it has been found to be erratic in practice, see for example the "Studentized (t) Intervals" section in:

https://www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package/

Nice tutorials about the different types of CI can be found in:

https://www.datacamp.com/tutorial/bootstrap-r

and

https://www.r-bloggers.com/2019/09/understanding-bootstrap-confidence-interval-output-from-the-r-boot-package/

Efron and Tibshirani (1993, p. 162) and Davison and Hinkley (1997, p. 194) suggest that the number of bootstrap replicates should be between 1000 and 2000.

It is important to note that unreliable confidence intervals will be obtained if too few bootstrap replicates are used. The function boot.ci will therefore throw warnings and errors if the bootstrap replicates are too few. Consider increasing the number of bootstrap replicates to at least 200.

The "bca" interval is often cited as the best for theoretical reasons; however, it may produce unstable results if the bootstrap distribution is skewed or has extreme values. For example, you might get the warning "extreme order statistics used as endpoints" or the error "estimated adjustment 'a' is NA". In this case, you may want to use more bootstrap replicates, use a different method, or check your data for outliers.

The error "estimated adjustment 'w' is infinite" means that the estimated adjustment ‘w’ for the "bca" interval is infinite, which can happen when the empirical influence values are zero or very close to zero. Possible causes include:

The number of bootstrap replicates is too small.

The statistic of interest is constant or nearly constant across the bootstrap samples.

The data contain outliers or extreme values.

You can try some possible solutions, such as:

Increasing the number of bootstrap replicates.

Using a different type of bootstrap confidence interval.

Removing or transforming the outliers or extreme values.
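One way to apply the second suggestion is to fall back to another interval type when "bca" fails. The sketch below is a hypothetical wrapper around boot.ci (the name safe_ci is not part of any package) and is not how gl.report.fstat handles these errors.

```r
library(boot)

# Hypothetical helper: try "bca" first and fall back to "perc" on error
safe_ci <- function(boot_out, conf = 0.95) {
  tryCatch(
    boot.ci(boot_out, conf = conf, type = "bca"),
    error = function(e) {
      message("bca failed (", conditionMessage(e), "); using perc instead")
      boot.ci(boot_out, conf = conf, type = "perc")
    }
  )
}

set.seed(1)
x <- rnorm(50)
# Deliberately few replicates relative to the sample size, a setting in
# which "bca" typically fails or warns
b  <- boot(x, function(d, i) mean(d[i]), R = 20)
ci <- safe_ci(b)
```

A fallback of this kind hides the symptom only; with so few replicates the resulting interval is still unreliable, so increasing the number of replicates remains the first thing to try.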

Plotting

The plot can be customised by including any parameter(s) from the function heatmap.2 (package gplots).

For the color palette you could try for example:

> library(viridis)

> res <- gl.report.fstat(platypus.gl, palette.divergent = viridis)

If plot.file is given, the plot arising from this function is saved as an "RDS" binary file using the function saveRDS (package base); it can be reloaded with the function readRDS (package base). A file name must be specified for the plot to be saved.

If a plot directory (plot.dir) is specified, the plot binary is saved to that directory; otherwise it is saved to tempdir().
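The save and reload cycle can be sketched with base R alone. The object and the file name below are hypothetical stand-ins; the exact file name written by gl.report.fstat is not shown on this page.

```r
# Stand-in object; in practice this would be the saved plot
obj <- list(stat = "Fstp",
            values = matrix(c(0, 0.12, 0.12, 0), nrow = 2))

# Hypothetical file name inside the session's temporary directory
path <- file.path(tempdir(), "fstat_plot.RDS")

saveRDS(obj, path)          # write the binary file
restored <- readRDS(path)   # read it back, possibly in a later session
identical(obj, restored)
```

Note that tempdir() is removed when the R session ends, so set plot.dir to a permanent directory if the file needs to persist.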

Your plot might not be shown in full if your 'Plots' pane is too small (in RStudio). Increase the size of the 'Plots' pane before running the function. Alternatively, use the parameter 'plot.file' to save the plot to a file.

Parallelisation

If the parameter ncpus > 1, parallelisation is enabled. In Windows, parallel computing employs a "socket" approach that starts new copies of R on each core. POSIX systems, on the other hand (Mac, Linux, Unix, and BSD), utilise a "forking" approach that replicates the whole current version of R and transfers it to a new core.

Opening and terminating R sessions in each core involves a significant amount of processing time; therefore, parallelisation on Windows machines is only quicker than not using parallelisation when nboots > 1000-2000.
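The two approaches correspond to the "snow" (socket) and "multicore" (forking) options of the parallel argument of boot. The sketch below picks one according to the operating system; it is an assumption that gl.report.fstat makes the choice in this way, and the variable backend is hypothetical.

```r
library(boot)

ncpus <- 2
# "snow" starts new R processes (needed on Windows);
# "multicore" forks the current process (POSIX systems)
backend <- if (ncpus <= 1) {
  "no"
} else if (.Platform$OS.type == "windows") {
  "snow"
} else {
  "multicore"
}

set.seed(7)
x <- rexp(100)
b <- boot(x, function(d, i) mean(d[i]), R = 500,
          parallel = backend, ncpus = ncpus)
length(b$t)   # one value per bootstrap replicate
```

Timing the same call with ncpus = 1 using system.time() is a simple way to check whether parallelisation pays off for a given number of replicates.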

Value

Two lists: the first contains matrices with the genetic statistics taken pairwise by population; the second contains tables with the genetic statistics for each pair of populations. If nboots > 0, the tables also report the four statistics with their Low Confidence Intervals (LCI) and High Confidence Intervals (HCI).

Author(s)

Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr

References

See Also

Other matched reports: gl.mahal.assign(), gl.report.bases(), gl.report.factorloadings(), gl.report.monomorphs()

Examples

res <- gl.report.fstat(platypus.gl)


[Package dartR.base version 0.65 Index]