gl.report.pa {dartR.base} | R Documentation |
Reports private alleles (and fixed alleles) per pair of populations
Description
This function reports private alleles in one population compared with a second population, for all populations taken pairwise. It also reports a count of fixed allelic differences and the mean absolute allele frequency differences (AFD) between pairs of populations.
Usage
gl.report.pa(
x,
x2 = NULL,
method = "pairwise",
loc.names = FALSE,
test.asym = FALSE,
test.asym.boot = 100,
plot.display = FALSE,
plot.font = 14,
map.interactive = FALSE,
provider = "Esri.NatGeoWorldMap",
palette.discrete = NULL,
plot.file = NULL,
plot.dir = NULL,
verbose = NULL
)
Arguments
x |
Name of the genlight object containing the SNP data [required]. |
x2 |
If two separate genlight objects are to be compared this can be provided here, but they must have the same number of SNPs [default NULL]. |
method |
Method to calculate private alleles: 'pairwise' comparison or compare each population against the rest 'one2rest' [default 'pairwise']. |
loc.names |
Whether names of loci with private alleles and fixed differences should reported. If TRUE, loci names are reported using a list |
test.asym |
bootstrap test for significant differences of private alleles. This test uses a bootstrap simulation by shuffling individuals between a pair of population and drawing with replacement. For each bootstrap the ratio of private alleles is compared to the actual ratio and recorded how often it is larger than the simulated one. If number of individuals are different between population bootstrap is done using the smaller number of samples in both populations. |
test.asym.boot |
number of bootstraps [default 100] [default FALSE]. |
plot.display |
Specify if Sankey plot is to be produced [default FALSE]. |
plot.font |
Numeric font size in pixels for the node text labels [default 14]. |
map.interactive |
Specify whether an interactive map showing private alleles between populations is to be produced [default FALSE]. |
provider |
Passed to leaflet [default "Esri.NatGeoWorldMap"]. |
palette.discrete |
A discrete palette for the color of populations or a list with as many colors as there are populations in the dataset [default gl.select.colors(x)]. |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL] |
plot.dir |
Directory in which to save files [default = working directory] |
verbose |
Verbosity: 0, silent, fatal errors only; 1, flag function begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
Details
Note that the number of paired alleles between two populations is not a
symmetric dissimilarity measure.
If no x2 is provided, the function uses the pop(gl) hierarchy to determine
pairs of populations, otherwise it runs a single comparison between x and
x2.
Hint: in case you want to run comparisons between individuals
(assuming individual names are unique), you can simply redefine your
population names with your individual names, as below:
pop(gl) <- indNames(gl)
Definition of fixed and private alleles
The table below shows the possible cases of allele frequencies between
two populations (0 = homozygote for Allele 1, x = both Alleles are present,
1 = homozygote for Allele 2).
p: cases where there is a private allele in pop1 compared to pop2 (but not vice versa)
f: cases where there is a fixed allele in pop1 (and pop2, as those cases are symmetric)
pop1 | ||||
0 | x | 1 | ||
0 | - | p | p,f | |
pop2 | x | - | - | - |
1 | p,f | p | - | |
The absolute allele frequency difference (AFD) in this function is a simple differentiation metric displaying intuitive properties which provides a valuable alternative to FST. For details about its properties and how it is calculated see Berner (2019). The function also reports an estimation of the lower bound of the number of undetected private alleles using the Good-Turing frequency formula, originally developed for cryptography, which estimates in an ecological context the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. The approach is described in Chao et al. (2017). For this function, the equation 2c is used. This estimate is reported in the output table as Chao1 and Chao2. In this function a Sankey Diagram is used to visualize patterns of private alleles between populations. This diagram allows to display flows (private alleles) between nodes (populations). Their links are represented with arcs that have a width proportional to the importance of the flow (number of private alleles). if save2temp=TRUE, resultant plot(s) and the tabulation(s) are saved to the session's temporary directory.
Value
A data.frame. Each row shows, for each pair of populations the number of individuals in each population, the number of loci with fixed differences (same for both populations) in pop1 (compared to pop2) and vice versa. Same for private alleles and finally the absolute mean allele frequency difference between loci (AFD). If loc.names = TRUE, loci names with private alleles and fixed differences are reported in a list in addition to the dataframe.
Author(s)
Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr
References
Berner, D. (2019). Allele frequency difference AFD – an intuitive alternative to FST for quantifying genetic population differentiation. Genes, 10(4), 308.
Chao, Anne, et al. "Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory." Ecology 98.11 (2017): 2914-2929.
See Also
Other matched report:
gl.report.callrate()
,
gl.report.hamming()
,
gl.report.locmetric()
,
gl.report.maf()
,
gl.report.overshoot()
,
gl.report.rdepth()
,
gl.report.reproducibility()
,
gl.report.secondaries()
,
gl.report.taglength()
Examples
out <- gl.report.pa(platypus.gl)
out <- gl.report.pa(platypus.gl)