calcPopDiff {polysat} | R Documentation |
Estimate Population Differentiation Statistics
Description
Given a data frame of allele frequencies and population sizes,
calcPopDiff
calculates a matrix of pairwise F_{ST}
,
G_{ST}
, Jost's D
, or R_{ST}
values, or a single global
value for any of these four statistics.
calcFst
is a wrapper for calcPopDiff
to allow backwards
compatibility with previous versions of polysat.
Usage
calcPopDiff(freqs, metric, pops = row.names(freqs),
loci = unique(gsub("\\..*$", "", names(freqs))), global = FALSE,
bootstrap = FALSE, n.bootstraps = 1000, object = NULL)
calcFst(freqs, pops = row.names(freqs),
loci = unique(gsub("\\..*$", "", names(freqs))), ...)
Arguments
freqs |
A data frame of allele frequencies and population sizes such as that
produced by |
metric |
The population differentiation statistic to estimate. Can be “Fst”, “Gst”, “Jost's D”, or “Rst”. |
pops |
A character vector. Populations to analyze, which should be
a subset of |
loci |
A character vector indicating which loci to analyze. These should be a
subset of the locus names as used in the column names of |
global |
Boolean. If |
bootstrap |
Boolean. If |
n.bootstraps |
Integer. The number of bootstrap replicates to perform. Ignored if
|
object |
A |
... |
Additional arguments to be passed to |
Details
For metric = "Fst"
or calcFst
:
H_S
and H_T
are estimate directly from allele frequencies
for each locus for each pair of populations, then averaged across loci.
Wright's F_{ST}
is then calculated for each pair of populations as
\frac{H_T - H_S}{H_T}
.
H
values (expected heterozygosities for populations and combined
populations) are calculated as one minus the sum of all squared allele
frequencies at a locus. To calculte H_T
, allele frequencies between two
populations are averaged before the calculation. To calculate H_S
, H
values are averaged after the calculation. In both cases, the averages
are weighted by the relative sizes of the two populations (as indicated
by freqs$Genomes
).
For metric = "Gst"
:
This metric is similar to F_{ST}
, but mean allele frequencies and
mean expected heterozygosities are not weighted by population size. Additionally,
unbiased estimators of H_S
and H_T
are used according to Nei
and Chesser (1983; equations 15 and 16, reproduced also in Jost (2008)).
Instead of using twice the harmonic mean of the number of individuals in the two
subpopulations as described by Nei and Chesser (1983), the harmonic mean of the
number of allele copies in the two subpopulations (taken from freq$Genomes
)
is used, in order to allow for polyploidy.
G_{ST}
is estimated for each locus and then averaged across loci.
For metric = "Jost's D"
:
The unbiased estimators of H_S
and H_T
and calculated as
with G_{ST}
, without weighting by population size. They are then used
to estimate D
at each locus according to Jost (2008; equation 12):
2 * \frac{H_T - H_S}{1 - H_S}
Values of D
are then averaged across loci.
For metric = "Rst"
:
R_{ST}
is estimated on a per-locus basis according to Slatkin (1995),
but with populations weighted equally regardless of size. Values are then averaged
across loci.
For each locus:
S_w = \frac{1}{d} * \sum_j^d{\frac{\sum_i{\sum_{i' < i}{p_{ij} * p_{i'j} * n_j^2 * (a_i - a_{i'})^2}}}{n_j * (n_j - 1)}}
\bar{S} = \frac{\sum_i{\sum_{i' < i}{\bar{p}_i * \bar{p}_{i'} * n^2 * (a_i - a_{i'})^2}}}{n * (n-1)}
R_{ST} = \frac{\bar{S} - S_w}{\bar{S}}
where d
is the number of populations, j
is an individual population, i
and
i'
are individual alleles, p_{ij}
is the frequency of an allele in a population,
n_j
is the number of allele copies in a population, a_i
is the size of an allele
expressed in number of repeat units, \bar{p}_i
is an allele frequency averaged across
populations (with populations weighted equally), and n
is the total number of allele copies
across all populations.
Value
If global = FALSE
and bootstrap = FALSE
, a square matrix containing F_{ST}
, G_{ST}
, D
, or R_{ST}
values.
The rows and columns of the matrix are both named by population.
If global = TRUE
and bootstrap = FALSE
, a single value indicating the F_{ST}
, G_{ST}
, D
, or R_{ST}
value.
If global = TRUE
and bootstrap = TRUE
, a vector of bootstrapped replicates of F_{ST}
, G_{ST}
, D
, or R_{ST}
.
If global = FALSE
and bootstrap = TRUE
, a square two-dimensional list, with rows and columns named by population. Each item is
a vector of bootstrapped values for F_{ST}
, G_{ST}
, D
, or R_{ST}
for that pair of populations.
Author(s)
Lindsay V. Clark
References
Nei, M. (1973) Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences of the United States of America 70, 3321–3323.
Nei, M. and Chesser, R. (1983) Estimation of fixation indices and gene diversities. Annals of Human Genetics 47, 253–259.
Jost, L. (2008) G_{ST}
and its relatives to not measure differentiation.
Molecular Ecology 17, 4015–4026.
Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457–462.
See Also
Examples
# create a data set (typically done by reading files)
mygenotypes <- new("genambig", samples = paste("ind", 1:6, sep=""),
loci = c("loc1", "loc2"))
Genotypes(mygenotypes, loci = "loc1") <- list(c(206), c(208,210),
c(204,206,210),
c(196,198,202,208), c(196,200), c(198,200,202,204))
Genotypes(mygenotypes, loci = "loc2") <- list(c(130,134), c(138,140),
c(130,136,140),
c(138), c(136,140), c(130,132,136))
PopInfo(mygenotypes) <- c(1,1,1,2,2,2)
mygenotypes <- reformatPloidies(mygenotypes, output="sample")
Ploidies(mygenotypes) <- c(2,2,4,4,2,4)
Usatnts(mygenotypes) <- c(2,2)
# calculate allele frequencies
myfreq <- simpleFreq(mygenotypes)
# calculate pairwise differentiation statistics
myfst <- calcPopDiff(myfreq, metric = "Fst")
mygst <- calcPopDiff(myfreq, metric = "Gst")
myD <- calcPopDiff(myfreq, metric = "Jost's D")
myrst <- calcPopDiff(myfreq, metric = "Rst", object = mygenotypes)
# examine the results
myfst
mygst
myD
myrst
# get global statistics
calcPopDiff(myfreq, metric = "Fst", global = TRUE)
calcPopDiff(myfreq, metric = "Gst", global = TRUE)
calcPopDiff(myfreq, metric = "Jost's D", global = TRUE)
calcPopDiff(myfreq, metric = "Rst", global = TRUE, object = mygenotypes)