calc_sfs_tests {rehh} | R Documentation
Calculate site frequency spectrum test statistics
Description
Calculate site frequency spectrum (SFS) tests Tajima's D, Fay & Wu's H and Zeng's E.
Usage
calc_sfs_tests(
haplohh,
polarized = TRUE,
window_size = NA,
overlap = 0,
right = TRUE,
min_n_mrk = 1,
verbose = TRUE
)
Arguments
haplohh
an object of class
polarized
logical.
window_size
size of sliding windows. If
overlap
size of window overlap (default 0, i.e. no overlap).
right
logical, indicating if the windows should be closed on the right and open on the left (default) or vice versa.
min_n_mrk
minimum number of (polymorphic) markers per window.
verbose
logical.
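The effect of right-closed versus left-closed windows can be illustrated with base R's cut(). This is only a sketch of the interval convention, not the package's own windowing code; the positions and the window size of 100 are made-up values.

```r
# Made-up marker positions and window borders (window size 100, no overlap)
pos <- c(0, 50, 100, 150, 200)
breaks <- seq(0, 200, by = 100)

# Windows closed on the right and open on the left: (0,100], (100,200]
right_closed <- cut(pos, breaks, right = TRUE)

# Windows closed on the left and open on the right: [0,100), [100,200)
left_closed <- cut(pos, breaks, right = FALSE)

# A marker exactly on a border (position 100) changes window
# depending on the convention
print(data.frame(pos, right_closed, left_closed))
```

A marker lying exactly on a window border is assigned to the window that is closed on that side, so the choice only matters for markers at border positions.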
Details
Neutrality tests based on the site frequency spectrum (SFS) are largely unrelated to EHH-based methods. The tests provided here are implemented elsewhere, too (e.g. in package PopGenome).
Each test compares two estimators of the scaled mutation rate theta that have the same expected value under neutrality. The test statistic is based on the difference between the two estimators, so deviations from zero indicate violations of the neutral null model, typically population size changes, population subdivision or selection. Tajima's D and Fay & Wu's H become negative in the presence of an almost completed sweep; Zeng's E becomes positive for some time after it. Significance can typically be assigned only by simulations.
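The following base R sketch shows why the estimators agree under neutrality. It uses the textbook definitions of Watterson's, Tajima's and Zeng's estimators for an unfolded spectrum without missing values; the helper theta_from_sfs() is hypothetical and not part of rehh. The variance-based normalization of the statistics is left out, and H is written with Zeng's estimator as in Zeng et al. (2006), which is proportional to Fay & Wu's original difference.

```r
# Hypothetical helper (not part of rehh): theta estimators from an unfolded SFS.
# xi[i] = number of sites at which the derived allele occurs i times, i = 1..n-1.
theta_from_sfs <- function(xi) {
  n <- length(xi) + 1   # sample size implied by the length of the spectrum
  i <- seq_len(n - 1)   # possible derived allele counts
  c(
    theta_S  = sum(xi) / sum(1 / i),                    # Watterson
    theta_pi = sum(i * (n - i) * xi) / choose(n, 2),    # Tajima
    theta_L  = sum(i * xi) / (n - 1)                    # Zeng
  )
}

# Expected neutral spectrum: E[xi_i] = theta / i
n <- 10
theta <- 5
est <- theta_from_sfs(theta / seq_len(n - 1))
print(est)

# Unnormalized numerators of the three test statistics
numerators <- c(
  D = est[["theta_pi"]] - est[["theta_S"]],
  H = est[["theta_pi"]] - est[["theta_L"]],
  E = est[["theta_L"]] - est[["theta_S"]]
)
print(numerators)
```

With the expected neutral spectrum all three estimators return theta, so all numerators vanish. A spectrum with an excess of high-frequency derived alleles inflates Zeng's estimator relative to Tajima's and drives H below zero.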
The standard definitions of the tests cannot cope with missing values, so markers with missing genotypes typically must be discarded. Ferretti et al. (2012) provide an extension that can handle missing values (without discarding any non-missing values). In this package, only the first moments (the theta-estimators themselves) are adapted accordingly, but not the second moments (their variances), because adapting the latter is computationally demanding and the resulting bias is relatively small. It is recommended, though, to discard markers or haplotypes with more than 20% missing values.
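The recommended 20% filter can be sketched in base R on a plain allele matrix. The matrix below is made-up data with haplotypes in rows and markers in columns; this is an illustration of the filtering rule, not the package's own filtering code.

```r
# Made-up allele matrix: 5 haplotypes (rows) x 4 markers (columns), NA = missing
haplo <- matrix(c(
  0, 1, NA, 0,
  1, 1, NA, 0,
  0, 0,  1, 1,
  1, 0,  0, 1,
  0, 1,  1, 0
), nrow = 5, byrow = TRUE)

# Fraction of missing values per marker and per haplotype
miss_mrk <- colMeans(is.na(haplo))
miss_hap <- rowMeans(is.na(haplo))

# Keep markers with at most 20% missing values, then apply the same
# rule to the haplotypes of the reduced matrix
keep_mrk <- miss_mrk <= 0.2
filtered <- haplo[, keep_mrk, drop = FALSE]
keep_hap <- rowMeans(is.na(filtered)) <= 0.2
filtered <- filtered[keep_hap, , drop = FALSE]
print(dim(filtered))
```

Here the third marker has 40% missing values and is dropped, after which no haplotype contains missing values.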
Multi-allelic markers are always removed since the tests rely on the "infinite sites model" which implies that all polymorphic markers are bi-allelic. Monomorphic markers can be present, but are irrelevant for the tests.
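As a small illustration of these marker classes, the number of distinct alleles per marker can be counted in base R. The allele matrix is made-up data, and the classification is a sketch under the assumption that alleles are coded as integers; it is not the package's internal code.

```r
# Made-up allele matrix: 4 haplotypes (rows) x 3 markers (columns)
haplo <- cbind(
  m1 = c(0, 0, 0, 0),  # monomorphic: irrelevant for the tests
  m2 = c(0, 1, 1, 0),  # bi-allelic: used by the tests
  m3 = c(0, 1, 2, 1)   # multi-allelic: removed
)

# Number of distinct non-missing alleles per marker
n_alleles <- apply(haplo, 2, function(x) length(unique(x[!is.na(x)])))

# Only bi-allelic markers contribute to the site frequency spectrum
bi_allelic <- haplo[, n_alleles == 2, drop = FALSE]
print(n_alleles)
print(colnames(bi_allelic))
```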
Value
A data frame with window coordinates, the number of contained (polymorphic) markers, Watterson's, Tajima's and Zeng's estimators of theta and the test statistics of Tajima's D, Fay & Wu's H and Zeng's E.
References
Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2) 256-276.
Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105(2) 437-60.
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3) 585-95.
Fay, J. and Wu, C. (2000). Hitchhiking under positive Darwinian selection. Genetics 155(3) 1405-13.
Zeng, K. et al. (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3) 1431-9.
Ferretti, L., Raineri, E. and Ramos-Onsins, S. (2012). Neutrality tests for sequences with missing data. Genetics 191(4) 1397-401.
Examples
make.example.files()
# neutral evolution
hh <- data2haplohh("example_neutral.vcf", verbose = FALSE)
calc_sfs_tests(hh)
# strong selective sweep
hh <- data2haplohh("example_sweep.vcf", verbose = FALSE)
calc_sfs_tests(hh)
remove.example.files()