gl.run.epos {dartR.popgen} | R Documentation |
Run EPOS for Inference of Historical Population-Size Changes
Description
This function runs EPOS (based on Lynch et al. 2019) to estimate historical population-size changes
(https://github.com/EvolBioInf/epos). It relies on compiled versions of the programs
epos and epos2plot and, if bootstrap output is required, bootSfs. For more information on the
approach, see the publication (Lynch et al. 2019), the GitHub repository
https://github.com/EvolBioInf/epos, and the manual epos.pdf
(https://github.com/EvolBioInf/epos/blob/master/doc/epos.pdf).
The binaries need to be provided in a single folder and can be downloaded via the
gl.download.binary
function (including the necessary DLLs for Windows; under Linux, GSL and BLAS need to be installed on your system). Please note: if you use this method, make sure you cite the original publication in your work.
EPOS (Estimation of Population Size changes) is a software tool based on the theoretical
framework outlined by Lynch et al. (2019). It infers historical changes in population size
using allele-frequency data from population-genomic surveys.
The method relies on the site-frequency spectrum (SFS) of nearly neutral polymorphisms.
The underlying theory uses coalescence models, which describe how gene sequences have
originated from a common ancestor. By analyzing the probability distributions of the
starting and ending points of branch segments over all possible coalescence trees,
EPOS can estimate historic population sizes.
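To make the SFS concept concrete, the following is a minimal base-R sketch of how a folded SFS can be tabulated from a small genotype matrix. It is an illustration only and not the implementation used by gl.sfs or epos; the toy matrix and all variable names are assumptions made for this example.

```r
# Toy genotype matrix (hypothetical data): rows = individuals, columns = loci,
# coded 0/1/2 = number of copies of the alternate allele, no missing data.
geno <- matrix(
  c(0, 1, 2, 0,
    0, 0, 1, 0,
    1, 0, 2, 0,
    0, 1, 1, 0),
  nrow = 4, byrow = TRUE
)

n_chrom <- 2 * nrow(geno)      # diploid individuals: 8 sampled chromosomes
alt_count <- colSums(geno)     # alternate-allele count per locus

# Fold: count the rarer of the two alleles at each locus
minor_count <- pmin(alt_count, n_chrom - alt_count)

# Folded SFS: number of loci in each minor-allele count class 0..n_chrom/2
folded_sfs <- tabulate(minor_count + 1, nbins = n_chrom / 2 + 1)

# Dropping the first bin removes the monomorphic class
polymorphic_sfs <- folded_sfs[-1]
```

Here the first element of folded_sfs counts monomorphic loci; dropping it mirrors the idea of removing the leftmost bin of the SFS.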
The function uses a model-flexible approach, meaning it estimates historical effective
population sizes with an efficient statistical procedure, without the need to provide
a candidate scenario.
For all the possible settings, please refer to the manual of EPOS.
The main parameters that are necessary to run the function are a genlight/dartR object,
L (length of sequences), u (mutation rate), and the path to the epos binaries.
For details check the example below.
Please note: There is currently no good way to estimate L, the total length
of all sequences. Users of DArT data often use the number of loci multiplied
by 69, but this is an underestimate because monomorphic loci need to be
included (and the length of the restriction site should be added for each locus).
For the mutation rate u, the default value is 5e-9, but it should be adapted
to the species of interest. The good news is that the settings of L and u affect
only the scaling of the axes of the inferred history, not its shape.
So users can infer the shape, but need to be careful with a temporal interpretation,
as both the x and y axes are affected by the mutation rate and L.
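The rule of thumb for L can be written out as a short calculation. This is a rough sketch only: the number of loci, the number of monomorphic loci and the restriction-site length below are hypothetical placeholder values, not recommendations, and should be replaced with values appropriate for your own data.

```r
# Hypothetical values for illustration only
n_loci      <- 5000    # polymorphic loci in the data set (assumed)
tag_length  <- 69      # bases per locus, as in the rule of thumb
n_monomorph <- 20000   # monomorphic loci not in the data set (assumed)
site_length <- 6       # restriction-site length per locus (assumed placeholder)

# Rule of thumb: a lower bound, since it ignores monomorphic loci
L_lower <- n_loci * tag_length

# Adjusted value: adds monomorphic loci and the restriction site per locus
L_adjusted <- (n_loci + n_monomorph) * (tag_length + site_length)
```

With these placeholder numbers the adjusted value is several times larger than the rule-of-thumb value, which shows how strongly the choice of L can shift the axis scaling.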
Usage
gl.run.epos(
x,
epos.path,
sfs = NULL,
minbinsize = 1,
folded = TRUE,
L = NULL,
u = NULL,
boot = 0,
upper = 0.975,
lower = 0.025,
method = "greedy",
depth = 2,
other.options = "",
cleanup = TRUE,
plot.display = TRUE,
plot.theme = theme_dartR(),
plot.dir = NULL,
plot.file = NULL,
verbose = NULL
)
Arguments
x |
dartR/genlight object |
epos.path |
path to epos and other required programs (epos and epos2plot are always required; bootSfs is needed only if bootstrapping and confidence estimates are required) |
sfs |
if no sfs is provided, the function gl.sfs(x, minbinsize=1, singlepop=TRUE) is used to calculate the sfs that is passed to epos |
minbinsize |
remove bins from the left of the sfs. If you run epos from a genlight object, the sfs is calculated by the function (using gl.sfs) and by default minbinsize is set to 1 (the monomorphic loci of the sfs are removed). This parameter is ignored if the sfs is provided via the sfs parameter (see above). Be aware that even if your genlight object has more than one population, the sfs is calculated with singlepop set to TRUE (one sfs for all individuals), as epos does not work with multidimensional sfs |
folded |
if set to TRUE (default), a folded sfs (minor allele frequency sfs) is returned. If set to FALSE, an unfolded sfs (derived allele frequency sfs) is returned. It is assumed that 0 is homozygous for the reference allele and 2 is homozygous for the derived allele, so you need to make sure your coding is correct. Option -U in epos. |
L |
length of sequences (including monomorphic and polymorphic sites). If the sfs is provided with minbinsize=1 (default), then L needs to be specified. Option -l in epos. |
u |
mutation rate. If not provided, the default value of epos is used (5e-9). Option -u in epos. |
boot |
if set to a value >0, the program bootSfs is used to generate multiple bootstrapped sfs, which allows confidence intervals of the historical Ne to be calculated. Be aware that this extends the runtime. Default: 0 (no bootstraps are run); otherwise boot bootstraps are run (option -i in bootSfs) |
upper |
upper quantile of the bootstrap (only used if boot>0). default 0.975. (option -u in epos2plot) |
lower |
lower quantile of the bootstrap (only used if boot>0). default 0.025. (option -l in epos2plot) |
method |
either "exhaustive" or "greedy". Check the epos manual for details. If method="exhaustive", the parameter depth is used. Default: "greedy". |
depth |
if method="exhaustive", this parameter sets the search depth (default 2). If method is set to "greedy", this setting is ignored. |
other.options |
additional options for epos (e.g. -m, -x etc.) |
cleanup |
if set to TRUE, intermediate temporary files are deleted after the run |
plot.display |
Specify if plot is to be produced [default TRUE]. |
plot.theme |
User specified theme [default theme_dartR()]. |
plot.dir |
Directory to save the plot RDS files [default as specified by the global working directory or tempdir()] |
plot.file |
Filename (minus extension) for the RDS plot file [Required for plot save] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
Value
returns a list with two components:
history: Ne estimates over generations (generation, median, low and high)
plot: a ggplot of history
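As an illustration of what the median, low and high columns represent when bootstrapping is used, the sketch below summarises simulated bootstrap replicates with a median and the 0.025 and 0.975 quantiles. It assumes that quantiles are taken across replicates at each time point; it uses simulated numbers and hypothetical variable names, and it is not the epos2plot implementation.

```r
set.seed(1)

# Simulated stand-in for bootstrap output (hypothetical data):
# rows = bootstrap replicates, columns = time points
boot_ne <- matrix(
  rlnorm(200 * 3, meanlog = log(1000), sdlog = 0.2),
  nrow = 200
)

# Summarise each time point across replicates
summary_ne <- data.frame(
  median = apply(boot_ne, 2, median),
  low    = apply(boot_ne, 2, quantile, probs = 0.025),
  high   = apply(boot_ne, 2, quantile, probs = 0.975)
)
```

Each row of summary_ne then corresponds to one time point, with low and high bracketing the median.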
Author(s)
Custodian: Bernd Gruber – Post to https://groups.google.com/d/forum/dartr
References
Lynch, Michael, Bernhard Haubold, Peter Pfaffelhuber, and Takahiro Maruki. 2019. Inference of Historical Population-Size Changes with Allele-Frequency Data. G3: Genes|Genomes|Genetics 10, no. 1: 211–23. doi:10.1534/g3.119.400854.
Examples
## Not run:
#gl.download.binary("epos",os="windows")
require(dartR.data)
epos <- gl.run.epos(possums.gl, epos.path = file.path(tempdir(),"epos"), L=1e5, u = 1e-8)
epos$history
## End(Not run)