fit.graph {poolfstat} | R Documentation |
Estimate parameters of an admixture graph
Description
Estimate parameters of an admixture graph
Usage
fit.graph(
graph.params,
Q.lambda = 0,
eps.admix.prop = 1e-06,
edge.fact = 1000,
admix.fact = 100,
compute.ci = F,
drift.scaling = F,
outfileprefix = NULL,
verbose = TRUE
)
Arguments
graph.params |
An object of class graph.params containing graph information and relevant Fstats estimates (see the function generate.graph.params) |
Q.lambda |
A scalar (usually small) to add to the diagonal elements of the error covariance matrix of fstats estimates (may improve numerical stability of its decomposition for large number of populations) |
eps.admix.prop |
A scalar defining admixture proportion domain (eps.admix.prop vary between eps.admix.prop and 1-eps.admix.prop) |
edge.fact |
The multiplying factor of edges length in graph representation |
admix.fact |
The multiplying factor of admixture proportion in graph representation |
compute.ci |
Derive 95% Confidence Intervals for the parameters of the admixture graph (edge lengths and admixture rates) |
drift.scaling |
If TRUE scale edge lengths in drift units (require estimates of leave heterozygosities) |
outfileprefix |
The prefix of the dot file that will represent the graph (with extension ".dot"). If NULL, no graph file generated |
verbose |
If TRUE extra information is printed on the terminal |
Details
Let represent the n-length vector of basis target (i.e., observed) F2 and F3 statistics and
the vector of their expected values given the vector of graph edges lengths
and the incidence matrix
that depends on the structure of the graph and the admixture rates
(if there is no admixture in the graph,
only contains 0 or 1). The function attempts to find the
and
graph parameter values that minimize a cost (score of the model) defined as
. Assuming
(i.e., the observed f-statistics vector is multivariate normal distributed around an expected g vector specified by the admixture graph and a covariance structure empirically estimated),
where
is the likelihood of the fitted graph and
. Also, for model comparison purpose, a standard
is then derived from
as
(p being the number of graph parameters, i.e., edge lengths and admixture rates).
As mentioned by Patterson et al. (2012), the score
is quadratic in edge lengths
given
. The function uses the Lawson-Hanson non-negative linear least squares algorithm implemented in the nnls function (package nnls) to estimate
(subject to the constraint of positive edge lengths) by finding the vector
that minimize
(where
results from the Cholesky decomposition of
, i.e.,
). Note that the *Q.lambda* argument may be used to add a small constant (e.g.,
) to the diagonal elements of
to avoid numerical problems (see Patterson et al., 2012). Yet *Q.lambda* is always disregarded when computing the final score
and
. Minimization of
is thus reduced to the identification of the admixture rates (
vector) which is performed using the L-BFGS-B method (i.e., Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with box constraints) implemented in the optim function (stats package). The *eps.admix.prop* argument allows specifying the lower and upper bound of the admixture rates to *eps.admix.prop* and *1-eps.admix.prop* respectively.
Scaling of the edges lengths in drift units (i.e., in units of
where
is time in generations and
is the effective population size) is performed as described in Lipson et al. (MBE, 2013) by dividing the estimated edges lengths by half the estimated heterozygosity of their parental nodes (using the property
where
and
are the heterozygosities of a child C and its parent P node and
is the estimated length of the branch relating C and P.
Finally, if compute.ci=TRUE, a (rough)
confidence intervals is computed using a bisection method (with a
precision) for each parameters in turn (all others being set to their estimated value). Note that
CI are here defined as the set of values associated to a score
such that
(where
is the optimized score), i.e., with a likelihood-ratio test statistic with respect to the fitted values
(the
threshold of a one ddl Chi-square distribution).
Value
An object of class fitted.graph (see help(fitted.graph) for details)
See Also
To generate a graph.params object, see generate.graph.params
. The fitted graph may be plotted directly using plot that calls grViz() function and the resulting fitted fstats may be compared to the estimated ones with compare.fitted.fstats
.