fit_tree {bnpsd} | R Documentation |
Fit a tree structure to a coancestry matrix
Description
Implements a heuristic algorithm to find the optimal tree topology based on joining pairs of subpopulations with the highest between-coancestry values, and averaging parent coancestries for the merged nodes (a variant of WPGMA hierarchical clustering stats::hclust()
).
Branch lengths are optimized using the non-negative least squares approach nnls::nnls()
, which minimize the residual square error between the given coancestry matrix and the model coancestry implied by the tree.
This algorithm recovers the true tree when the input coancestry truly belongs to this tree and there is no estimation error (i.e. it is the inverse function of coanc_tree()
), and performs well for coancestry estimates (i.e. the result of simulating genotypes from the true tree, i.e. from draw_all_admix()
, and estimating the coancestry using popkin::popkin()
followed by popkin::inbr_diag()
).
Usage
fit_tree(coancestry, method = "mcquitty")
Arguments
coancestry |
The coancestry matrix to fit a tree to. |
method |
The hierarchical clustering method (passed to |
Details
The tree is bifurcating by construction, but edges may equal zero if needed, effectively resulting in multifurcation, although this code makes no attempt to merge nodes with zero edges. For best fit to arbitrary data, the root edge is always fit to the data (may also be zero). Data fit may be poor if the coancestry does not correspond to a tree, particularly if there is admixture between subpopulations.
Value
A phylo
object from the ape
package (see ape::read.tree()
), with two additional list elements at the end:
-
edge
: (standardphylo
.) A matrix describing the tree topology, listing parent and child node indexes. -
Nnode
: (standardphylo
.) Number of internal (non-leaf) nodes. -
tip.label
: (standardphylo
.) Labels for tips (leaf nodes), in order of index as inedge
matrix above. These match the row names of the inputcoancestry
matrix, or if names are missing, the row indexes of this matrix are used (in both cases, labels may be reordered compared torownames( coancestry )
). Tips are ordered as they appear in the aboveedge
matrix (ensuring visual agreement in plots between the tree and its resulting coancestry matrix viacoanc_tree()
), and in an order that matches the inputcoancestry
's subpopulation order as much as possible (tree constraints do not permit arbitrary tip orders, but a heuristic implemented intree_reorder()
is used to determine a reasonable order when an exact match is not possible). -
edge.length
: (standardphylo
.) Values of edge lengths in same order as rows ofedge
matrix above. -
root.edge
: (standardphylo
.) Value of root edge length. -
rss
: The residual sum of squares of the model coancestry versus the input coancestry. -
edge.length.add
: Additive edge values (regarding their effect on coancestry, as opposed toedge.length
which are probabilities, seecoanc_tree()
)
See Also
coanc_tree()
) for the inverse function (when the coancestry corresponds exactly to a tree).
tree_reorder()
for reordering tree structure so that tips appear in a particular desired order.
Examples
# create a random tree
library(ape)
k <- 5
tree <- rtree( k )
# true coancestry matrix for this tree
coanc <- coanc_tree( tree )
# now recover tree from this coancestry matrix:
tree2 <- fit_tree( coanc )
# tree2 equals tree!