R: Fit a tree structure to a coancestry matrix

fit_tree {bnpsd}

R Documentation

Fit a tree structure to a coancestry matrix

Description

Implements a heuristic algorithm to find the optimal tree topology based on joining pairs of subpopulations with the highest between-coancestry values, and averaging parent coancestries for the merged nodes (a variant of WPGMA hierarchical clustering stats::hclust()). Branch lengths are optimized using the non-negative least squares approach nnls::nnls(), which minimize the residual square error between the given coancestry matrix and the model coancestry implied by the tree. This algorithm recovers the true tree when the input coancestry truly belongs to this tree and there is no estimation error (i.e. it is the inverse function of coanc_tree()), and performs well for coancestry estimates (i.e. the result of simulating genotypes from the true tree, i.e. from draw_all_admix(), and estimating the coancestry using popkin::popkin() followed by popkin::inbr_diag()).

Usage

fit_tree(coancestry, method = "mcquitty")

Arguments

`coancestry`	The coancestry matrix to fit a tree to.
`method`	The hierarchical clustering method (passed to `stats::hclust()`). While all `stats::hclust()` methods are supported, only two really make sense for this application: "mcquitty" (i.e. WPGMA, default) and "average" (UPGMA).

Details

The tree is bifurcating by construction, but edges may equal zero if needed, effectively resulting in multifurcation, although this code makes no attempt to merge nodes with zero edges. For best fit to arbitrary data, the root edge is always fit to the data (may also be zero). Data fit may be poor if the coancestry does not correspond to a tree, particularly if there is admixture between subpopulations.

Value

A phylo object from the ape package (see ape::read.tree()), with two additional list elements at the end:

edge: (standard phylo.) A matrix describing the tree topology, listing parent and child node indexes.
Nnode: (standard phylo.) Number of internal (non-leaf) nodes.
tip.label: (standard phylo.) Labels for tips (leaf nodes), in order of index as in edge matrix above. These match the row names of the input coancestry matrix, or if names are missing, the row indexes of this matrix are used (in both cases, labels may be reordered compared to rownames( coancestry )). Tips are ordered as they appear in the above edge matrix (ensuring visual agreement in plots between the tree and its resulting coancestry matrix via coanc_tree()), and in an order that matches the input coancestry's subpopulation order as much as possible (tree constraints do not permit arbitrary tip orders, but a heuristic implemented in tree_reorder() is used to determine a reasonable order when an exact match is not possible).
edge.length: (standard phylo.) Values of edge lengths in same order as rows of edge matrix above.
root.edge: (standard phylo.) Value of root edge length.
rss: The residual sum of squares of the model coancestry versus the input coancestry.
edge.length.add: Additive edge values (regarding their effect on coancestry, as opposed to edge.length which are probabilities, see coanc_tree())

Examples

# create a random tree
library(ape)
k <- 5
tree <- rtree( k )
# true coancestry matrix for this tree
coanc <- coanc_tree( tree )

# now recover tree from this coancestry matrix:
tree2 <- fit_tree( coanc )
# tree2 equals tree!

[Package bnpsd version 1.3.13 Index]