treeContradiction {paleotree} | R Documentation |
Measure the Contradiction Difference Between Two Phylogenetic Topologies
Description
An alternative measure of pair-wise dissimilarity between two tree topologies which ignores differences in phylogenetic resolution between the two, unlike typical metrics (such as Robinson-Foulds distance). The metric essentially counts up the number of splits on both trees that are directly contradicted by a split on the contrasting topology (treating both as unrooted). By default, this 'contradiction difference' value is then scaled to between 0 and 1, by dividing by the total number of splits that could have been contradicted across both trees ( 2 * (Number of shared tips - 2) ). On this scaled, 0 represents no conflicting relationships and 1 reflects two entirely conflicting topologies, similar to the rescaling in Colless's consensus fork index.
Usage
treeContradiction(tree1, tree2, rescale = TRUE)
Arguments
tree1 , tree2 |
Two phylogenies, with the same number of tips and
an identical set of tip labels, both of class |
rescale |
A logical. If |
Details
Algorithmically, conflicting splits are identified by counting the number of splits
(via ape
's prop.part
) on one tree that disagree with at least one split
on the other tree: for example, split (AB)CD would be contradicted by split (AC)BD. To
put it another way, all we need to test for is whether the taxa segregated by that split
were found to be more closely related to some other taxa, not so segregated by the
considered split.
This metric was designed mainly for use with trees that differ in their resolution, particularly when it is necessary to compare between summary trees (such as consensus trees of half-compatibility summaries) from separate phylogenetic analyses. Note that comparing summary trees can be problematic in some instances, and users should carefully consider their question of interest, and whether it may be more ideal to consider whole samples of trees (e.g., the posterior sample, or the sample of most parsimonious trees).
The contradiction difference is not a metric distance: most notably, the triangle inequality is not held and thus the 'space' it describes between topologies is not a metric space. This can be shown most simply when considering any two different but fully-resolve topologies and a third topology that is a star tree. The star tree will have a zero pair-wise CD with either fully-resolved phylogeny, but there will be a positive CD between the fully-resolved trees. An example of this is shown in the examples below.
The CD also suggest very large differences when small numbers of taxa shift greatly across the tree, a property shared by many other typical tree comparisons, such as RF distances. See examples below.
Value
The contradiction difference between two trees is reported as a single numeric variable.
Author(s)
David W. Bapst. This code was produced as part of a project funded by National Science Foundation grant EAR-1147537 to S. J. Carlson.
References
This contradiction difference measure was introduced in:
Bapst, D. W., H. A. Schreiber, and S. J. Carlson. 2018. Combined Analysis of Extant Rhynchonellida (Brachiopoda) using Morphological and Molecular Data. Systematic Biology 67(1):32-48.
See Also
See phangorn
's function for calculating the Robinson-Foulds distance: treedist
.
Graeme Lloyd's metatree
package, currently not on CRAN,
also contains the function MultiTreeDistance
for calculating both the contradiction difference measure and the Robinson-Foulds distance.
This function is optimized for very large samples of trees or very large
trees, and thus may be faster than treeContradiction
.
Also see the function MultiTreeContradiction
in the same package.
Examples
# let's simulate two trees
set.seed(1)
treeA <- rtree(30,br = NULL)
treeB <- rtree(30,br = NULL)
## Not run:
# visualize the difference between these two trees
library(phytools)
plot(cophylo(treeA,treeB))
# what is the Robinson-Foulds (RF) distance between these trees?
library(phangorn)
treedist(treeA,treeB)
## End(Not run)
# The RF distance is less intuitive when
# we consider a tree that isn't well-resolved
# let's simulate the worst resolved tree possible: a star tree
treeC <- stree(30)
## Not run:
# plot the tanglegram between A and C
plot(cophylo(treeA,treeC))
# however the RF distance is *not* zero
# even though the only difference is a difference in resolution
treedist(treeA,treeC)
## End(Not run)
# the contradiction difference (CD) ignores differences in resolution
# Tree C (the star tree) has zero CD between it and trees A and B
identical(treeContradiction(treeA,treeC),0) # should be zero distance
identical(treeContradiction(treeB,treeC),0) # should be zero distance
# two identical trees also have zero CD between them (as you'd hope)
identical(treeContradiction(treeA,treeA),0) # should be zero distance
#' and here's the CD between A and B
treeContradiction(treeA,treeB) # should be non-zero distance
# a less ideal property of the CD is that two taxon on opposite ends of the
# moving from side of the topology to the other of an otherwise identical tree
# will return the maximum contradiction difference possible (i.e., ` = 1`)
# an example
treeAA <- read.tree(text = "(A,(B,(C,(D,(E,F)))));")
treeBB <- read.tree(text = "(E,(B,(C,(D,(A,F)))));")
## Not run:
plot(cophylo(treeAA,treeBB))
## End(Not run)
treeContradiction(treeAA,treeBB)
## Not run:
# Note however also a property of RF distance too:
treedist(treeAA,treeBB)
## End(Not run)