treeContradiction {paleotree}R Documentation

Measure the Contradiction Difference Between Two Phylogenetic Topologies

Description

An alternative measure of pair-wise dissimilarity between two tree topologies which ignores differences in phylogenetic resolution between the two, unlike typical metrics (such as Robinson-Foulds distance). The metric essentially counts up the number of splits on both trees that are directly contradicted by a split on the contrasting topology (treating both as unrooted). By default, this 'contradiction difference' value is then scaled to between 0 and 1, by dividing by the total number of splits that could have been contradicted across both trees ( 2 * (Number of shared tips - 2) ). On this scaled, 0 represents no conflicting relationships and 1 reflects two entirely conflicting topologies, similar to the rescaling in Colless's consensus fork index.

Usage

treeContradiction(tree1, tree2, rescale = TRUE)

Arguments

tree1, tree2

Two phylogenies, with the same number of tips and an identical set of tip labels, both of class phylo.

rescale

A logical. If FALSE, the raw number of contradicted splits across both trees is reported. If TRUE (the default), the contradiction difference value is returned rescaled to the total number of splits across both input trees that could have contradicted.

Details

Algorithmically, conflicting splits are identified by counting the number of splits (via ape's prop.part) on one tree that disagree with at least one split on the other tree: for example, split (AB)CD would be contradicted by split (AC)BD. To put it another way, all we need to test for is whether the taxa segregated by that split were found to be more closely related to some other taxa, not so segregated by the considered split.

This metric was designed mainly for use with trees that differ in their resolution, particularly when it is necessary to compare between summary trees (such as consensus trees of half-compatibility summaries) from separate phylogenetic analyses. Note that comparing summary trees can be problematic in some instances, and users should carefully consider their question of interest, and whether it may be more ideal to consider whole samples of trees (e.g., the posterior sample, or the sample of most parsimonious trees).

The contradiction difference is not a metric distance: most notably, the triangle inequality is not held and thus the 'space' it describes between topologies is not a metric space. This can be shown most simply when considering any two different but fully-resolve topologies and a third topology that is a star tree. The star tree will have a zero pair-wise CD with either fully-resolved phylogeny, but there will be a positive CD between the fully-resolved trees. An example of this is shown in the examples below.

The CD also suggest very large differences when small numbers of taxa shift greatly across the tree, a property shared by many other typical tree comparisons, such as RF distances. See examples below.

Value

The contradiction difference between two trees is reported as a single numeric variable.

Author(s)

David W. Bapst. This code was produced as part of a project funded by National Science Foundation grant EAR-1147537 to S. J. Carlson.

References

This contradiction difference measure was introduced in:

Bapst, D. W., H. A. Schreiber, and S. J. Carlson. 2018. Combined Analysis of Extant Rhynchonellida (Brachiopoda) using Morphological and Molecular Data. Systematic Biology 67(1):32-48.

See Also

See phangorn's function for calculating the Robinson-Foulds distance: treedist.

Graeme Lloyd's metatree package, currently not on CRAN, also contains the function MultiTreeDistance for calculating both the contradiction difference measure and the Robinson-Foulds distance. This function is optimized for very large samples of trees or very large trees, and thus may be faster than treeContradiction. Also see the function MultiTreeContradiction in the same package.

Examples


# let's simulate two trees

set.seed(1)
treeA <- rtree(30,br = NULL)
treeB <- rtree(30,br = NULL)

## Not run: 

# visualize the difference between these two trees
library(phytools)
plot(cophylo(treeA,treeB))

# what is the Robinson-Foulds (RF) distance between these trees?
library(phangorn)
treedist(treeA,treeB)


## End(Not run)

# The RF distance is less intuitive when 
    # we consider a tree that isn't well-resolved

# let's simulate the worst resolved tree possible: a star tree
treeC <- stree(30)

## Not run: 
# plot the tanglegram between A and C
plot(cophylo(treeA,treeC))

# however the RF distance is *not* zero
# even though the only difference is a difference in resolution
treedist(treeA,treeC)

## End(Not run)

# the contradiction difference (CD) ignores differences in resolution

# Tree C (the star tree) has zero CD between it and trees A and B
identical(treeContradiction(treeA,treeC),0)  # should be zero distance
identical(treeContradiction(treeB,treeC),0)  # should be zero distance

# two identical trees also have zero CD between them (as you'd hope) 
identical(treeContradiction(treeA,treeA),0)  # should be zero distance

#' and here's the CD between A and B
treeContradiction(treeA,treeB)  # should be non-zero distance

# a less ideal property of the CD is that two taxon on opposite ends of the 
# moving from side of the topology to the other of an otherwise identical tree
# will return the maximum contradiction difference possible (i.e., ` =  1`)

# an example
treeAA <- read.tree(text = "(A,(B,(C,(D,(E,F)))));")
treeBB <- read.tree(text = "(E,(B,(C,(D,(A,F)))));")

## Not run: 
plot(cophylo(treeAA,treeBB))

## End(Not run)

treeContradiction(treeAA,treeBB)

## Not run: 
# Note however also a property of RF distance too:
treedist(treeAA,treeBB)

## End(Not run)


[Package paleotree version 3.4.7 Index]