TINNIK {MSCquartets}    R Documentation

TINNIK algorithm to infer species tree of blobs

Description

Apply the TINNIK algorithm of Allman et al. (2024) (see also Allman et al. (2022)) to infer a tree of blobs for the species network from a collection of gene trees, under the network multispecies coalescent (NMSC) model.

Usage

TINNIK(
  genedata,
  omit = FALSE,
  epsilon = 0,
  test = "T3",
  alpha = 0.05,
  beta = 0.95,
  treemethod = fastme.bal,
  delta = 0,
  taxanames = NULL,
  plot = TRUE
)

Arguments

genedata

gene tree data that may be supplied in any of 3 forms:

  1. as a character string giving the name of a file containing Newick gene trees,

  2. as a multiPhylo object containing the gene trees, or

  3. as a table of quartets on the gene trees, as produced by a previous call to TINNIK or quartetTableResolved, which has columns only for taxa, resolved quartet counts, and possibly p_T3, p_cut, and p_star

omit

FALSE to treat unresolved quartets as 1/3 of each resolution; TRUE to discard unresolved quartet data; ignored if gene tree data given as quartet table

epsilon

minimum for branch lengths to be treated as non-zero; ignored if gene tree data given as quartet table

test

a hypothesis test to perform, either "cut" or "T3" (default)

alpha

a value or vector of significance levels for judging p-values for test specified by "test"; testing a null hypothesis of no hybridization vs. an alternative of hybridization, for each quartet; a smaller value applies a less conservative test for a tree (more trees), hence a stricter requirement for deciding in favor of hybridization (fewer reticulations)

beta

a value or vector of significance levels for judging p-values testing a null hypothesis of a star tree (polytomy) for each quartet vs. an alternative of anything else; a smaller value applies a less conservative test for a star tree (more polytomies), hence a stricter requirement for deciding in favor of a resolved tree or network; if vectors, alpha and beta must have the same length

treemethod

a function implementing a method of tree inference from a distance table, e.g. the ape package's fastme.bal or nj

delta

a minimum edge length to retain in the tree of blobs (see Allman et al. (2024) for related theory); shorter edges are collapsed

taxanames

if genedata is a file or a multiPhylo object, a vector of a subset of the taxa names on the gene trees to be analyzed; if NULL, all taxa on the first gene tree are used; if genedata is a quartet table, this argument is ignored and all taxa in the table are used

plot

TRUE produces simplex plots of hypothesis test results and plots the tree of blobs; FALSE omits plots

Details

This function

  1. counts displayed quartets across gene trees to form quartet count concordance factors (qcCFs),

  2. applies appropriate hypothesis tests to judge qcCFs as representing putative hybridization, resolved trees, or unresolved (star) trees using alpha and beta as significance levels,

  3. produces a simplex plot showing results of the hypothesis tests for all qcCFs, and

  4. computes the appropriate TINNIK distance table, and infers the tree of blobs from the distance.
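The judging in step 2 can be pictured with a small base-R sketch. The helper `classify_qcCF` is hypothetical and not part of MSCquartets. It assumes that a test p-value below alpha rejects the tree null hypothesis, and that a star-test p-value above beta fails to reject the star null hypothesis. Letting the star judgment take precedence is an assumption made only for this illustration; the package's actual rule may differ.

```r
# Hypothetical helper, NOT an MSCquartets function: a three-way call per quartet.
# Assumption: a quartet judged as a star is reported as "star" regardless of p_test.
classify_qcCF <- function(p_test, p_star, alpha = 0.05, beta = 0.95) {
  ifelse(p_star > beta, "star",                       # cannot reject a polytomy
         ifelse(p_test < alpha, "hybridization",      # tree model rejected
                "tree"))                              # resolved quartet tree
}

# Made-up p-values for three quartets
p_test <- c(0.50, 0.001, 0.50)
p_star <- c(0.001, 0.001, 0.99)

calls <- classify_qcCF(p_test, p_star, alpha = 0.01, beta = 0.05)
print(calls)
```

Raising alpha or lowering beta in this sketch moves more quartets out of the "tree" class, which mirrors the effect the argument descriptions attribute to alpha and beta.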

A call of TINNIK with genedata given as a table previously output from TINNIK is equivalent to a call of TINNIKdist followed by tree construction from the distance table. If genedata is a table previously output from quartetTableResolved, which lacks columns of p-values for hypothesis tests, these will be appended to the table output by TINNIK. This table must contain a row with quartet counts for every 4-taxon set.
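The completeness requirement on a quartet table can be checked by counting: n taxa give choose(n, 4) sets of 4 taxa, so a complete table has that many rows. The following base-R sketch uses made-up taxon names and does not touch any MSCquartets object.

```r
# Made-up taxon names, for illustration only
taxa <- c("A", "B", "C", "D", "E", "F")

# Enumerate every 4-taxon subset, one per row
four_taxon_sets <- t(combn(taxa, 4))

# Number of rows a complete quartet table on these taxa would need
n_expected <- choose(length(taxa), 4)

# The enumeration and the binomial coefficient agree
stopifnot(nrow(four_taxon_sets) == n_expected)
```

For a real table, comparing its row count with choose(n, 4) for the number of taxa it covers is a quick sanity check before passing it to TINNIK.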

If plots are produced, there are 2 simplex plots: The first shows the hypothesis test results, and the second shows inferred B-quartets and T-quartets. In both, each point in the simplex plot corresponds to an empirical quartet concordance factor, color-coded to represent test or inference results.

In general, alpha should be chosen to be small and beta to be large so that most quartets are interpreted as resolved trees. More quartets judged to have either blob or unresolved relationships will lead to a less resolved blob tree.

Usually, an initial call to TINNIK will not give a good analysis, as values of alpha and beta are likely to need some adjustment based on inspecting the data. Saving the returned table of test results from TINNIK will allow for the results of the time-consuming computation of qcCFs to be saved, along with p-values, for input to further calls of TINNIK with new choices of alpha and beta.
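A minimal sketch of that save-and-reuse workflow, assuming the MSCquartets package is installed and using the pTableYeastRokas data set from the Examples. The file location and the second pair of alpha and beta values are arbitrary choices for illustration, not recommendations.

```r
library(MSCquartets)
data(pTableYeastRokas)

# First pass; plots suppressed here to keep the sketch non-interactive
first <- TINNIK(pTableYeastRokas, test = "T3", alpha = 0.01, beta = 0.05,
                plot = FALSE)

# Save the returned table (quartet counts plus p-values) for later sessions
ptable_file <- file.path(tempdir(), "pTable.rds")   # arbitrary location
saveRDS(first$pTable, ptable_file)

# Later: reload the table and try different significance levels
# without re-tallying quartets on the gene trees
pTable <- readRDS(ptable_file)
second <- TINNIK(pTable, test = "T3", alpha = 0.001, beta = 0.05,
                 plot = FALSE)
```

Setting plot = TRUE on the later calls shows how the new alpha and beta change the simplex plots.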

See the documentation for TINNIKdist for an explanation of a small, rarely noticeable, stochastic element of the algorithm.

For data sets of many gene trees, user time may be reduced by using parallel code for counting displayed quartets. See quartetTableParallel.

Value

output (returned invisibly), with output$ToB the TINNIK tree of blobs, output$pTable the table of quartets and p-values for judging fit to the MSC on quartet trees, and output$Bquartets a TRUE/FALSE indicator vector of B-quartets; if alpha, beta are vectors, output$ToB is a vector of trees; the table can be used as input to TINNIK or TINNIKdist with new choices of alpha, beta, without re-tallying quartets on gene trees
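As a hedged illustration of the returned components, the sketch below passes vectors for alpha and beta and then reads the fields named above. It assumes the MSCquartets package is installed; the particular alpha and beta values are arbitrary.

```r
library(MSCquartets)
data(pTableYeastRokas)

# alpha and beta must have the same length when given as vectors
alphas <- c(0.01, 0.001)
betas  <- c(0.05, 0.05)

out <- TINNIK(pTableYeastRokas, test = "T3", alpha = alphas, beta = betas,
              plot = FALSE)

trees  <- out$ToB      # one tree of blobs per (alpha, beta) pair
pTable <- out$pTable   # reusable table of quartets and p-values
n_trees <- length(trees)
```

Because the result is returned invisibly, it must be assigned to a name, as here, to be inspected.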

References

Allman ES, Baños H, Mitchell JD, Rhodes JA (2022). “The tree of blobs of a species network: identifiability under the coalescent.” Journal of Mathematical Biology, 86(1), 10. doi:10.1007/s00285-022-01838-9.

Allman ES, Baños H, Mitchell JD, Rhodes JA (2024). “TINNIK: Inference of the Tree of Blobs of Species Networks Under the Coalescent.” draft.

See Also

quartetTable, quartetTableParallel, quartetTableDominant, quartetCutTestInd, quartetTreeTestInd, quartetStarTestInd, TINNIKdist, quartetTestPlot, pvalHist

Examples

data(pTableYeastRokas)
out <- TINNIK(pTableYeastRokas, test = "T3", alpha = 0.01, beta = 0.05)


[Package MSCquartets version 2.0 Index]