
NANUQ {MSCquartets}

R Documentation

Apply NANUQ network inference algorithm to gene tree data

Description

Apply the NANUQ algorithm of Allman et al. (2019) to infer a hybridization network from a collection of gene trees, under the level-1 network multispecies coalescent (NMSC) model.

Usage

NANUQ(
  genedata,
  outfile = "NANUQdist",
  omit = FALSE,
  epsilon = 0,
  alpha = 0.05,
  beta = 0.95,
  taxanames = NULL,
  plot = TRUE
)

Arguments

`genedata`	gene tree data that may be supplied in any of 3 forms: as a character string giving the name of a file containing Newick gene trees, as a multiPhylo object containing the gene trees, or as a table of quartets on the gene trees, as produced by a previous call to `NANUQ` or `quartetTableResolved`, which has columns only for taxa, resolved quartet counts, and possibly p_T3 and p_star
`outfile`	a character string giving an output file name stub for saving a `NANUQ` distance matrix in nexus format; an `alpha` and `beta` value and ".nex" will be appended to the stub `outfile`; if `NULL` then no file is written
`omit`	`FALSE` to treat unresolved quartets as 1/3 of each resolution; `TRUE` to discard unresolved quartet data; ignored if gene tree data given as quartet table
`epsilon`	minimum for branch lengths to be treated as non-zero; ignored if gene tree data given as quartet table
`alpha`	a value or vector of significance levels for judging p-values testing a null hypothesis of no hybridization vs. an alternative of hybridization, for each quartet; a smaller value applies a less conservative test for a tree (more trees), hence a stricter requirement for deciding in favor of hybridization (fewer reticulations)
`beta`	a value or vector of significance levels for judging p-values testing a null hypothesis of a star tree (polytomy) for each quartet vs. an alternative of anything else; a smaller value applies a less conservative test for a star tree (more polytomies), hence a stricter requirement for deciding in favor of a resolved tree or network; if vectors, `alpha` and `beta` must have the same length
`taxanames`	if `genedata` is a file or a multiPhylo object, a vector of a subset of the taxa names on the gene trees to be analyzed; if `NULL`, all taxa on the first gene tree are used; if `genedata` is a quartet table, this argument is ignored and all taxa in the table are used
`plot`	`TRUE` produces simplex plots of hypothesis test results, `FALSE` omits plots
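
To make the effect of `omit` concrete, the following base-R sketch works through the arithmetic for a single set of four taxa. It is not the package's internal code; the count vector and its names are hypothetical, chosen only for illustration.

```r
# Hypothetical counts for one 4-taxon set: three resolved topologies,
# plus the number of gene trees on which the quartet is unresolved.
resolved   <- c("12|34" = 40, "13|24" = 25, "14|23" = 20)
unresolved <- 15

# omit = FALSE: each unresolved quartet counts as 1/3 of each resolution.
counts_keep <- resolved + unresolved / 3

# omit = TRUE: unresolved quartets are discarded.
counts_omit <- resolved

print(counts_keep)
print(counts_omit)
```

With `omit = FALSE` the total count is preserved across the three resolutions; with `omit = TRUE` the total shrinks by the number of unresolved quartets.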

Details

This function

counts displayed quartets across gene trees to form quartet count concordance factors (qcCFs),
applies appropriate hypothesis tests to judge qcCFs as representing putative hybridization, resolved trees, or unresolved (star) trees using alpha and beta as significance levels,
produces a simplex plot showing results of the hypothesis tests for all qcCFs, and
computes the appropriate NANUQ distance table, writing it to a file.

The distance table file can then be opened in the external software SplitsTree (Huson and Bryant 2006) (recommended) or within R using the package phangorn to obtain a circular split system under the Neighbor-Net algorithm, which is then depicted as a splits graph. The splits graph should be interpreted via the theory of Allman et al. (2019) to infer the level-1 species network, or to conclude the data does not arise from the NMSC on such a network.

If alpha and beta are vectors, they must have the same length k. Then the i-th entries are paired to produce k plots and k output files. This is equivalent to k calls to NANUQ with scalar values of alpha and beta.
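
The pairing rule can be illustrated in base R. This is only a sketch of the semantics described above, with arbitrary example values; it does not call `NANUQ` itself.

```r
alpha <- c(0.01, 0.05, 0.10)
beta  <- c(0.95, 0.95, 0.90)
stopifnot(length(alpha) == length(beta))

# Entries are paired by position: k = 3 analyses, not 3 x 3 = 9.
paired <- data.frame(run = seq_along(alpha), alpha = alpha, beta = beta)
print(paired)

# Row i corresponds to one scalar call of the form
#   NANUQ(genedata, alpha = paired$alpha[i], beta = paired$beta[i])
```

To explore every combination of levels, the vectors would have to be expanded by the caller first (for example with `expand.grid`) before being passed in.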

A call of NANUQ with genedata given as a table previously output from NANUQ is equivalent to a call of NANUQdist. If genedata is a table previously output from quartetTableResolved which lacks columns of p-values for hypothesis tests, these will be appended to the table output by NANUQ.

If plots are produced, each point represents an empirical quartet concordance factor, color-coded to represent test results.

In general, alpha should be chosen to be small and beta to be large so that most quartets are interpreted as resolved trees.

Usually, an initial call to NANUQ will not give a good analysis, as values of alpha and beta are likely to need some adjustment based on inspecting the data. Saving the returned table from NANUQ will allow for the results of the time-consuming computation of qcCFs to be saved, along with p-values, for input to further calls of NANUQ with new choices of alpha and beta.
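
A minimal sketch of that workflow follows, using the `pTableYeastRokas` data set from the Examples. The file location and the object names `out1`, `out2`, `table_file`, and `saved_table` are illustrative assumptions, and `saveRDS`/`readRDS` is just one way to persist the table. The `requireNamespace` guard is there only so that the snippet does nothing if the package is not installed.

```r
if (requireNamespace("MSCquartets", quietly = TRUE)) {
  library(MSCquartets)
  data(pTableYeastRokas)

  # First pass: plots and file output are switched off for brevity.
  out1 <- NANUQ(pTableYeastRokas, alpha = 0.05, beta = 0.95,
                outfile = NULL, plot = FALSE)

  # Save the returned table so the quartet tally need not be repeated.
  table_file <- tempfile(fileext = ".rds")  # illustrative location
  saveRDS(out1$pTable, table_file)

  # Later session: reload the table and rerun with a different alpha.
  saved_table <- readRDS(table_file)
  out2 <- NANUQ(saved_table, alpha = 0.01, beta = 0.95,
                outfile = NULL, plot = FALSE)
}
```

In practice the first call would take gene trees rather than a precomputed table, which is where the saving in computation time arises.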

See the documentation for quartetNetworkDist for an explanation of a small, rarely noticeable, stochastic element of the algorithm.

For data sets of many gene trees, user time may be reduced by using parallel code for counting displayed quartets. See quartetTableParallel, where example commands are given.

Value

a table $pTable of quartets and p-values for judging fit to the MSC on quartet trees, and a distance table $dist, or list of distance tables, giving the NANUQ distance (returned invisibly); the table $pTable can be used as input to NANUQ or NANUQdist with new choices of alpha and beta, without re-tallying quartets on gene trees; the distance table $dist is to be used as input to NeighborNet.

References

Allman ES, Baños H, Rhodes JA (2019). “NANUQ: A method for inferring species networks from gene trees under the coalescent model.” Algorithms Mol. Biol., 14(24), 1-25. doi:10.1186/s13015-019-0159-2.

Huson DH, Bryant D (2006). “Application of Phylogenetic Networks in Evolutionary Studies.” Molecular Biology and Evolution, 23(2), 254-267.

Examples

data(pTableYeastRokas)
out=NANUQ(pTableYeastRokas, alpha=.05, beta=.95, outfile = NULL)
# Specifying an outfile would write the distance table to it for opening in SplitsTree.
# Alternately, to use the phangorn implementation of NeighborNet
# within R, enter the following additional lines:
library(phangorn)  # provides neighborNet(); harmless if already attached
nn=neighborNet(out$dist)
plot(nn,"2D")

[Package MSCquartets version 2.0 Index]