quartetTreeTest {MSCquartets}

R Documentation

Hypothesis test for quartet counts fitting a tree under the MSC

Description

Test the null hypothesis H_0: the T1 or T3 model of Mitchell et al. (2019), against the alternative H_1: everything else. T1 is the model for a specific species quartet topology, and T3 is the model for any species quartet topology.

Usage

quartetTreeTest(
  obs,
  model = "T3",
  lambda = 0,
  method = "MLest",
  smallcounts = "approximate",
  bootstraps = 10^4
)

Arguments

`obs`	vector of 3 counts of resolved quartet frequencies
`model`	`"T1"` or `"T3"`, for the models of Mitchell et al. (2019)
`lambda`	parameter for power-divergence statistic (e.g., 0 for likelihood ratio statistic, 1 for Chi-squared statistic)
`method`	`"MLest"`, `"conservative"`, or `"bootstrap"`
`smallcounts`	`"bootstrap"` or `"approximate"`, method of obtaining p-value when some counts are small
`bootstraps`	number of samples for bootstrapping
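
To make the role of lambda concrete, here is a minimal base R sketch of the Cressie-Read power-divergence statistic. The helper name power_divergence is hypothetical and not part of MSCquartets. The expected counts (14, 72, 14) are an assumption: they correspond to a T3 fit in which the largest count is kept and the two smaller counts are averaged.

```r
# Power-divergence statistic for observed and expected counts.
# Hypothetical helper for illustration only; not an MSCquartets function.
power_divergence <- function(obs, expd, lambda = 0) {
  stopifnot(length(obs) == length(expd), all(expd > 0))
  stopifnot(abs(lambda + 1) > 1e-12)  # lambda = -1 needs a separate limit
  if (abs(lambda) < 1e-12) {
    # Limit as lambda -> 0: likelihood ratio statistic.
    # Cells with a zero observed count contribute 0.
    terms <- ifelse(obs > 0, obs * log(obs / expd), 0)
    return(2 * sum(terms))
  }
  2 / (lambda * (lambda + 1)) * sum(obs * ((obs / expd)^lambda - 1))
}

obs  <- c(17, 72, 11)
expd <- c(14, 72, 14)  # assumed T3 fit: minor counts averaged

lr_stat      <- power_divergence(obs, expd, lambda = 0)  # likelihood ratio
pearson_stat <- power_divergence(obs, expd, lambda = 1)  # Chi-squared
```

With lambda = 1 the formula reduces to Pearson's Chi-squared statistic whenever the observed and expected counts have the same total, which is the case here.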

Details

This function implements two versions of the test given by Mitchell et al. (2019), as well as parametric bootstrapping, with other procedures for when some expected counts are small. When the topology and/or the internal quartet branch length is not specified by the null hypothesis, these tests are more accurate than, say, a Chi-square test with one degree of freedom, which is not theoretically justified near the singularities and boundaries of the models.

If method="MLest", this uses the test by that name described in Section 7 of Mitchell et al. (2019). For both the T1 and T3 models the test is slightly anticonservative over a small range of true internal edges of the quartet species tree. Although the test generally performs well in practice, it lacks a uniform asymptotic guarantee over the full parameter space for either T1 or T3.

If method="conservative", a conservative test described by Mitchell et al. (2019) is used. For model T3 this uses the Chi-square distribution with 1 degree of freedom (the "least favorable" approach), while for model T1 it uses the Minimum Adjusted Bonferroni, based on precomputed values from simulations with n=1e+6. These conservative tests are asymptotically guaranteed to reject a true null hypothesis with probability at most the specified level, but at the expense of increased type II errors.
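
For model T3, the conservative p-value can be sketched in base R as a Chi-square tail probability with 1 degree of freedom. This is an illustration of the idea rather than the package's code. It assumes that the largest count marks the estimated topology, that the two smaller counts are averaged to get the fitted expected counts, and that the likelihood ratio statistic (lambda = 0) is used. The Minimum Adjusted Bonferroni procedure for T1 relies on precomputed values and is not reproduced here.

```r
obs <- c(17, 72, 11)
n   <- sum(obs)

# Assumed T3 fit: keep the largest count, average the two smaller ones.
i       <- which.max(obs)
expd    <- rep((n - obs[i]) / 2, 3)
expd[i] <- obs[i]

# Likelihood ratio statistic; zero counts contribute 0.
lr_stat <- 2 * sum(ifelse(obs > 0, obs * log(obs / expd), 0))

# Upper tail of the Chi-square distribution with 1 degree of freedom.
p_conservative <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
```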

If method="bootstrap", then parametric bootstrapping is performed, based on parameter estimates of the quartet topology and internal edge length. The bootstrap sample size is given by the bootstraps argument.
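
The general shape of a parametric bootstrap for model T3 can be sketched in base R as follows. This is a generic illustration under stated assumptions, not the package's implementation: it plugs in the plain maximum likelihood fit (largest count kept, smaller counts averaged), whereas the function bases its bootstrap on its own estimates of topology and internal edge length. The helper name lr_stat_T3 and the choice of 2000 replicates are hypothetical.

```r
set.seed(1)  # for reproducibility of this sketch

# Likelihood ratio statistic after refitting the assumed T3 model to x.
lr_stat_T3 <- function(x) {
  n       <- sum(x)
  i       <- which.max(x)
  expd    <- rep((n - x[i]) / 2, 3)
  expd[i] <- x[i]
  2 * sum(ifelse(x > 0, x * log(x / expd), 0))
}

obs <- c(17, 72, 11)
n   <- sum(obs)

# Plug-in topology probabilities under the fitted model.
i        <- which.max(obs)
p_hat    <- rep((n - obs[i]) / (2 * n), 3)
p_hat[i] <- obs[i] / n

# Simulate count vectors from the fitted model; one replicate per column.
sims       <- rmultinom(2000, size = n, prob = p_hat)
boot_stats <- apply(sims, 2, lr_stat_T3)

# Bootstrap p-value: share of simulated statistics at least as extreme.
p_boot <- mean(boot_stats >= lr_stat_T3(obs))
```

Refitting the model inside lr_stat_T3 for every replicate matters: the statistic is computed the same way on simulated data as on the observed counts.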

When some expected topology counts are small, the methods "MLest" and "conservative" are not appropriate. The argument smallcounts determines whether bootstrapping or a faster approximate method is used. Both involve estimates of the quartet topology and internal edge length. The approximate approach returns a precomputed p-value, found by replacing the largest observed count with 1e+6 and performing 1e+8 bootstraps for the model T3. When the total count n is sufficiently large (at least 30) and some expected counts are small, the quartet tree error probability is small and the bootstrap p-value is approximately independent of the choice of T3 or T1 and of the largest observed count.

For model T1, the first entry of obs is treated as the count of gene quartets concordant with the species tree.

The returned p-value should be interpreted with caution when the sample size is small, e.g., fewer than 30 gene trees. The returned edge length (see Value) is a consistent estimator, but not the MLE, of the internal edge length in coalescent units. Although consistent, the MLE of this edge length is biased. The estimator returned here is also biased, but less so than the MLE. See Mitchell et al. (2019) for more discussion on dealing with the bias of parameter estimates in the presence of boundaries and/or singularities of parameter spaces.
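
For comparison, the plain MLE of the internal edge length can be sketched in base R. It rests on the standard multispecies coalescent result that a gene quartet matches the species quartet with probability 1 - (2/3) exp(-t) for internal edge length t in coalescent units. The helper name mle_edge_length is hypothetical, and this is the MLE that the function does not return; the function's own, less biased estimator is not reproduced here.

```r
# Naive MLE of the internal edge length from the concordant count.
# Hypothetical helper for illustration; not the estimator returned
# by quartetTreeTest.
mle_edge_length <- function(concordant, n) {
  p_hat <- concordant / n
  if (p_hat <= 1 / 3) return(0)  # boundary of the model: edge length 0
  if (p_hat >= 1) return(Inf)    # no discordant gene quartets observed
  -log(1.5 * (1 - p_hat))        # invert p = 1 - (2/3) * exp(-t)
}

t_hat <- mle_edge_length(72, 100)
```

The two boundary cases show why the returned estimate can be 0 or Inf: a concordant share at or below 1/3 sits on the boundary of the model, and a share of 1 carries no information about an upper limit on the edge length.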

Value

output where output$p.value is the p-value and output$edgelength is a consistent estimator of the internal edge length in coalescent units, possibly Inf.

References

Mitchell J, Allman ES, Rhodes JA (2019). “Hypothesis testing near singularities and boundaries.” Electron. J. Statist., 13(1), 2150-2193. doi:10.1214/19-EJS1576.

Examples

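# T3: test fit to a tree of any quartet topology
quartetTreeTest(c(17,72,11),"T3")
# T1: the first entry of obs is treated as the count concordant with the species tree
quartetTreeTest(c(17,72,11),"T1")
quartetTreeTest(c(72,11,17),"T1")
quartetTreeTest(c(11,17,72),"T1")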

[Package MSCquartets version 2.0 Index]