quartetCutTest {MSCquartets}    R Documentation

Hypothesis test for quartet counts fitting a resolved quartet tree of blobs under the NMSC

Description

Tests the null hypothesis H_0: the Cut model of Allman et al. (2024), Section 3, against the alternative H_1: everything else. Returns a p-value and an estimate of the topology of the quartet tree of blobs.

Usage

quartetCutTest(
  obs,
  lambda = 0,
  method = "MLest",
  smallcounts = "approximate",
  bootstraps = 10^4
)

Arguments

obs

vector of 3 counts of the resolved quartet topologies

lambda

parameter for the power-divergence statistic (e.g., 0 for the likelihood ratio statistic, 1 for Pearson's chi-squared statistic)

method

"MLest", "conservative", or "bootstrap"

smallcounts

"bootstrap" or "approximate", method of obtaining p-value when some counts are small

bootstraps

number of samples for bootstrapping
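The lambda argument selects a member of the Cressie-Read power-divergence family of statistics. The following R sketch shows how that statistic compares observed with expected counts; the function name power_divergence is hypothetical and not part of MSCquartets, and the expected counts in the usage lines are illustrative only.

```r
# Hypothetical helper (not part of MSCquartets): Cressie-Read power-divergence
# statistic comparing observed counts with expected counts.
power_divergence <- function(obs, expected, lambda = 0) {
  stopifnot(length(obs) == length(expected), all(expected > 0))
  if (lambda == 0) {
    # Limit lambda -> 0: likelihood ratio statistic G^2 (0 * log(0) taken as 0)
    pos <- obs > 0
    return(2 * sum(obs[pos] * log(obs[pos] / expected[pos])))
  }
  if (lambda == -1) stop("lambda = -1 (a limiting case) is not handled in this sketch")
  # General case; lambda = 1 gives Pearson's chi-squared statistic.
  # For lambda < 0, a zero observed count makes the statistic infinite.
  2 / (lambda * (lambda + 1)) * sum(obs * ((obs / expected)^lambda - 1))
}

obs <- c(17, 72, 11)
expected <- c(14, 72, 14)   # illustrative expected counts
power_divergence(obs, expected, lambda = 0)  # likelihood ratio statistic
power_divergence(obs, expected, lambda = 1)  # equals sum((obs - expected)^2 / expected)
```

Treating lambda = 0 as a separate branch matters because the general formula divides by lambda; the branch returns the limit of the general formula as lambda approaches 0.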

Details

The Cut model for quartet concordance factors (CFs) is the network multispecies coalescent (NMSC) on a quartet species network that has a cut edge separating two of the taxa from the other two.
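The ML estimates of the CF and of the tree-of-blobs topology mentioned below can be illustrated with a small R sketch. It rests on an assumption not stated on this page: that a CF under the Cut model with cut split i has entry i at least 1/3 and the other two entries equal (the same form as the CF of a resolved quartet tree). The function name cut_model_mle is hypothetical, and this is not the MSCquartets implementation.

```r
# Hypothetical sketch (not the MSCquartets implementation). ASSUMPTION: a CF
# under the Cut model with cut split i has entry i >= 1/3 and the other two
# entries equal, i.e. the same form as a resolved quartet tree's CF.
cut_model_mle <- function(obs) {
  stopifnot(length(obs) == 3, all(obs >= 0), sum(obs) > 0)
  n <- sum(obs)
  fit_split <- function(i) {
    # Constrained MLE for split i: n_i / n, raised to 1/3 if it falls below
    p_i <- max(obs[i] / n, 1 / 3)
    cf <- rep((1 - p_i) / 2, 3)
    cf[i] <- p_i
    # Multinomial log-likelihood (up to a constant), with 0 * log(0) taken as 0
    loglik <- sum(ifelse(obs > 0, obs * log(cf), 0))
    list(cf = cf, loglik = loglik)
  }
  fits <- lapply(1:3, fit_split)
  best <- which.max(vapply(fits, function(f) f$loglik, numeric(1)))
  list(topology = best, cf = fits[[best]]$cf)
}

cut_model_mle(c(17, 72, 11))   # topology 2, cf c(0.14, 0.72, 0.14)
```

Under this assumed form, the log-likelihood for split i increases with n_i once n_i is at least n/3, so the selected topology is always the one with the largest count.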

This function implements the test described in Allman et al. (2024), as well as parametric bootstrapping, together with other procedures for when some expected counts are small. These tests are more accurate than, say, a chi-squared test with one degree of freedom, which is not theoretically justified near the singularity of the model or for small counts.

If method="MLest", the test for the Cut model described in Section 3 of Allman et al. (2024) is used, with the ML estimate of the generating parameter. As shown by simulations in that paper, the test is conservative when small critical values are used for rejection. Although the test generally performs well in practice, it lacks a uniform asymptotic guarantee over the full parameter space.

If method="conservative", the test uses the chi-squared distribution with 1 degree of freedom (the "least favorable" approach). Asymptotically, this guarantees that the probability of falsely rejecting the null hypothesis is at most the specified level, at the expense of an increased type II error rate.
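To make the "least favorable" reference distribution concrete, here is a hedged R sketch that compares a likelihood ratio statistic with a chi-squared distribution on 1 degree of freedom via pchisq. It assumes the tree-like Cut-model form (entry for the cut split at least 1/3, other two entries equal) and fixes lambda = 0, whereas the package uses the lambda argument; the name conservative_cut_pvalue is hypothetical.

```r
# Hypothetical sketch, not the MSCquartets code. ASSUMES the tree-like Cut-model
# form (entry for the cut split >= 1/3, other two entries equal) and uses the
# likelihood ratio statistic (lambda = 0).
conservative_cut_pvalue <- function(obs) {
  n <- sum(obs)
  i <- which.max(obs)                   # ML split under the assumed form
  expected <- rep((n - obs[i]) / 2, 3)  # minor counts share the remainder equally
  expected[i] <- obs[i]
  pos <- obs > 0                        # 0 * log(0) taken as 0
  g2 <- 2 * sum(obs[pos] * log(obs[pos] / expected[pos]))
  # "Least favorable" reference distribution: chi-squared with 1 df
  pchisq(g2, df = 1, lower.tail = FALSE)
}

conservative_cut_pvalue(c(48, 11, 41))  # unequal minor counts: small p-value
conservative_cut_pvalue(c(50, 25, 25))  # equal minor counts: p-value 1
```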

If method="bootstrap", parametric bootstrapping is performed, based on ML estimates of the CF. The number of bootstrap samples is given by the bootstraps argument.
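A parametric bootstrap of this kind can be sketched in R as follows: fit the null model, simulate count vectors from the fitted CF with rmultinom, refit each simulated sample, and report the proportion of bootstrap statistics at least as large as the observed one. The helper names cut_lr_fit and bootstrap_cut_pvalue are hypothetical; the tree-like Cut-model form and the choice lambda = 0 are assumptions for illustration, and the package's own bootstrap, including its handling of small counts, may differ.

```r
# Hypothetical sketch, not the MSCquartets code. ASSUMES the tree-like Cut-model
# form (entry for the cut split >= 1/3, other two entries equal) and uses the
# likelihood ratio statistic (lambda = 0).
cut_lr_fit <- function(obs) {
  n <- sum(obs)
  i <- which.max(obs)                   # ML split under the assumed form
  expected <- rep((n - obs[i]) / 2, 3)
  expected[i] <- obs[i]
  pos <- obs > 0                        # 0 * log(0) taken as 0
  list(stat = 2 * sum(obs[pos] * log(obs[pos] / expected[pos])),
       cf = expected / n)               # fitted CF under the null
}

bootstrap_cut_pvalue <- function(obs, bootstraps = 1000) {
  fit <- cut_lr_fit(obs)
  # Simulate count vectors (one per column) from the fitted null CF
  sims <- rmultinom(bootstraps, size = sum(obs), prob = fit$cf)
  # Refit the null model to each simulated sample and recompute the statistic
  boot_stats <- apply(sims, 2, function(x) cut_lr_fit(x)$stat)
  mean(boot_stats >= fit$stat)
}

set.seed(1)
bootstrap_cut_pvalue(c(48, 11, 41), bootstraps = 2000)
```

Refitting inside each replicate mirrors how the statistic is computed on the observed data, so the simulated statistics include the same ML estimation step.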

When some expected topology counts are small, the methods "MLest" and "conservative" are not appropriate. The argument smallcounts determines whether bootstrapping or a faster approximate method is used. These use ML estimates of the CF under the Cut model.

If two of the three counts are small (so the estimated CF is near a vertex of the simplex), the approximate approach returns a precomputed p-value, found by replacing the largest observed count with 1e+6 and performing 1e+8 bootstraps. When the total count n is sufficiently large (at least 30) and some expected counts are small, the probability of topological error is small and the bootstrap p-value is approximately independent of the largest observed count.

If one of the three counts is small (so the estimated CF is near an edge of the simplex), a chi-squared test based on the binomial model for the two larger counts is used, as described by Allman et al. (2024).

The returned p-value should be interpreted with caution when the sample size is small, e.g., fewer than 30 gene trees.

Value

output, where output$p.value is the p-value and output$topology (1, 2, or 3) indicates the ML estimate of the topology of the quartet tree of blobs, following the ordering of the qcCF entries.

References

Allman ES, Baños H, Mitchell JD, Rhodes JA (2022). “The tree of blobs of a species network: identifiability under the coalescent.” Journal of Mathematical Biology, 86(1), 10. doi:10.1007/s00285-022-01838-9.

Allman ES, Baños H, Mitchell JD, Rhodes JA (2024). “TINNIK: Inference of the Tree of Blobs of Species Networks Under the Coalescent.” draft.

Mitchell J, Allman ES, Rhodes JA (2019). “Hypothesis testing near singularities and boundaries.” Electronic Journal of Statistics, 13(1), 2150-2193. doi:10.1214/19-EJS1576.

See Also

quartetCutTestInd

Examples

 quartetCutTest(c(17,72,11))
 quartetCutTest(c(48,11,41))
 quartetCutTest(c(11,48,41))
 quartetCutTest(c(48,41,11))


[Package MSCquartets version 2.0 Index]