quartetTreeTest {MSCquartets} | R Documentation |
Hypothesis test for quartet counts fitting a tree under the MSC
Description
Test the null hypothesis H_0 that the quartet counts fit the T1 or T3 model of Mitchell et al. (2019), against the alternative H_1 of everything else. T1 is for a specific species quartet topology, and T3 for any species quartet topology.
Usage
quartetTreeTest(
obs,
model = "T3",
lambda = 0,
method = "MLest",
smallcounts = "approximate",
bootstraps = 10^4
)
Arguments
obs |
vector of 3 counts of resolved quartet frequencies |
model |
"T1" or "T3", the null model of Mitchell et al. (2019) to test |
lambda |
parameter for power-divergence statistic (e.g., 0 for likelihood ratio statistic, 1 for Chi-squared statistic) |
method |
"MLest", "conservative", or "bootstrap"; the testing procedure, as described in Details |
smallcounts |
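"bootstrap" or "approximate"; procedure for obtaining the p-value when some expected counts are small |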
bootstraps |
number of samples for bootstrapping |
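The role of lambda can be illustrated with a minimal base-R sketch of the Cressie-Read power-divergence family. The helper name power_divergence and the expected counts below are assumptions made for illustration; this is not the package's internal code, and it is only meant for lambda >= 0.

```r
# Hypothetical helper (not part of MSCquartets): power-divergence statistic
# for observed counts `obs` and expected counts `expd` with equal totals.
# Intended for lambda >= 0 only.
power_divergence <- function(obs, expd, lambda = 0) {
  keep <- obs > 0  # zero counts contribute nothing when lambda >= 0
  o <- obs[keep]
  e <- expd[keep]
  if (lambda == 0) {
    # limit lambda -> 0: likelihood ratio statistic
    return(2 * sum(o * log(o / e)))
  }
  2 / (lambda * (lambda + 1)) * sum(o * ((o / e)^lambda - 1))
}

obs  <- c(17, 72, 11)
expd <- c(14, 72, 14)   # illustrative expected counts, same total as obs
power_divergence(obs, expd, lambda = 0)  # likelihood ratio statistic
power_divergence(obs, expd, lambda = 1)  # Pearson Chi-squared statistic
```

With lambda = 1 the value agrees with sum((obs - expd)^2 / expd), since the observed and expected counts have the same total.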
Details
This function implements two of the versions of the test given by Mitchell et al. (2019), as well as parametric bootstrapping, with other procedures for when some expected counts are small. When the topology and/or the internal quartet branch length is not specified by the null hypothesis, these are more accurate tests than, say, a Chi-square test with one degree of freedom, which is not theoretically justified near the singularities and boundaries of the models.
If method="MLest", this uses the test by that name described in Section 7 of Mitchell et al. (2019).
For both the T1 and T3 models the test is slightly anticonservative over a small range of true internal edges of the quartet species tree.
Although the test generally performs well in practice, it lacks a uniform asymptotic guarantee over
the full parameter space for either T1 or T3.
If method="conservative", a conservative test described by Mitchell et al. (2019) is used. For model T3 this
uses the Chi-square distribution with 1 degree of freedom
(the "least favorable" approach), while for model T1
it uses the Minimum Adjusted Bonferroni, based on precomputed values from simulations with n=1e+6.
These conservative tests are asymptotically guaranteed to reject the null
hypothesis at most at a specified level, but at the expense of increased type II errors.
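For intuition, the T3 "least favorable" calculation can be sketched in base R. This is a simplified illustration, not the package's code: it assumes the likelihood ratio statistic (lambda = 0) and fits the tree model by keeping the largest count and averaging the other two, since the two discordant topologies are equally probable under the MSC.

```r
# Illustrative sketch only; variable names are hypothetical.
obs <- c(17, 72, 11)
n   <- sum(obs)
k   <- which.max(obs)            # estimated species quartet topology

# Fitted expected counts: largest count kept, the other two averaged.
expd    <- rep((n - obs[k]) / 2, 3)
expd[k] <- obs[k]

# Likelihood ratio statistic, with 0 * log(0) taken as 0.
pos  <- obs > 0
stat <- 2 * sum(obs[pos] * log(obs[pos] / expd[pos]))

# Conservative p-value from the Chi-square distribution with 1 degree of freedom.
p_conservative <- pchisq(stat, df = 1, lower.tail = FALSE)
p_conservative
```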
If method="bootstrap", then parametric bootstrapping is performed, based on parameter estimates of the quartet topology
and internal edge length. The bootstrap sample size is given by the bootstraps
argument.
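The general idea of a parametric bootstrap for this setting can be sketched in base R as follows. This is a rough sketch under stated assumptions, not the procedure implemented in the package: the helper names are hypothetical, the likelihood ratio statistic is used, and the model is fitted by the simple maximum-likelihood fit (largest count kept, the other two averaged) rather than by the estimates the package uses.

```r
# Hypothetical helpers for illustration; not part of MSCquartets.
fit_tree <- function(x) {
  # Fitted expected counts under a T3-style model.
  k <- which.max(x)
  e <- rep((sum(x) - x[k]) / 2, 3)
  e[k] <- x[k]
  e
}

lr_stat <- function(x, e) {
  pos <- x > 0                       # 0 * log(0) taken as 0
  2 * sum(x[pos] * log(x[pos] / e[pos]))
}

boot_p_value <- function(obs, B = 2000) {
  n        <- sum(obs)
  expd     <- fit_tree(obs)
  stat_obs <- lr_stat(obs, expd)
  # Simulate B count vectors from the fitted model (one per column).
  sims     <- rmultinom(B, size = n, prob = expd / n)
  # Refit and recompute the statistic for each simulated sample.
  stat_sim <- apply(sims, 2, function(x) lr_stat(x, fit_tree(x)))
  mean(stat_sim >= stat_obs)
}

set.seed(1)
boot_p_value(c(17, 72, 11))
```

Larger values of B reduce the Monte Carlo error of the p-value at the cost of run time, which is the same trade-off controlled by the bootstraps argument.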
When some expected topology counts are small, the methods "MLest" and "conservative" are not appropriate.
The argument smallcounts determines whether bootstrapping or a faster approximate method is used.
These both involve estimates of the quartet topology and internal edge length. The approximate approach
returns a precomputed p-value, found by replacing the largest observed count
with 1e+6 and performing 1e+8 bootstraps for the model T3. When n is sufficiently large (at least 30) and
some expected counts are small, the quartet tree error probability is small and the bootstrap p-value is
approximately independent of the choice of T3 or T1 and of the largest observed count.
For model T1, the first entry of obs is treated as the count of gene quartets concordant with the species tree.
The returned p-value should be taken with caution when the sample size is small, e.g. fewer than 30 gene trees.
The returned value of output$edgelength is a consistent estimator, but not the MLE, of the internal
edge length in coalescent units. Although consistent, the MLE of this edge length is biased.
Our consistent estimator is still biased, but with less bias than the MLE. See Mitchell et al. (2019)
for more discussion on dealing with the bias of parameter estimates in the
presence of boundaries and/or singularities of parameter spaces.
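To make the link between counts and edge length concrete, the following base-R sketch computes the naive plug-in estimate, which is not the estimator returned by this function. It assumes the standard MSC relation in which the concordant topology has probability 1 - (2/3) exp(-t) for internal edge length t, and it takes the largest count as concordant. The helper name is hypothetical.

```r
# Hypothetical helper for illustration; NOT the estimator used by quartetTreeTest.
plugin_edge_length <- function(obs) {
  p_major <- max(obs) / sum(obs)
  # Invert p_major = 1 - (2/3) * exp(-t); clamp at 0 against rounding error.
  max(0, -log(1.5 * (1 - p_major)))
}

plugin_edge_length(c(17, 72, 11))  # finite positive estimate
plugin_edge_length(c(10, 10, 10))  # 0: counts look like a star tree
plugin_edge_length(c(0, 50, 0))    # Inf: no discordant gene quartets observed
```

The last call shows why an infinite edge length estimate is possible when all gene quartets agree.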
Value
output where output$p.value is the p-value and output$edgelength is a consistent estimator of the
internal edge length in coalescent units, possibly Inf.
References
Mitchell J, Allman ES, Rhodes JA (2019). “Hypothesis testing near singularities and boundaries.” Electron. J. Statist., 13(1), 2150-2193. doi:10.1214/19-EJS1576.
See Also
Examples
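library(MSCquartets)
# T3: test the fit to a quartet tree of any topology
quartetTreeTest(c(17,72,11),"T3")
# T1: the first entry of obs is taken as the count concordant with the species tree
quartetTreeTest(c(17,72,11),"T1")
quartetTreeTest(c(72,11,17),"T1")
quartetTreeTest(c(11,17,72),"T1")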