model_adequacy_hbd {castor}  R Documentation 
Given a rooted ultrametric timetree and a homogenous birth-death model, check if the model adequately explains various aspects of the tree, such as the branch length and node age distributions and other test statistics. The function uses bootstrapping to simulate multiple hypothetical trees according to the model and then compares the distribution of those trees to the original tree. This function may be used to quantify the "goodness of fit" of a birth-death model to a timetree.
model_adequacy_hbd( tree, models, splines_degree = 1, extrapolate = FALSE, Nbootstraps = 1000, Nthreads = 1)
tree 
A rooted ultrametric timetree of class "phylo". 
models 
Either a single HBD model or a list of HBD models, specifying the pool of models from which to randomly draw bootstraps. Every model should itself be a named list with some or all of the following elements:

splines_degree 
Integer, one of 0, 1, 2 or 3, specifying the polynomial degree of the PSR between age-grid points. For example, 
extrapolate 
Logical, specifying whether to extrapolate the model variables λ, μ, ψ and κ (as constants) beyond the provided age grid all the way to 
Nbootstraps 
Integer, the number of bootstraps (simulations) to perform for calculating statistical significances. A larger number will increase the accuracy of estimated statistical significances. 
Nthreads 
Integer, number of parallel threads to use for bootstrapping. Note that on Windows machines this option is ignored. 
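To give a rough sense of how Nbootstraps affects accuracy, the sketch below uses the standard error of a proportion estimated from N independent draws, sqrt(p(1-p)/N). It is an illustration only and not part of castor: the function name pvalue_standard_error and the value p_true = 0.05 are assumptions made for this example.

```r
# Approximate standard error of a P-value estimated from independent bootstraps.
# p_true is a hypothetical "true" P-value, used here only for illustration.
pvalue_standard_error = function(p_true, Nbootstraps){
	sqrt(p_true * (1 - p_true) / Nbootstraps)
}

# compare the expected precision for a few bootstrap counts
for(N in c(100, 1000, 10000)){
	cat(sprintf("Nbootstraps = %5d: standard error ~ %.4f\n", N, pvalue_standard_error(0.05, N)))
}
```

Under this approximation, increasing Nbootstraps tenfold shrinks the standard error by a factor of about 3.2, so very small P-values need many bootstraps to be resolved reliably.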
In addition to model selection, the adequacy of any chosen model should also be assessed in absolute terms, i.e. not just relative to other competing models (after all, all considered models might be bad). This function essentially determines how probable it is for hypothetical trees generated by a candidate model to resemble the tree at hand, in terms of various test statistics (such as the historically popular "gamma" statistic, or the Colless tree imbalance). In particular, the function uses a Kolmogorov-Smirnov test to check whether the probability distributions of edge lengths and node ages in the tree resemble those expected under the model. All statistical significances are calculated using bootstrapping, i.e. by simulating trees from the provided model with the same number of tips and the same root age as the original tree.
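The idea of a bootstrap-based Kolmogorov-Smirnov comparison can be sketched in base R as follows. This is a minimal illustration and not castor's internal implementation: the edge lengths are random placeholder data rather than simulated trees, the helper ks_distance is hypothetical, and approximating the model's cumulative distribution function by pooling all bootstrap samples is an assumption of this sketch.

```r
set.seed(42)

# Placeholder data: edge lengths of an "observed" tree and of 200 "bootstrap" trees
# (a bifurcating tree with 50 tips has 98 edges)
tree_edges      = rexp(98, rate = 2)
bootstrap_edges = replicate(200, rexp(98, rate = 1.5), simplify = FALSE)

# Maximum absolute difference between the empirical CDFs of two samples
ks_distance = function(x, y){
	grid = sort(unique(c(x, y)))
	max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

# Approximate the model's CDF by pooling all bootstrap samples (assumption)
pooled = unlist(bootstrap_edges)

# KS statistic of the observed sample and of each bootstrap sample
tree_KS      = ks_distance(tree_edges, pooled)
bootstrap_KS = sapply(bootstrap_edges, ks_distance, y = pooled)

# One-sided significance: fraction of bootstraps whose KS statistic exceeds the tree's
P_KS = mean(bootstrap_KS > tree_KS)
cat(sprintf("tree KS = %.3f, mean bootstrap KS = %.3f, P = %.3f\n",
            tree_KS, mean(bootstrap_KS), P_KS))
```

A small P_KS here indicates that the observed sample lies further from the pooled distribution than most samples actually drawn from it.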
Note that even if an HBD model appears to adequately explain a given timetree, this does not mean that the model even approximately resembles the true diversification history (i.e., the true speciation and extinction rates) that generated the tree (Louca and Pennell 2020). Hence, it is generally more appropriate to say that a given model "congruence class" (or PSR) rather than a specific model (or speciation/extinction rate) explains the tree.
A named list with the following elements:
success 
Logical, indicating whether the model evaluation was successful. If 
Nbootstraps 
Integer, the number of bootstraps used. 
tree_gamma 
Numeric, gamma statistic (Pybus and Harvey 2000) of the original tree. 
bootstrap_mean_gamma 
Numeric, mean gamma statistic across all bootstrap trees. 
Pgamma 
Numeric, two-sided statistical significance of the tree's gamma statistic under the provided null model, i.e. the probability that 
tree_Colless 
Numeric, Colless imbalance statistic (Shao and Sokal, 1990) of the original tree. 
bootstrap_mean_Colless 
Numeric, mean Colless statistic across all bootstrap trees. 
PColless 
Numeric, two-sided statistical significance of the tree's Colless statistic under the provided null model, i.e. the probability that 
tree_Sackin 
Numeric, Sackin statistic (Sackin, 1972) of the original tree. 
bootstrap_mean_Sackin 
Numeric, mean Sackin statistic across all bootstrap trees. 
PSackin 
Numeric, two-sided statistical significance of the tree's Sackin statistic under the provided null model, i.e. the probability that 
tree_edgeKS 
Numeric, Kolmogorov-Smirnov (KS) statistic of the original tree's edge lengths, i.e. the estimated maximum difference between the tree's and the model's (estimated) cumulative distribution function of edge lengths. 
bootstrap_mean_edgeKS 
Numeric, mean KS statistic of the bootstrap trees' edge lengths. 
PedgeKS 
Numeric, the one-sided statistical significance of the tree's edge-length KS statistic, i.e. the probability that the KS statistic of any tree generated by the model would be larger than the original tree's KS statistic. A low value means that the probability distribution of edge lengths in the original tree differs strongly from that expected based on the model. 
tree_nodeKS 
Numeric, Kolmogorov-Smirnov (KS) statistic of the original tree's node ages (divergence times), i.e. the estimated maximum difference between the tree's and the model's (estimated) cumulative distribution function of node ages. 
bootstrap_mean_nodeKS 
Numeric, mean KS statistic of the bootstrap trees' node ages. 
PnodeKS 
Numeric, the one-sided statistical significance of the tree's node-age KS statistic, i.e. the probability that the KS statistic of any tree generated by the model would be larger than the original tree's KS statistic. A low value means that the probability distribution of node ages in the original tree differs strongly from that expected based on the model. 
statistical_tests 
Data frame, listing the above statistical test results in a more compact format (one test statistic per row). 
LTT_ages 
Numeric vector, listing ages (time before present) on which the tree's LTT will be defined. 
tree_LTT 
Numeric vector of the same length as 
bootstrap_LTT_CI 
Named list containing the elements 
fraction_LTT_in_CI95 
Numeric, fraction of the tree's LTT contained within the equal-tailed 95%-confidence interval of the distribution of LTT values predicted by the model. For example, a value of 0.5 means that at half of the time points between the present-day and the root, the tree's LTT is contained within the 95%-CI of predicted LTTs. 
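As an illustration of how a quantity like fraction_LTT_in_CI95 can be obtained, the base R sketch below computes an equal-tailed interval at each time point from a matrix of bootstrap values and then counts how often the observed curve falls inside it. The function fraction_within_CI, its arguments and the placeholder data are assumptions made for this example and are not taken from castor.

```r
set.seed(7)

# Fraction of time points at which `observed` lies within the equal-tailed
# confidence interval of the bootstrap values.
# observed:   numeric vector, one value per time point
# bootstraps: numeric matrix, one row per bootstrap, one column per time point
fraction_within_CI = function(observed, bootstraps, level = 0.95){
	tail  = (1 - level) / 2
	lower = apply(bootstraps, 2, quantile, probs = tail)
	upper = apply(bootstraps, 2, quantile, probs = 1 - tail)
	mean((observed >= lower) & (observed <= upper))
}

# Placeholder data: an "observed" LTT at 20 ages and 500 "bootstrap" LTTs
Nages          = 20
tree_LTT       = seq(50, 2, length.out = Nages)
bootstrap_LTTs = matrix(rpois(500 * Nages, lambda = rep(tree_LTT, each = 500)), nrow = 500)

cat(sprintf("Fraction of LTT within 95%%-CI: %.2f\n",
            fraction_within_CI(tree_LTT, bootstrap_LTTs)))
```

"Equal-tailed" here means that 2.5% of the bootstrap values lie below the interval and 2.5% lie above it at each time point.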
Stilianos Louca
S. Louca and M. W. Pennell (2020). Extant timetrees are consistent with a myriad of diversification histories. Nature. 580:502-505.
O. G. Pybus and P. H. Harvey (2000). Testing macroevolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London. Series B: Biological Sciences. 267:2267-2272.
M. J. Sackin (1972). "Good" and "Bad" Phenograms. Systematic Biology. 21:225-226.
K.-T. Shao and R. R. Sokal (1990). Tree Balance. Systematic Biology. 39:266-276.
# load the package, so that unprefixed castor functions are found
library(castor)

# generate a tree
tree = castor::generate_tree_hbd_reverse(Ntips = 50, lambda = 1, mu = 0.5, rho = 1)$trees[[1]]
root_age = castor::get_tree_span(tree)$max_distance

# define & simulate a somewhat different BD model
model = simulate_deterministic_hbd(LTT0 = 50, oldest_age = root_age, lambda = 1.5, mu = 0.5, rho0 = 1)

# compare the tree to the model
adequacy = model_adequacy_hbd(tree, models = model, Nbootstraps = 100, Nthreads = 2)
if(!adequacy$success){
	cat(sprintf("Adequacy test failed: %s\n",adequacy$error))
}else{
	print(adequacy$statistical_tests)
}