fit_and_compare_bm_models {castor} | R Documentation |
Fit and compare Brownian Motion models for multivariate trait evolution between two data sets.
Description
Given two rooted phylogenetic trees and states of one or more continuous (numeric) traits on the trees' tips, fit a multivariate Brownian motion model of correlated evolution to each data set and compare the fitted models. This function estimates the diffusivity matrix for each data set (i.e., each tree/tip-states set) via maximum-likelihood and assesses whether the log-difference between the two fitted diffusivity matrixes is statistically significant, under the null hypothesis that the two data sets exhibit the same diffusivity. Optionally, multiple trees can be used as input for each data set, under the assumption that the trait evolved on each tree according to the same BM model. For more details on how BM is fitted to each data set see the function fit_bm_model
.
Usage
fit_and_compare_bm_models( trees1,
tip_states1,
trees2,
tip_states2,
Nbootstraps = 0,
Nsignificance = 0,
check_input = TRUE,
verbose = FALSE,
verbose_prefix = "")
Arguments
trees1 |
Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the first data set on which a BM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. |
tip_states1 |
Numeric state of each trait at each tip in each tree in the first data set. If |
trees2 |
Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the second data set on which a BM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. |
tip_states2 |
Numeric state of each trait at each tip in each tree in the second data set, similarly to |
Nbootstraps |
Integer, specifying the number of parametric bootstraps to perform for calculating the confidence intervals of BM diffusivities fitted to each data set. If <=0, no bootstrapping is performed. |
Nsignificance |
Integer, specifying the number of simulations to perform for assessing the statistical significance of the log-transformed difference between the diffusivities fitted to the two data sets, i.e. of |
check_input |
Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to |
verbose |
Logical, specifying whether to print progress report messages to the screen. |
verbose_prefix |
Character, specifying a prefix to include in front of progress report messages on each line. Only relevant if |
Details
For details on the Brownian Motion model see fit_bm_model
and simulate_bm_model
. This function separately fits a single-variate or multi-variate BM model with constant diffusivity (diffusivity matrix, in the multivariate case) to each data set; internally, this function applies fit_bm_model
to each data set.
If Nsignificance>0
, the statistical significance of the log-transformed difference of the two fitted diffusivity matrixes, |\log(D_1)-\log(D_2)|
, is assessed, under the null hypothesis that both data sets were generated by the same common BM model. The diffusivity of this common BM model is estimated by fitting to both datasets at once, i.e. after merging the two datasets into a single dataset of trees and tip states (see return variable fit_common
below). For each of the Nsignificance
random simulations of the common BM model on the two tree sets, the diffusivities are again separately fitted on the two simulated sets and the resulting log-difference is compared to the one of the original data sets. The returned significance
is the probability that the diffusivities would have a log-difference larger than the observed one, if the two data sets had been generated under the common BM model.
If tree$edge.length
is missing, each edge in the tree is assumed to have length 1. The tree may include multifurcations (i.e. nodes with more than 2 children) as well as monofurcations (i.e. nodes with only one child). Note that multifurcations are internally expanded to bifurcations, prior to model fitting.
Value
A list with the following elements:
success |
Logical, indicating whether the fitting was successful for both data sets. If |
fit1 |
A named list containing the fitting results for the first data set, in the same format as returned by |
fit2 |
A named list containing the fitting results for the second data set, in the same format as returned by |
log_difference |
The absolute difference between the log-transformed diffusivities, i.e. |
significance |
Numeric, statistical significance of the observed log-difference under the null hypothesis that the two data sets were generated by a common BM model. Only returned if |
fit_common |
A named list containing the fitting results for the two data sets combined, in the same format as returned by |
Author(s)
Stilianos Louca
References
J. Felsenstein (1985). Phylogenies and the Comparative Method. The American Naturalist. 125:1-15.
See Also
simulate_bm_model
,
fit_bm_model
,
get_independent_contrasts
Examples
# simulate distinct BM models on two random trees
D1 = 1
D2 = 2
tree1 = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
tree2 = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
tip_states1 = simulate_bm_model(tree1, diffusivity = D1)$tip_states
tip_states2 = simulate_bm_model(tree2, diffusivity = D2)$tip_states
# fit and compare BM models between the two data sets
fit = fit_and_compare_bm_models(trees1 = tree1,
tip_states1 = tip_states1,
trees2 = tree2,
tip_states2 = tip_states2,
Nbootstraps = 100,
Nsignificance = 100)
# print summary of results
cat(sprintf("Fitted D1 = %g, D2 = %g, significance of log-diff. = %g\n",
fit$fit1$diffusivity, fit$fit2$diffusivity, fit$significance))