fit_and_compare_sbm_const {castor} | R Documentation |

Given two rooted phylogenetic trees and geographic coordinates of the trees' tips, fit a Spherical Brownian Motion (SBM) model of diffusive geographic dispersal with constant diffusivity to each tree and compare the fitted models. This function estimates the diffusivity (*D*) for each data set (i.e., each set of trees and tip coordinates) via maximum likelihood and assesses whether the log-difference (and the linear difference) between the two fitted diffusivities is statistically significant, under the null hypothesis that the two data sets exhibit the same diffusivity. Optionally, multiple trees can be used as input for each data set, under the assumption that dispersal occurred according to the same diffusivity in each tree of that data set. For more details on how SBM is fitted to each data set, see the function `fit_sbm_const`.

```r
fit_and_compare_sbm_const(trees1, tip_latitudes1, tip_longitudes1,
                          trees2, tip_latitudes2, tip_longitudes2,
                          radius,
                          planar_approximation   = FALSE,
                          only_basal_tip_pairs   = FALSE,
                          only_distant_tip_pairs = FALSE,
                          min_MRCA_time          = 0,
                          max_MRCA_age           = Inf,
                          max_phylodistance      = Inf,
                          min_diffusivity        = NULL,
                          max_diffusivity        = NULL,
                          Nbootstraps            = 0,
                          Nsignificance          = 0,
                          SBM_PD_functor         = NULL,
                          verbose                = FALSE,
                          verbose_prefix         = "")
```

| Argument | Description |
|---|---|
| `trees1` | Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the first data set on which an SBM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. |
| `tip_latitudes1` | Numeric vector listing the latitude (in decimal degrees) of each tip in each tree in the first data set. If `trees1` is a single tree, `tip_latitudes1` lists the latitudes of the tree's tips in the same order as the tips in the tree. If `trees1` is a list of trees, `tip_latitudes1` should be a list of such numeric vectors, one per tree, in the same order as `trees1`. |
| `tip_longitudes1` | Similar to `tip_latitudes1`, but listing the longitude (in decimal degrees) of each tip in each tree in the first data set. |
| `trees2` | Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the second data set on which an SBM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. |
| `tip_latitudes2` | Numeric vector listing the latitude (in decimal degrees) of each tip in each tree in the second data set, similarly to `tip_latitudes1`. |
| `tip_longitudes2` | Numeric vector listing the longitude (in decimal degrees) of each tip in each tree in the second data set, similarly to `tip_longitudes1`. |
| `radius` | Strictly positive numeric, specifying the radius of the sphere. For Earth, the mean radius is 6371 km. |
| `planar_approximation` | Logical, specifying whether to estimate the diffusivity based on a planar approximation of the SBM model, i.e. by assuming that geographic distances between tips are as if the tips were distributed on a 2D Cartesian plane. This approximation is only accurate if geographic distances between tips are small compared to the sphere's radius. |
| `only_basal_tip_pairs` | Logical, specifying whether to only compare immediate sister tips, i.e., tips connected through a single parental node. |
| `only_distant_tip_pairs` | Logical, specifying whether to only compare tips at distinct geographic locations. |
| `min_MRCA_time` | Numeric, specifying the minimum allowed time (distance from root) of the most recent common ancestor (MRCA) of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA is at least this distance from the root. Set `min_MRCA_time=0` (the default) to disable this filter. |
| `max_MRCA_age` | Numeric, specifying the maximum allowed age (distance from the youngest tip) of the MRCA of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA has at most this age (time to present). Set `max_MRCA_age=Inf` (the default) to disable this filter. |
| `max_phylodistance` | Numeric, maximum allowed phylogenetic distance between the two tips of an independent contrast for that contrast to be included in the SBM fitting. Set `max_phylodistance=Inf` (the default) to disable this filter. |
| `min_diffusivity` | Non-negative numeric, specifying the minimum possible diffusivity. If NULL, this is chosen automatically. |
| `max_diffusivity` | Non-negative numeric, specifying the maximum possible diffusivity. If NULL, this is chosen automatically. |
| `Nbootstraps` | Integer, specifying the number of parametric bootstraps to perform for calculating the confidence intervals of the SBM diffusivities fitted to each data set. If <=0, no bootstrapping is performed. |
| `Nsignificance` | Integer, specifying the number of simulations to perform for assessing the statistical significance of the linear difference and log-transformed difference between the diffusivities fitted to the two data sets, i.e. of $\lvert D_1-D_2\rvert$ and $\lvert\log(D_1)-\log(D_2)\rvert$. |
| `SBM_PD_functor` | SBM probability density functor object. Used internally and for debugging purposes. Unless you know what you're doing, you should keep this `NULL` (the default). |
| `verbose` | Logical, specifying whether to print progress report messages to the screen. |
| `verbose_prefix` | Character, specifying a prefix to include in front of progress report messages on each line. Only relevant if `verbose=TRUE`. |
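
The `planar_approximation` argument is only safe when tip-to-tip distances are small relative to `radius`. The base-R sketch below illustrates why, by comparing great-circle (haversine) distances with distances measured on a flat equirectangular projection. The projection and the helper names `geodesic_distance` and `planar_distance` are illustrative assumptions; this page does not specify how the planar approximation is implemented internally.

```r
# Great-circle distance on a sphere (haversine formula); coordinates in decimal degrees
geodesic_distance = function(lat1, lon1, lat2, lon2, radius) {
  to_rad  = pi / 180
  dphi    = (lat2 - lat1) * to_rad
  dlambda = (lon2 - lon1) * to_rad
  a = sin(dphi / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlambda / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin guards against rounding slightly above 1
}

# Hypothetical planar distance: project both points onto a flat map
# (equirectangular projection centred on the pair's mean latitude)
planar_distance = function(lat1, lon1, lat2, lon2, radius) {
  to_rad   = pi / 180
  mean_lat = (lat1 + lat2) / 2 * to_rad
  dx = (lon2 - lon1) * to_rad * cos(mean_lat) * radius
  dy = (lat2 - lat1) * to_rad * radius
  sqrt(dx^2 + dy^2)
}

radius   = 6371     # Earth's mean radius in km
lon_gaps = c(1, 90) # two point pairs on the 60th parallel, 1 and 90 degrees apart
ratio = planar_distance(60, 0, 60, lon_gaps, radius) /
        geodesic_distance(60, 0, 60, lon_gaps, radius)
print(round(ratio, 4))
```

For the pair 1 degree of longitude apart, the planar distance agrees with the great-circle distance to within 0.01%, whereas for the pair 90 degrees apart it overestimates it by roughly 9%.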

For details on the Spherical Brownian Motion model see `fit_sbm_const` and `simulate_sbm`. This function separately fits an SBM model with constant diffusivity to each of the two data sets; internally, it applies `fit_sbm_const` to each data set.

If `Nsignificance>0`, the statistical significance of the linear difference ($\lvert D_1-D_2\rvert$) and log-transformed difference ($\lvert\log(D_1)-\log(D_2)\rvert$) of the two fitted diffusivities is assessed under the null hypothesis that both data sets were generated by the same common SBM model. The diffusivity of this common SBM model is estimated by fitting to both data sets at once, i.e. after merging the two data sets into a single data set of trees and tip coordinates (see return variable `fit_common` below). For each of the `Nsignificance` random simulations of the common SBM model on the two tree sets, the diffusivities are again fitted separately to the two simulated data sets, and the resulting difference and log-difference are compared to those of the original data sets. The returned `lin_significance` (or `log_significance`) is the probability, estimated from these simulations, that the diffusivities would have a difference (or log-difference) larger than the observed one if the two data sets had been generated under the common SBM model.
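
The final step of this test can be sketched as follows, assuming the diffusivities refitted to each simulated pair of data sets are already available. The helper names and all numeric values are hypothetical placeholders (in practice the simulated values would come from refitting on data simulated under the common SBM model); the significance is estimated here as the fraction of simulations whose difference exceeds the observed one.

```r
# Absolute linear and log-transformed differences between two diffusivities
lin_difference = function(D1, D2) abs(D1 - D2)
log_difference = function(D1, D2) abs(log(D1) - log(D2))

# Estimated significance: fraction of simulated differences larger than the observed one
monte_carlo_significance = function(observed, simulated) mean(simulated > observed)

# Hypothetical diffusivities fitted to the two original data sets
D1 = 1
D2 = 3
# Hypothetical diffusivities refitted to 7 pairs of data sets simulated under the common model
sim_D1 = c(1.9, 2.1, 2.0, 1.5, 2.6, 0.8, 1.2)
sim_D2 = c(2.0, 1.8, 2.4, 2.5, 1.4, 3.5, 3.3)

lin_significance = monte_carlo_significance(lin_difference(D1, D2),
                                            lin_difference(sim_D1, sim_D2))
log_significance = monte_carlo_significance(log_difference(D1, D2),
                                            log_difference(sim_D1, sim_D2))
cat(sprintf("lin_significance = %.3f, log_significance = %.3f\n",
            lin_significance, log_significance))
```

In this toy example, two of the seven simulated linear differences exceed the observed value of 2, but only one simulated log-difference exceeds the observed $\log 3$, which shows how the two significance values can differ.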

If `edge.length` is missing from one of the input trees, each edge in that tree is assumed to have length 1. Trees may include multifurcations as well as monofurcations; however, multifurcations are internally expanded into bifurcations by adding dummy nodes.
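
The preprocessing described above can be sketched in base R, assuming a tree stored as a "phylo"-style list with an `edge` matrix (parent, child), `tip.label`, `Nnode` and an optional `edge.length`. This is not castor's internal code and the helper names are hypothetical; in this sketch, node labels are ignored and each dummy node is attached through a zero-length edge, so that phylogenetic distances between tips are unchanged. (The ape package's `multi2di` provides a comparable operation for resolving multifurcations.)

```r
# Give every edge length 1 if the tree has no edge lengths
with_default_edge_lengths = function(tree) {
  if (is.null(tree$edge.length)) {
    tree$edge.length = rep(1, nrow(tree$edge))
  }
  tree
}

# Internal nodes with more than two children (multifurcations)
multifurcating_nodes = function(tree) {
  child_counts = table(tree$edge[, 1])
  as.integer(names(child_counts)[child_counts > 2])
}

# Resolve each multifurcation into a ladder of bifurcations by adding
# dummy nodes, each joined to its parent through a zero-length edge
expand_multifurcations = function(tree) {
  tree = with_default_edge_lengths(tree)
  next_node = length(tree$tip.label) + tree$Nnode + 1  # first unused node index
  for (node in multifurcating_nodes(tree)) {
    rows = which(tree$edge[, 1] == node)
    k = length(rows)
    parent = node
    for (j in 2:(k - 1)) {
      tree$edge = rbind(tree$edge, c(parent, next_node))
      tree$edge.length = c(tree$edge.length, 0)
      tree$edge[rows[j], 1] = next_node  # re-hang the j-th child under the dummy node
      parent = next_node
      next_node = next_node + 1
      tree$Nnode = tree$Nnode + 1
    }
    tree$edge[rows[k], 1] = parent  # the last child joins the last dummy node
  }
  tree
}

# Example: a star tree with tips 1-4 attached directly to root node 5, without edge lengths
star = structure(list(edge = cbind(5, 1:4), tip.label = c("a", "b", "c", "d"), Nnode = 1L),
                 class = "phylo")
binary = expand_multifurcations(star)
print(binary$edge)
```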

A list with the following elements:

| Element | Description |
|---|---|
| `success` | Logical, indicating whether the fitting was successful for both data sets. If |
| `fit1` | A named list containing the fitting results for the first data set, in the same format as returned by `fit_sbm_const`. |
| `fit2` | A named list containing the fitting results for the second data set, in the same format as returned by `fit_sbm_const`. |
| `lin_difference` | The absolute difference between the two fitted diffusivities, i.e. $\lvert D_1-D_2\rvert$. |
| `log_difference` | The absolute difference between the two log-transformed diffusivities, i.e. $\lvert\log(D_1)-\log(D_2)\rvert$. |
| `lin_significance` | Numeric, statistical significance of the observed linear difference under the null hypothesis that the two data sets were generated by a common SBM model. Only returned if `Nsignificance>0`. |
| `log_significance` | Numeric, statistical significance of the observed log-difference under the null hypothesis that the two data sets were generated by a common SBM model. Only returned if `Nsignificance>0`. |
| `fit_common` | A named list containing the fitting results for the two data sets combined, in the same format as returned by `fit_sbm_const`. |
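
The merged data set behind `fit_common` can be pictured with the base-R sketch below. It assumes that a data set with several trees supplies its tip coordinates as a list of numeric vectors, one per tree; the helper names `as_dataset` and `merge_datasets` are hypothetical and not part of castor.

```r
# Normalize a data set so that a single "phylo" tree and a list of trees
# are handled uniformly (assumed convention: one coordinate vector per tree)
as_dataset = function(trees, tip_latitudes, tip_longitudes) {
  if (inherits(trees, "phylo")) {
    trees          = list(trees)
    tip_latitudes  = list(tip_latitudes)
    tip_longitudes = list(tip_longitudes)
  }
  list(trees = trees, tip_latitudes = tip_latitudes, tip_longitudes = tip_longitudes)
}

# Concatenate two normalized data sets into a single data set
merge_datasets = function(ds1, ds2) {
  list(trees          = c(ds1$trees, ds2$trees),
       tip_latitudes  = c(ds1$tip_latitudes, ds2$tip_latitudes),
       tip_longitudes = c(ds1$tip_longitudes, ds2$tip_longitudes))
}

# Toy tree: tips 1 and 2 attached to root node 3
toy_tree = structure(list(edge = cbind(3, 1:2), tip.label = c("a", "b"), Nnode = 1L),
                     class = "phylo")
ds1 = as_dataset(toy_tree, c(10, 20), c(30, 40))   # first data set: a single tree
ds2 = as_dataset(list(toy_tree, toy_tree),         # second data set: a list of trees
                 list(c(0, 5), c(-5, 1)),
                 list(c(1, 2), c(3, 4)))
common = merge_datasets(ds1, ds2)
cat("Number of trees in the merged data set:", length(common$trees), "\n")
```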

Stilianos Louca

S. Louca (in review as of 2020). Phylogeographic estimation and simulation of global diffusive dispersal. Systematic Biology.

`simulate_sbm`, `fit_sbm_const`, `fit_sbm_linear`, `fit_sbm_parametric`

```r
## Not run:
library(castor)

# simulate distinct SBM models on two random trees
radius = 6371 # Earth's radius
D1     = 1    # diffusivity on 1st tree
D2     = 3    # diffusivity on 2nd tree
tree1  = generate_random_tree(list(birth_rate_factor=1), max_tips=100)$tree
tree2  = generate_random_tree(list(birth_rate_factor=1), max_tips=100)$tree
sim1   = simulate_sbm(tree=tree1, radius=radius, diffusivity=D1)
sim2   = simulate_sbm(tree=tree2, radius=radius, diffusivity=D2)
tip_latitudes1  = sim1$tip_latitudes
tip_longitudes1 = sim1$tip_longitudes
tip_latitudes2  = sim2$tip_latitudes
tip_longitudes2 = sim2$tip_longitudes

# fit and compare SBM models between the two hypothetical data sets
fit = fit_and_compare_sbm_const(trees1          = tree1,
                                tip_latitudes1  = tip_latitudes1,
                                tip_longitudes1 = tip_longitudes1,
                                trees2          = tree2,
                                tip_latitudes2  = tip_latitudes2,
                                tip_longitudes2 = tip_longitudes2,
                                radius          = radius,
                                Nbootstraps     = 0,
                                Nsignificance   = 100)

# print summary of results
cat(sprintf("Fitted D1 = %g, D2 = %g, significance of log-diff. = %g\n",
            fit$fit1$diffusivity, fit$fit2$diffusivity, fit$log_significance))
## End(Not run)
```

[Package *castor* version 1.6.8 Index]