R: Fit and compare Spherical Brownian Motion models for...

fit_and_compare_sbm_const {castor}

R Documentation

Fit and compare Spherical Brownian Motion models for diffusive geographic dispersal between two data sets.

Description

Given two rooted phylogenetic trees and geographic coordinates of the trees' tips, fit a Spherical Brownian Motion (SBM) model of diffusive geographic dispersal with constant diffusivity to each tree and compare the fitted models. This function estimates the diffusivity (D) for each data set (i.e., each set of trees + tip-coordinates) via maximum-likelihood and assesses whether the log-difference between the two fitted diffusivities is statistically significant, under the null hypothesis that the two data sets exhibit the same diffusivity. Optionally, multiple trees can be used as input for each data set, under the assumption that dispersal occurred according to the same diffusivity in each tree of that dataset. For more details on how SBM is fitted to each data set see the function fit_sbm_const.

Usage

fit_and_compare_sbm_const(  trees1, 
                            tip_latitudes1,
                            tip_longitudes1,
                            trees2,
                            tip_latitudes2,
                            tip_longitudes2,
                            radius,
                            planar_approximation    = FALSE,
                            only_basal_tip_pairs    = FALSE,
                            only_distant_tip_pairs  = FALSE,
                            min_MRCA_time           = 0,
                            max_MRCA_age            = Inf,
                            max_phylodistance       = Inf,
                            min_diffusivity         = NULL,
                            max_diffusivity         = NULL,
                            Nbootstraps             = 0,
                            Nsignificance           = 0,
                            SBM_PD_functor          = NULL,
                            verbose                 = FALSE,
                            verbose_prefix          = "")

Arguments

`trees1`	Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the first data set on which an SBM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance.
`tip_latitudes1`	Numeric vector listing the latitude (in decimal degrees) of each tip in each tree in the first data set. If `trees1` is a single tree, then `tip_latitudes1` must be a numeric vector of size Ntips, listing the latitudes for each tip in the tree. If `trees1` is a list of Ntrees trees, then `tip_latitudes1` must be a list of length Ntrees, each element of which lists the latitudes for the corresponding tree (as a vector, similarly to the single-tree case).
`tip_longitudes1`	Similar to `tip_latitudes1`, but listing longitudes (in decimal degrees) of each tip in each tree in the first data set.
`trees2`	Either a single rooted tree or a list of rooted trees, of class "phylo", corresponding to the second data set on which an SBM model is to be fitted. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance.
`tip_latitudes2`	Numeric vector listing the latitude (in decimal degrees) of each tip in each tree in the second data set, similarly to `tip_latitudes1`.
`tip_longitudes2`	Numeric vector listing the longitude (in decimal degrees) of each tip in each tree in the second data set, similarly to `tip_longitudes1`.
`radius`	Strictly positive numeric, specifying the radius of the sphere. For Earth, the mean radius is 6371 km.
`planar_approximation`	Logical, specifying whether to estimate the diffusivity based on a planar approximation of the SBM model, i.e. by assuming that geographic distances between tips are as if tips are distributed on a 2D cartesian plane. This approximation is only accurate if geographical distances between tips are small compared to the sphere's radius.
`only_basal_tip_pairs`	Logical, specifying whether to only compare immediate sister tips, i.e., tips connected through a single parental node.
`only_distant_tip_pairs`	Logical, specifying whether to only compare tips at distinct geographic locations.
`min_MRCA_time`	Numeric, specifying the minimum allowed time (distance from root) of the most recent common ancestor (MRCA) of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA has at least this distance from the root. Set `min_MRCA_time<=0` to disable this filter.
`max_MRCA_age`	Numeric, specifying the maximum allowed age (distance from youngest tip) of the MRCA of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA has at most this age (time to present). Set `max_MRCA_age=Inf` to disable this filter.
`max_phylodistance`	Numeric, maximum allowed geodistance for an independent contrast to be included in the SBM fitting. Set `max_phylodistance=Inf` to disable this filter.
`min_diffusivity`	Non-negative numeric, specifying the minimum possible diffusivity. If NULL, this is automatically chosen.
`max_diffusivity`	Non-negative numeric, specifying the maximum possible diffusivity. If NULL, this is automatically chosen.
`Nbootstraps`	Integer, specifying the number of parametric bootstraps to perform for calculating the confidence intervals of SBM diffusivities fitted to each data set. If <=0, no bootstrapping is performed.
`Nsignificance`	Integer, specifying the number of simulations to perform for assessing the statistical significance of the linear difference and log-transformed difference between the diffusivities fitted to the two data sets, i.e. of `\|D_1-D_2\|` and of `\|\log(D_1)-\log(D_2)\|`. Set to 0 to not calculate statistical significances. See below for additional details.
`SBM_PD_functor`	SBM probability density functor object. Used internally and for debugging purposes. Unless you know what you're doing, you should keep this `NULL`.
`verbose`	Logical, specifying whether to print progress report messages to the screen.
`verbose_prefix`	Character, specifying a prefix to include in front of progress report messages on each line. Only relevant if `verbose==TRUE`.

Details

For details on the Spherical Brownian Motion model see fit_sbm_const and simulate_sbm. This function separately fits an SBM model with constant diffusivity to each of two data sets; internally, this function applies fit_sbm_const to each data set.

If Nsignificance>0, the statistical significance of the linear difference (|D_1-D_2|) and log-transformed difference (|\log(D_1)-\log(D_2)|) of the two fitted diffusivities is assessed under the null hypothesis that both data sets were generated by the same common SBM model. The diffusivity of this common SBM model is estimated by fitting to both datasets at once, i.e. after merging the two datasets into a single dataset of trees and tip coordinates (see return variable fit_common below). For each of the Nsignificance random simulations of the common SBM model on the two tree sets, the diffusivities are again separately fitted on the two simulated sets and the resulting difference and log-difference is compared to those of the original data sets. The returned lin_significance (or log_significance) is the probability that the diffusivities would have a difference (or log-difference) larger than the observed one, if the two data sets had been generated under the common SBM model.

If edge.length is missing from one of the input trees, each edge in the tree is assumed to have length 1. Trees may include multifurcations as well as monofurcations, however multifurcations are internally expanded into bifurcations by adding dummy nodes.

Value

A list with the following elements:

`success`	Logical, indicating whether the fitting was successful for both data sets. If `FALSE`, then an additional return variable, `error`, will contain a description of the error; in that case all other return variables may be undefined.
`fit1`	A named list containing the fitting results for the first data set, in the same format as returned by `fit_sbm_const`. In particular, the diffusivity fitted to the first data set will be stored in `fit1$diffusivity`.
`fit2`	A named list containing the fitting results for the second data set, in the same format as returned by `fit_sbm_const`. In particular, the diffusivity fitted to the second data set will be stored in `fit2$diffusivity`.
`lin_difference`	The absolute difference between the two diffusivities, i.e. `\|D_1-D_2\|`.
`log_difference`	The absolute difference between the two log-transformed diffusivities, i.e. `\|\log(D_1)-\log(D_2)\|`.
`lin_significance`	Numeric, statistical significance of the observed lin-difference under the null hypothesis that the two data sets were generated by a common SBM model. Only returned if `Nsignificance>0`.
`log_significance`	Numeric, statistical significance of the observed log-difference under the null hypothesis that the two data sets were generated by a common SBM model. Only returned if `Nsignificance>0`.
`fit_common`	A named list containing the fitting results for the two data sets combined, in the same format as returned by `fit_sbm_const`. The common diffusivity, `fit_common$diffusivity` is used for the random simulations when assessing the statistical significance of the lin-difference and log-difference of the separately fitted diffusivities. Only returned if `Nsignificance>0`.

Author(s)

Stilianos Louca

References

S. Louca (in review as of 2020). Phylogeographic estimation and simulation of global diffusive dispersal. Systematic Biology.

Examples

## Not run: 
# simulate distinct SBM models on two random trees
radius = 6371   # Earth's radius
D1     = 1      # diffusivity on 1st tree
D2     = 3      # diffusivity on 2nd tree
tree1  = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
tree2  = generate_random_tree(list(birth_rate_factor=1),max_tips=100)$tree
sim1   = simulate_sbm(tree=tree1, radius=radius, diffusivity=D1)
sim2   = simulate_sbm(tree=tree2, radius=radius, diffusivity=D2)
tip_latitudes1  = sim1$tip_latitudes
tip_longitudes1 = sim1$tip_longitudes
tip_latitudes2  = sim2$tip_latitudes
tip_longitudes2 = sim2$tip_longitudes

# fit and compare SBM models between the two hypothetical data sets
fit = fit_and_compare_sbm_const(trees1          = tree1, 
                                tip_latitudes1  = tip_latitudes1, 
                                tip_longitudes1 = tip_longitudes1, 
                                trees2          = tree2,
                                tip_latitudes2  = tip_latitudes2, 
                                tip_longitudes2 = tip_longitudes2, 
                                radius          = radius,
                                Nbootstraps     = 0,
                                Nsignificance   = 100)

# print summary of results
cat(sprintf("Fitted D1 = %g, D2 = %g, significance of log-diff. = %g\n",
            fit$fit1$diffusivity, fit$fit2$diffusivity, fit$log_significance))

## End(Not run)

[Package castor version 1.8.2 Index]