fit_sbm_const {castor} | R Documentation |

Given one or more rooted phylogenetic trees and geographic coordinates (latitudes & longitudes) for the tips of each tree, this function estimates the diffusivity of a Spherical Brownian Motion (SBM) model for the evolution of geographic location along lineages (Perrin 1928; Brillinger 2012). Estimation is done via maximum likelihood, using independent contrasts between sister lineages.

fit_sbm_const(trees, tip_latitudes, tip_longitudes, radius, phylodistance_matrixes = NULL, clade_states = NULL, planar_approximation = FALSE, only_basal_tip_pairs = FALSE, only_distant_tip_pairs = FALSE, min_MRCA_time = 0, max_MRCA_age = Inf, max_phylodistance = Inf, no_state_transitions = FALSE, only_state = NULL, min_diffusivity = NULL, max_diffusivity = NULL, Nbootstraps = 0, NQQ = 0, SBM_PD_functor = NULL, focal_diffusivities = NULL)

`trees` |
Either a single rooted tree or a list of rooted trees, of class "phylo". The root of each tree is assumed to be the unique node with no incoming edge. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. When multiple trees are provided, it is either assumed that their roots coincide in time (if |

`tip_latitudes` |
Numeric vector of length Ntips, or a list of vectors, listing latitudes of tips in decimal degrees (from -90 to 90). If |

`tip_longitudes` |
Numeric vector of length Ntips, or a list of vectors, listing longitudes of tips in decimal degrees (from -180 to 180). If |

`radius` |
Strictly positive numeric, specifying the radius of the sphere. For Earth, the mean radius is 6371 km. |

`phylodistance_matrixes` |
Numeric matrix, or a list of numeric matrixes, listing phylogenetic distances between tips for each tree. If |

`clade_states` |
Either NULL, or an integer vector of length Ntips+Nnodes, or a list of integer vectors, listing discrete states of every tip and node in the tree. If |

`planar_approximation` |
Logical, specifying whether to estimate the diffusivity based on a planar approximation of the SBM model, i.e. by assuming that geographic distances between tips are as if tips were distributed on a 2D Cartesian plane. This approximation is only accurate if geographical distances between tips are small compared to the sphere's radius. |

`only_basal_tip_pairs` |
Logical, specifying whether to only compare immediate sister tips, i.e., tips connected through a single parental node. |

`only_distant_tip_pairs` |
Logical, specifying whether to only compare tips at distinct geographic locations. |

`min_MRCA_time` |
Numeric, specifying the minimum allowed time (distance from root) of the most recent common ancestor (MRCA) of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA has at least this distance from the root. Set |

`max_MRCA_age` |
Numeric, specifying the maximum allowed age (distance from youngest tip) of the MRCA of sister tips considered in the fitting. In other words, an independent contrast is only considered if the two sister tips' MRCA has at most this age (time to present). Set |

`max_phylodistance` |
Numeric, maximum allowed phylodistance for an independent contrast to be included in the SBM fitting. Set |

`no_state_transitions` |
Logical, specifying whether to omit independent contrasts between tips whose shortest connecting paths include state transitions. If |

`only_state` |
Optional integer, specifying the state in which tip pairs (and their connecting ancestral nodes) must be in order to be considered. If specified, then |

`min_diffusivity` |
Non-negative numeric, specifying the minimum possible diffusivity. If NULL, this is automatically chosen. |

`max_diffusivity` |
Non-negative numeric, specifying the maximum possible diffusivity. If NULL, this is automatically chosen. |

`Nbootstraps` |
Non-negative integer, specifying an optional number of parametric bootstraps to perform for estimating standard errors and confidence intervals. |

`NQQ` |
Integer, optional number of simulations to perform for creating QQ plots of the theoretically expected distribution of geodistances vs. the empirical distribution of geodistances (across independent contrasts). The resolution of the returned QQ plot will be equal to the number of independent contrasts used for fitting. If <=0, no QQ plots will be calculated. |

`SBM_PD_functor` |
SBM probability density functor object. Used internally for efficiency and for debugging purposes, and should be kept at its default value |

`focal_diffusivities` |
Optional numeric vector, listing diffusivities of particular interest and for which the log-likelihoods should be returned. This may be used for diagnostic purposes, e.g. to see how "sharp" the likelihood peak is at the maximum-likelihood estimate. |
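To give a feel for when the `planar_approximation` option described above is reasonable, the following base-R sketch compares great-circle distances with a simple flat-plane stand-in. The helper names `geodistance`, `planar_distance` and `rel_error` are hypothetical and not part of castor, and the equirectangular projection used here is an assumption chosen for illustration; it is not necessarily how castor implements its planar approximation.

```r
# Great-circle (haversine) distance between two points given in decimal degrees.
# Hypothetical helper, not part of castor.
geodistance = function(lat1, lon1, lat2, lon2, radius){
    to_rad = pi/180
    dlat = (lat2 - lat1)*to_rad
    dlon = (lon2 - lon1)*to_rad
    a = sin(dlat/2)^2 + cos(lat1*to_rad)*cos(lat2*to_rad)*sin(dlon/2)^2
    2*radius*asin(pmin(1, sqrt(a)))
}

# Flat-plane stand-in (equirectangular projection around the mean latitude).
# Assumption for illustration only.
planar_distance = function(lat1, lon1, lat2, lon2, radius){
    to_rad = pi/180
    x = (lon2 - lon1)*to_rad*cos(0.5*(lat1 + lat2)*to_rad)
    y = (lat2 - lat1)*to_rad
    radius*sqrt(x^2 + y^2)
}

# Relative error of the planar distance, given c(spherical, planar)
rel_error = function(d) abs(d[2] - d[1])/d[1]

radius = 6371 # mean radius of the Earth in km
# two tips about 80 km apart, and two tips thousands of km apart
near = c(geodistance(10, 20, 10.5, 20.5, radius), planar_distance(10, 20, 10.5, 20.5, radius))
far  = c(geodistance(10, 20, 60, 120, radius),    planar_distance(10, 20, 60, 120, radius))
cat(sprintf("near pair: relative error %.2g\n", rel_error(near)))
cat(sprintf("far pair:  relative error %.2g\n", rel_error(far)))
```

For the nearby pair the two distances agree closely, whereas for the distant pair they differ noticeably, which is why the planar approximation is only advisable when geographic distances are small compared to the radius.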

For short expected transition distances this function uses the approximation formula by Ghosh et al. (2012). For longer expected transition distances the function uses a truncated approximation of the series representation of SBM transition densities (Perrin 1928). It is assumed that tips are sampled randomly without any biases for certain geographic regions. If you suspect strong geographic sampling biases, consider using the function `fit_sbm_geobiased_const`.

This function can use multiple trees to fit the diffusivity under the assumption that each tree is an independent realization of the same SBM process, i.e. all lineages in all trees dispersed with the same diffusivity.
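A minimal sketch of such a multi-tree fit, assuming that castor is installed and that the coordinates are passed as lists of vectors (one per tree) as described for `tip_latitudes` and `tip_longitudes`. The number of trees, tree sizes and diffusivity are arbitrary choices for illustration.

```r
library(castor)

# generate three independent random trees
Ntrees = 3
trees = lapply(seq_len(Ntrees), function(i){
    generate_random_tree(list(birth_rate_intercept=1), max_tips=200)$tree
})

# simulate the same SBM process (same diffusivity) on each tree
D = 1e4
simulations = lapply(trees, function(tree){
    simulate_sbm(tree, radius=6371, diffusivity=D)
})

# fit a single diffusivity to all trees at once,
# passing one latitude/longitude vector per tree
fit = fit_sbm_const(trees,
                    tip_latitudes  = lapply(simulations, function(s) s$tip_latitudes),
                    tip_longitudes = lapply(simulations, function(s) s$tip_longitudes),
                    radius         = 6371)
if(fit$success){
    cat(sprintf("True D=%g, fitted D=%g\n", D, fit$diffusivity))
}else{
    cat("Fitting failed\n")
}
```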

If `edge.length` is missing from one of the input trees, each edge in the tree is assumed to have length 1. The tree may include multifurcations as well as monofurcations; however, multifurcations are internally expanded into bifurcations by adding dummy nodes.

A list with the following elements:

`success` |
Logical, indicating whether the fitting was successful. If |

`diffusivity` |
Numeric, the estimated diffusivity, in units distance^2/time. Distance units are the same as used for the |

`loglikelihood` |
Numeric, the log-likelihood of the data at the estimated diffusivity. |

`Ncontrasts` |
Integer, number of independent contrasts (i.e., tip pairs) used to estimate the diffusivity. This is the number of independent data points used. |

`phylodistances` |
Numeric vector of length |

`geodistances` |
Numeric vector of length |

`focal_loglikelihoods` |
Numeric vector of the same length as |

`standard_error` |
Numeric, estimated standard error of the estimated diffusivity, based on parametric bootstrapping. Only returned if |

`CI50lower` |
Numeric, lower bound of the 50% confidence interval for the estimated diffusivity (25-75% percentile), based on parametric bootstrapping. Only returned if |

`CI50upper` |
Numeric, upper bound of the 50% confidence interval for the estimated diffusivity, based on parametric bootstrapping. Only returned if |

`CI95lower` |
Numeric, lower bound of the 95% confidence interval for the estimated diffusivity (2.5-97.5% percentile), based on parametric bootstrapping. Only returned if |

`CI95upper` |
Numeric, upper bound of the 95% confidence interval for the estimated diffusivity, based on parametric bootstrapping. Only returned if |

`consistency` |
Numeric between 0 and 1, estimated consistency of the data with the fitted model. If |

`QQplot` |
Numeric matrix of size Ncontrasts x 2, listing the computed QQ-plot. The first column lists quantiles of geodistances in the original dataset; the second column lists quantiles of hypothetical geodistances simulated based on the fitted model. |

`SBM_PD_functor` |
SBM probability density functor object. Used internally for efficiency and for debugging purposes. |
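The returned elements can be inspected as in the following sketch, which assumes that castor is installed, requests parametric bootstraps via `Nbootstraps` and evaluates the log-likelihood at a few `focal_diffusivities`. The tree size, number of bootstraps and the grid of focal values are arbitrary choices for illustration.

```r
library(castor)

# simulate a dataset with known diffusivity
tree = generate_random_tree(list(birth_rate_intercept=1), max_tips=200)$tree
simulation = simulate_sbm(tree, radius=6371, diffusivity=1e4)

# diffusivities at which to report the log-likelihood (arbitrary grid)
focal = 10^seq(3, 5, length.out=9)

# fit with bootstraps, so that standard errors and confidence intervals are returned
fit = fit_sbm_const(tree,
                    simulation$tip_latitudes,
                    simulation$tip_longitudes,
                    radius              = 6371,
                    Nbootstraps         = 100,
                    focal_diffusivities = focal)

if(fit$success){
    cat(sprintf("Fitted D=%g (standard error %g), based on %d contrasts\n",
                fit$diffusivity, fit$standard_error, as.integer(fit$Ncontrasts)))
    cat(sprintf("95%% CI: %g to %g\n", fit$CI95lower, fit$CI95upper))
    # likelihood profile, to judge how sharp the peak is around the estimate
    print(data.frame(diffusivity=focal, loglikelihood=fit$focal_loglikelihoods))
}else{
    cat("Fitting failed\n")
}
```

Always check `success` before using the other elements, since they may be missing when the fit fails.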

Stilianos Louca

F. Perrin (1928). Etude mathematique du mouvement Brownien de rotation. Annales scientifiques de l'Ecole Normale Superieure. 45:1-51.

D. R. Brillinger (2012). A particle migrating randomly on a sphere. In: Selected Works of David Brillinger. Springer.

A. Ghosh, J. Samuel, S. Sinha (2012). A Gaussian for diffusion on the sphere. Europhysics Letters. 98:30003.

A. Lindholm, D. Zachariah, P. Stoica, T. B. Schoen (2019). Data consistency approach to model validation. IEEE Access. 7:59788-59796.

S. Louca (2021). Phylogeographic estimation and simulation of global diffusive dispersal. Systematic Biology. 70:340-359.

`fit_sbm_geobiased_const`, `simulate_sbm`, `fit_sbm_parametric`, `fit_sbm_linear`, `fit_sbm_on_grid`

    ## Not run: 
    # generate a random tree
    tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=500)$tree

    # simulate SBM on the tree
    D = 1e4
    simulation = simulate_sbm(tree, radius=6371, diffusivity=D)

    # fit SBM on the tree
    fit = fit_sbm_const(tree,simulation$tip_latitudes,simulation$tip_longitudes,radius=6371)
    cat(sprintf('True D=%g, fitted D=%g\n',D,fit$diffusivity))
    ## End(Not run)

[Package *castor* version 1.6.8 Index]