fit_hbd_model_on_grid {castor} | R Documentation |

Given an ultrametric timetree, fit a homogenous birth-death (HBD) model in which speciation and extinction rates (*λ* and *mu*) are defined on a fixed grid of discrete time points and assumed to vary polynomially between grid points. “Homogenous” refers to the assumption that, at any given moment in time, all lineages exhibit the same speciation/extinction rates (in the literature this is sometimes referred to simply as “birth-death model”). Every HBD model is defined based on the values that *λ* and *μ* take over time as well as the sampling fraction *ρ* (fraction of extant species sampled). This function estimates the values of *λ* and *μ* at each grid point by maximizing the likelihood (Morlon et al. 2011) of the timetree under the resulting HBD model.

fit_hbd_model_on_grid(tree, oldest_age = NULL, age0 = 0, age_grid = NULL, min_lambda = 0, max_lambda = +Inf, min_mu = 0, max_mu = +Inf, min_rho0 = 1e-10, max_rho0 = 1, guess_lambda = NULL, guess_mu = NULL, guess_rho0 = 1, fixed_lambda = NULL, fixed_mu = NULL, fixed_rho0 = NULL, const_lambda = FALSE, const_mu = FALSE, splines_degree = 1, condition = "auto", relative_dt = 1e-3, Ntrials = 1, Nthreads = 1, max_model_runtime = NULL, fit_control = list())

`tree` |
An ultrametric timetree of class "phylo", representing the time-calibrated reconstructed phylogeny of a set of extant species. |

`oldest_age` |
Strictly positive numeric, specifying the oldest time before present (“age”) to consider when calculating the likelihood. If this is equal to or greater than the root age, then |

`age0` |
Non-negative numeric, specifying the youngest age (time before present) to consider for fitting, and with respect to which |

`age_grid` |
Numeric vector, listing ages in ascending order, on which |

`min_lambda` |
Numeric vector of length Ngrid (= |

`max_lambda` |
Numeric vector of length Ngrid, or a single numeric, specifying upper bounds for the fitted |

`min_mu` |
Numeric vector of length Ngrid, or a single numeric, specifying lower bounds for the fitted |

`max_mu` |
Numeric vector of length Ngrid, or a single numeric, specifying upper bounds for the fitted |

`min_rho0` |
Numeric, specifying a lower bound for the fitted sampling fraction |

`max_rho0` |
Numeric, specifying an upper bound for the fitted sampling fraction |

`guess_lambda` |
Initial guess for |

`guess_mu` |
Initial guess for |

`guess_rho0` |
Numeric, specifying an initial guess for the sampling fraction |

`fixed_lambda` |
Optional fixed (i.e. non-fitted) |

`fixed_mu` |
Optional fixed (i.e. non-fitted) |

`fixed_rho0` |
Numeric between 0 and 1, optionallly specifying a fixed value for the sampling fraction |

`const_lambda` |
Logical, specifying whether |

`const_mu` |
Logical, specifying whether |

`splines_degree` |
Integer between 0 and 3 (inclusive), specifying the polynomial degree of |

`condition` |
Character, either "crown", "stem", "none" or "auto", specifying on what to condition the likelihood. If "crown", the likelihood is conditioned on the survival of the two daughter lineages branching off at the root. If "stem", the likelihood is conditioned on the survival of the stem lineage. Note that "crown" really only makes sense when |

`relative_dt` |
Strictly positive numeric (unitless), specifying the maximum relative time step allowed for integration over time, when calculating the likelihood. Smaller values increase integration accuracy but increase computation time. Typical values are 0.0001-0.001. The default is usually sufficient. |

`Ntrials` |
Integer, specifying the number of independent fitting trials to perform, each starting from a random choice of model parameters. Increasing |

`Nthreads` |
Integer, specifying the number of parallel threads to use for performing multiple fitting trials simultaneously. This should generally not exceed the number of available CPUs on your machine. Parallel computing is not available on the Windows platform. |

`max_model_runtime` |
Optional numeric, specifying the maximum number of seconds to allow for each evaluation of the likelihood function. Use this to abort fitting trials leading to parameter regions where the likelihood takes a long time to evaluate (these are often unlikely parameter regions). |

`fit_control` |
Named list containing options for the |

Warning: Unless well-justified constraints are imposed on either *λ* and/or *μ* and *ρ*, it is generally impossible to reliably estimate *λ* and *μ* from extant timetrees alone (Louca and Pennell, 2019). This routine (and any other software that claims to estimate *λ* and *μ* solely from extant timetrees) should thus be used with great suspicion. If your only source of information is an extant timetree, and you have no a priori information on how *λ* or *μ* might have looked like, you should consider using the more appropriate routines `fit_hbd_pdr_on_grid`

and `fit_hbd_psr_on_grid`

instead.

If `age0>0`

, the input tree is essentially trimmed at `age0`

(omitting anything younger than `age0`

), and the various variables are fitted to this new (shorter) tree, with time shifted appropriately. For example, the fitted `rho0`

is thus the sampling fraction at `age0`

, i.e. the fraction of lineages extant at `age0`

that are represented in the timetree.

It is generally advised to provide as much information to the function `fit_hbd_model_on_grid`

as possible, including reasonable lower and upper bounds (`min_lambda`

, `max_lambda`

, `min_mu`

, `max_mu`

, `min_rho0`

and `max_rho0`

) and a reasonable parameter guess (`guess_lambda`

, `guess_mu`

and `guess_rho0`

). It is also important that the `age_grid`

is sufficiently fine to capture the expected major variations of *λ* and *μ* over time, but keep in mind the serious risk of overfitting when `age_grid`

is too fine and/or the tree is too small.

A list with the following elements:

`success` |
Logical, indicating whether model fitting succeeded. If |

`objective_value` |
The maximized fitting objective. Currently, only maximum-likelihood estimation is implemented, and hence this will always be the maximized log-likelihood. |

`objective_name` |
The name of the objective that was maximized during fitting. Currently, only maximum-likelihood estimation is implemented, and hence this will always be “loglikelihood”. |

`loglikelihood` |
The log-likelihood of the fitted model for the given timetree. |

`fitted_lambda` |
Numeric vector of size Ngrid, listing fitted or fixed speciation rates |

`fitted_mu` |
Numeric vector of size Ngrid, listing fitted or fixed extinction rates |

`fitted_rho` |
Numeric, specifying the fitted or fixed sampling fraction |

`guess_lambda` |
Numeric vector of size Ngrid, specifying the initial guess for |

`guess_mu` |
Numeric vector of size Ngrid, specifying the initial guess for |

`guess_rho0` |
Numeric, specifying the initial guess for |

`age_grid` |
The age-grid on which |

`NFP` |
Integer, number of free (i.e., independently) fitted parameters. If none of the |

`AIC` |
The Akaike Information Criterion for the fitted model, defined as |

`BIC` |
The Bayesian information criterion for the fitted model, defined as |

`condition` |
Character, specifying what conditioning was root for the likelihood (e.g. "crown" or "stem"). |

`converged` |
Logical, specifying whether the maximum likelihood was reached after convergence of the optimization algorithm. Note that in some cases the maximum likelihood may have been achieved by an optimization path that did not yet converge (in which case it's advisable to increase |

`Niterations` |
Integer, specifying the number of iterations performed during the optimization path that yielded the maximum likelihood. |

`Nevaluations` |
Integer, specifying the number of likelihood evaluations performed during the optimization path that yielded the maximum likelihood. |

Stilianos Louca

T. Stadler (2009). On incomplete sampling under birth-death models and connections to the sampling-based coalescent. Journal of Theoretical Biology. 261:58-66.

T. Stadler (2013). How can we improve accuracy of macroevolutionary rate estimates? Systematic Biology. 62:321-329.

H. Morlon, T. L. Parsons, J. B. Plotkin (2011). Reconciling molecular phylogenies with the fossil record. Proceedings of the National Academy of Sciences. 108:16327-16332.

S. Louca et al. (2018). Bacterial diversification through geological time. Nature Ecology & Evolution. 2:1458-1467.

## Not run: # Generate a random tree with exponentially varying lambda & mu Ntips = 10000 rho = 0.5 # sampling fraction time_grid = seq(from=0, to=100, by=0.01) lambdas = 2*exp(0.1*time_grid) mus = 1.5*exp(0.09*time_grid) sim = generate_random_tree( parameters = list(rarefaction=rho), max_tips = Ntips/rho, coalescent = TRUE, added_rates_times = time_grid, added_birth_rates_pc = lambdas, added_death_rates_pc = mus) tree = sim$tree root_age = castor::get_tree_span(tree)$max_distance cat(sprintf("Tree has %d tips, spans %g Myr\n",length(tree$tip.label),root_age)) # Fit mu on grid # Assume that lambda & rho are known Ngrid = 5 age_grid = seq(from=0,to=root_age,length.out=Ngrid) fit = fit_hbd_model_on_grid(tree, age_grid = age_grid, max_mu = 100, fixed_lambda= approx(x=time_grid,y=lambdas,xout=sim$final_time-age_grid)$y, fixed_rho0 = rho, condition = "crown", Ntrials = 10, # perform 10 fitting trials Nthreads = 2, # use two CPUs max_model_runtime = 1) # limit model evaluation to 1 second if(!fit$success){ cat(sprintf("ERROR: Fitting failed: %s\n",fit$error)) }else{ cat(sprintf("Fitting succeeded:\nLoglikelihood=%g\n",fit$loglikelihood)) # plot fitted & true mu plot( x = fit$age_grid, y = fit$fitted_mu, main = 'Fitted & true mu', xlab = 'age', ylab = 'mu', type = 'b', col = 'red', xlim = c(root_age,0)) lines(x = sim$final_time-time_grid, y = mus, type = 'l', col = 'blue'); # get fitted mu as a function of age mu_fun = approxfun(x=fit$age_grid, y=fit$fitted_mu) } ## End(Not run)

[Package *castor* version 1.6.7 Index]