fit_hbd_pdr_on_grid {castor} | R Documentation |

Given an ultrametric timetree, estimate via maximum likelihood the pulled diversification rate of the homogeneous birth-death (HBD) models that best explain the tree. Every HBD model is defined by its speciation and extinction rates (*λ* and *μ*) over time, as well as the sampling fraction *ρ* (the fraction of extant species sampled). “Homogeneous” refers to the assumption that, at any given moment in time, all lineages exhibit the same speciation and extinction rates. For any given HBD model there exist infinitely many alternative HBD models that predict the same deterministic lineages-through-time curve and yield the same likelihood for any given reconstructed timetree. These “congruent” models cannot be distinguished from one another based solely on the tree.

Each congruence class is uniquely described by the “pulled diversification rate” (PDR; Louca et al. 2018), defined as *PDR = λ - μ + λ^{-1} dλ/dτ* (where *τ* is time before present), together with the product *ρλ_o* (where *λ_o* is the present-day speciation rate). That is, two HBD models are congruent if and only if they have the same PDR and the same product *ρλ_o*. This function estimates the congruence class that generated the tree by fitting the PDR on a grid of discrete ages, together with the product *ρλ_o*.
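As an illustration of the congruence statement above, the following base-R sketch approximates the PDR of a model from its speciation and extinction rates tabulated on a grid of ages, using finite differences for *dλ/dτ*. It then constructs a second, different model with the same PDR and the same present-day speciation rate (hence the same *ρλ_o*, assuming both models share the same *ρ*). The helpers `grid_derivative`, `pdr_from_rates` and `congruent_mu` are hypothetical and not part of castor, and the rate profiles are arbitrary choices made only for illustration.

```r
# Hypothetical helpers (not part of castor) illustrating
#   PDR(tau) = lambda(tau) - mu(tau) + (1/lambda(tau)) * d lambda / d tau,
# where tau is age (time before present), on a discrete grid of ages.

# Finite-difference derivative of y with respect to x (forward differences,
# with the last value repeated so the result has the same length as y).
grid_derivative <- function(x, y) {
  d <- diff(y) / diff(x)
  c(d, d[length(d)])
}

pdr_from_rates <- function(ages, lambdas, mus) {
  lambdas - mus + grid_derivative(ages, lambdas) / lambdas
}

# Given a target PDR and an alternative speciation-rate profile, return the
# extinction rates that make the alternative model have the same PDR.
congruent_mu <- function(ages, PDR, lambdas_alt) {
  lambdas_alt + grid_derivative(ages, lambdas_alt) / lambdas_alt - PDR
}

ages <- seq(from = 0, to = 50, by = 0.1)

# Model A: speciation rate decreasing towards the present, constant extinction
lambdas_A <- 0.3 + 0.005 * ages
mus_A     <- rep(0.1, length(ages))
PDR_A     <- pdr_from_rates(ages, lambdas_A, mus_A)

# Model B: same present-day speciation rate (0.3) but a different profile;
# its extinction rates are chosen so that its PDR matches that of model A
lambdas_B <- 0.3 * exp(0.01 * ages)
mus_B     <- congruent_mu(ages, PDR_A, lambdas_B)
PDR_B     <- pdr_from_rates(ages, lambdas_B, mus_B)

# Same PDR and same lambda at age 0, although lambda and mu differ over time
stopifnot(isTRUE(all.equal(PDR_A, PDR_B)))
stopifnot(lambdas_A[1] == lambdas_B[1])
```

Models A and B would yield the same likelihood for any reconstructed timetree, which is why only the PDR and *ρλ_o* are fitted.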

```r
fit_hbd_pdr_on_grid(tree, oldest_age = NULL, age0 = 0, age_grid = NULL,
                    min_PDR = -Inf, max_PDR = +Inf,
                    min_rholambda0 = 1e-10, max_rholambda0 = +Inf,
                    guess_PDR = NULL, guess_rholambda0 = NULL,
                    fixed_PDR = NULL, fixed_rholambda0 = NULL,
                    splines_degree = 1, condition = "auto", relative_dt = 1e-3,
                    Ntrials = 1, Nbootstraps = 0, Ntrials_per_bootstrap = NULL,
                    Nthreads = 1, max_model_runtime = NULL, fit_control = list(),
                    verbose = FALSE, verbose_prefix = "")
```

`tree` |
An ultrametric timetree of class "phylo", representing the time-calibrated phylogeny of a set of extant species. |

`oldest_age` |
Strictly positive numeric, specifying the oldest time before present (“age”) to consider when calculating the likelihood. If this is equal to or greater than the root age, then |

`age0` |
Non-negative numeric, specifying the youngest age (time before present) to consider for fitting, and with respect to which the product *ρλ_o* is defined. |

`age_grid` |
Numeric vector, listing ages in ascending order at which the PDR is allowed to vary independently. This grid must cover at least the age range from `age0` to `oldest_age`. |

`min_PDR` |
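Numeric vector of length Ngrid (= `length(age_grid)`), or a single numeric, specifying lower bounds for the fitted PDR at each point in the age grid. If a single numeric, the same lower bound applies at all ages. |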

`max_PDR` |
Numeric vector of length Ngrid, or a single numeric, specifying upper bounds for the fitted PDR at each point in the age grid. If a single numeric, the same upper bound applies at all ages. Use `+Inf` to omit upper bounds. |

`min_rholambda0` |
Strictly positive numeric, specifying the lower bound for the fitted product *ρλ_o*. |

`max_rholambda0` |
Strictly positive numeric, specifying the upper bound for the fitted product *ρλ_o*. |

`guess_PDR` |
Initial guess for the PDR at each age-grid point. Either |

`guess_rholambda0` |
Numeric, specifying an initial guess for the product *ρλ_o*. |

`fixed_PDR` |
Optional fixed (i.e. non-fitted) PDR values on one or more age-grid points. Either |

`fixed_rholambda0` |
Numeric, optionally specifying a fixed value for the product *ρλ_o*. |

`splines_degree` |
Integer between 0 and 3 (inclusive), specifying the polynomial degree of the PDR between age-grid points. If 0, the PDR is considered piecewise constant; if 1, piecewise linear; if 2 or 3, a spline of degree 2 or 3, respectively. The |

`condition` |
Character, either "crown", "stem" or "auto", specifying on what to condition the likelihood. If "crown", the likelihood is conditioned on the survival of the two daughter lineages branching off at the root. If "stem", the likelihood is conditioned on the survival of the stem lineage. Note that "crown" really only makes sense when |

`relative_dt` |
Strictly positive numeric (unitless), specifying the maximum relative time step allowed for integration over time, when calculating the likelihood. Smaller values increase integration accuracy but increase computation time. Typical values are 0.0001-0.001. The default is usually sufficient. |

`Ntrials` |
Integer, specifying the number of independent fitting trials to perform, each starting from a random choice of model parameters. Increasing `Ntrials` reduces the risk of reaching a non-global local maximum of the likelihood. |

`Nbootstraps` |
Integer, specifying an optional number of bootstrap samplings to perform, for estimating standard errors and confidence intervals of maximum-likelihood fitted parameters. If 0, no bootstrapping is performed. Typical values are 10-100. At each bootstrap sampling, a random timetree is generated under the birth-death model according to the fitted PDR and *ρλ_o*, and the PDR and *ρλ_o* are then re-fitted to this simulated tree. |

`Ntrials_per_bootstrap` |
Integer, specifying the number of fitting trials to perform for each bootstrap sampling. If |

`Nthreads` |
Integer, specifying the number of parallel threads to use for performing multiple fitting trials simultaneously. This should generally not exceed the number of available CPUs on your machine. Parallel computing is not available on the Windows platform. |

`max_model_runtime` |
Optional numeric, specifying the maximum number of seconds to allow for each evaluation of the likelihood function. Use this to abort fitting trials leading to parameter regions where the likelihood takes a long time to evaluate (these are often unlikely parameter regions). |

`fit_control` |
Named list containing options for the |

`verbose` |
Logical, specifying whether to print progress reports and warnings to the screen. Note that errors always cause a return of the function (see return values `success` and `error`). |

`verbose_prefix` |
Character, specifying the line prefix for printing progress reports to the screen. |
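To illustrate the `splines_degree` argument, the following base-R sketch evaluates a PDR known only at age-grid points as piecewise constant (degree 0), piecewise linear (degree 1) or a natural cubic spline (degree 3), using `stats::approxfun` and `stats::splinefun`. This only approximates the idea: castor's internal spline construction may differ, base R has no direct counterpart for degree 2, and letting each grid point's value hold until the next older grid point in the degree-0 case is a choice made here. The grid values are arbitrary and not fitted results.

```r
# Arbitrary illustrative PDR values on an age grid (not fitted results)
age_grid <- c(0, 10, 20, 30, 40)
PDR_grid <- c(0.05, 0.12, 0.08, 0.02, 0.04)

# Degree 0: piecewise constant. With method = "constant" and f = 0, approxfun
# holds the value of the younger (left) grid point until the next grid point.
pdr_deg0 <- approxfun(age_grid, PDR_grid, method = "constant", f = 0, rule = 2)

# Degree 1: piecewise linear interpolation between grid points.
pdr_deg1 <- approxfun(age_grid, PDR_grid, method = "linear", rule = 2)

# Degree 3: natural cubic spline passing through the grid points.
pdr_deg3 <- splinefun(age_grid, PDR_grid, method = "natural")

# Compare the three interpolations at a few ages, including between grid points
ages <- c(0, 5, 10, 25, 40)
cbind(age  = ages,
      deg0 = pdr_deg0(ages),
      deg1 = pdr_deg1(ages),
      deg3 = pdr_deg3(ages))
```

All three functions agree at the grid points and differ only in between, which is where the choice of `splines_degree` matters.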

If `age0>0`, the input tree is essentially trimmed at `age0` (omitting anything younger than `age0`), and the PDR and `rholambda0` are fitted to this new (shorter) tree, with time shifted appropriately. The fitted `rholambda0` is thus the product of the sampling fraction at `age0` and the speciation rate at `age0`. Note that the sampling fraction at `age0` is simply the fraction of lineages extant at `age0` that are represented in the timetree.
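A small worked example of these quantities, in base R. In a reconstructed, bifurcating timetree, the number of lineages crossing a given age equals one plus the number of branching points older than that age. Dividing this count by the (generally unknown) number of lineages that truly existed at that age gives the sampling fraction there. The branching ages, the true lineage count and the speciation rate below are hypothetical numbers chosen for illustration, and `lineages_at_age` is a hypothetical helper, not a castor function.

```r
# Number of lineages of a reconstructed, bifurcating timetree that cross `age`:
# one lineage above the root, plus one more for every branching point older than `age`.
lineages_at_age <- function(branching_ages, age) {
  1 + sum(branching_ages > age)
}

branching_ages <- c(10, 7, 4, 2)  # hypothetical node ages of a 5-tip tree
age0 <- 3

n_tree_at_age0 <- lineages_at_age(branching_ages, age0)  # 4 lineages in the tree
n_true_at_age0 <- 16                                     # hypothetical true count
rho_at_age0    <- n_tree_at_age0 / n_true_at_age0        # sampling fraction at age0
lambda_at_age0 <- 0.4                                    # hypothetical speciation rate
rholambda0     <- rho_at_age0 * lambda_at_age0           # product fitted as rholambda0
rholambda0                                               # 0.1
```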

It is generally advisable to provide `fit_hbd_pdr_on_grid` with as much information as possible, including reasonable lower and upper bounds (`min_PDR`, `max_PDR`, `min_rholambda0` and `max_rholambda0`) and reasonable initial guesses (`guess_PDR` and `guess_rholambda0`). It is also important that the `age_grid` is fine enough to capture the expected major variations of the PDR over time; however, keep in mind the serious risk of overfitting when `age_grid` is too fine and/or the tree is too small.
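The following base-R sketch shows one way to assemble such inputs: an ascending age grid from the present to the root, per-grid-point PDR bounds, and a constant PDR guess. The helper `prepare_pdr_inputs`, the bound of ±10 and the crude guess `log(Ntips/2)/root_age` (the rate of deterministic exponential growth from 2 crown lineages to `Ntips` lineages, ignoring extinction and incomplete sampling) are assumptions for illustration, not castor's own defaults. The actual call to `fit_hbd_pdr_on_grid` is left as a comment and uses only arguments from the usage above.

```r
# Hypothetical helper (not part of castor): assemble an ascending age grid
# from the present (age 0) to the root, symmetric PDR bounds, and a constant PDR guess.
prepare_pdr_inputs <- function(Ntips, root_age, Ngrid, PDR_bound = 10) {
  stopifnot(Ntips >= 2, root_age > 0, Ngrid >= 2, PDR_bound > 0)
  age_grid <- seq(from = 0, to = root_age, length.out = Ngrid)
  # Crude guess: rate of deterministic exponential growth from 2 crown
  # lineages at the root to Ntips lineages at the present.
  guess <- log(Ntips / 2) / root_age
  list(age_grid  = age_grid,
       min_PDR   = rep(-PDR_bound, Ngrid),
       max_PDR   = rep(+PDR_bound, Ngrid),
       guess_PDR = rep(guess, Ngrid))
}

inputs <- prepare_pdr_inputs(Ntips = 500, root_age = 60, Ngrid = 8)

# Assuming `tree` is an ultrametric "phylo" object, the fit could then be run as:
# fit <- fit_hbd_pdr_on_grid(tree,
#                            age_grid  = inputs$age_grid,
#                            min_PDR   = inputs$min_PDR,
#                            max_PDR   = inputs$max_PDR,
#                            guess_PDR = inputs$guess_PDR,
#                            Ntrials   = 10)
```

Keeping `Ngrid` modest relative to the number of tips is one simple guard against the overfitting risk mentioned above; the returned `AIC` and `BIC` can be compared across grid resolutions.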

A list with the following elements:

`success` |
Logical, indicating whether model fitting succeeded. If `FALSE`, an error message is returned in `error`. |

`objective_value` |
The maximized fitting objective. Currently, only maximum-likelihood estimation is implemented, and hence this will always be the maximized log-likelihood. |

`objective_name` |
The name of the objective that was maximized during fitting. Currently, only maximum-likelihood estimation is implemented, and hence this will always be “loglikelihood”. |

`loglikelihood` |
The log-likelihood of the fitted model for the given timetree. |

`fitted_PDR` |
Numeric vector of size Ngrid, listing fitted or fixed pulled diversification rates (PDR) at each age-grid point. Between grid points, the fitted PDR should be interpreted as a piecewise polynomial function (natural spline) of degree `splines_degree`. |

`fitted_rholambda0` |
Numeric, specifying the fitted or fixed product *ρλ_o*. |

`guess_PDR` |
Numeric vector of size Ngrid, specifying the initial guess for the PDR at each age-grid point. |

`guess_rholambda0` |
Numeric, specifying the initial guess for *ρλ_o*. |

`age_grid` |
The age-grid on which the PDR is defined. This will be the same as the provided |

`NFP` |
Integer, number of fitted (i.e., non-fixed) parameters. If none of the PDRs or |

`AIC` |
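The Akaike Information Criterion for the fitted model, defined as *2k - 2log(L)*, where *k* is the number of fitted parameters (`NFP`) and *L* is the maximized likelihood. |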

`BIC` |
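The Bayesian Information Criterion for the fitted model, defined as *log(n)k - 2log(L)*, where *k* is the number of fitted parameters (`NFP`), *n* is the number of data points and *L* is the maximized likelihood. |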

`converged` |
Logical, specifying whether the maximum likelihood was reached after convergence of the optimization algorithm. Note that in some cases the maximum likelihood may have been achieved by an optimization path that did not yet converge (in which case it's advisable to increase |

`Niterations` |
Integer, specifying the number of iterations performed during the optimization path that yielded the maximum likelihood. |

`Nevaluations` |
Integer, specifying the number of likelihood evaluations performed during the optimization path that yielded the maximum likelihood. |

`bootstrap_estimates` |
If |

`standard_errors` |
If |

`medians` |
If |

`CI50lower` |
If |

`CI50upper` |
Similar to |

`CI95lower` |
Similar to |

`CI95upper` |
Similar to |
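For orientation, the following base-R sketch shows how the information criteria and bootstrap summaries listed above relate to more basic quantities. It assumes the standard definitions AIC = 2k - 2log(L) and BIC = log(n)k - 2log(L), with k = `NFP`. It also assumes that the bootstrap summaries are the standard deviation, the median and the 25/75% and 2.5/97.5% quantiles across bootstrap replicates; castor's exact conventions (for example the quantile type) may differ. The log-likelihood and the bootstrap matrix are synthetic placeholders, not real fitting results, and `summarize_bootstraps` is a hypothetical helper.

```r
# Information criteria from a maximized log-likelihood (standard definitions)
aic <- function(loglik, k)    2 * k - 2 * loglik
bic <- function(loglik, k, n) log(n) * k - 2 * loglik

# Hypothetical helper: summarize a matrix of bootstrap estimates
# (rows = bootstrap replicates, columns = parameters such as the PDR at each grid point)
summarize_bootstraps <- function(estimates) {
  q <- function(p) apply(estimates, 2, quantile, probs = p, names = FALSE)
  list(standard_errors = apply(estimates, 2, sd),
       medians   = apply(estimates, 2, median),
       CI50lower = q(0.25),  CI50upper = q(0.75),
       CI95lower = q(0.025), CI95upper = q(0.975))
}

# Synthetic placeholders, not real results
loglik <- -1234.5
NFP    <- 11         # e.g. 10 grid-point PDRs + rholambda0, none fixed
aic(loglik, NFP)     # 2491

set.seed(1)
boot <- matrix(rnorm(100 * 3, mean = 0.1, sd = 0.02), nrow = 100, ncol = 3)
s <- summarize_bootstraps(boot)
```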

Stilianos Louca

S. Louca et al. (2018). Bacterial diversification through geological time. Nature Ecology & Evolution. 2:1458-1467.

```r
## Not run: 
# Generate a random tree with exponentially varying lambda & mu
Ntips     = 10000
rho       = 0.5 # sampling fraction
time_grid = seq(from=0, to=100, by=0.01)
lambdas   = 2*exp(0.1*time_grid)
mus       = 1.5*exp(0.09*time_grid)
sim = generate_random_tree( parameters           = list(rarefaction=rho),
                            max_tips             = Ntips/rho,
                            coalescent           = TRUE,
                            added_rates_times    = time_grid,
                            added_birth_rates_pc = lambdas,
                            added_death_rates_pc = mus)
tree = sim$tree
root_age = castor::get_tree_span(tree)$max_distance
cat(sprintf("Tree has %d tips, spans %g Myr\n",length(tree$tip.label),root_age))

# calculate true PDR
# (time_grid runs forward in time, so dlambda/dtau = -dlambda/dt)
lambda_slopes = diff(lambdas)/diff(time_grid);
lambda_slopes = c(lambda_slopes[1],lambda_slopes)
PDRs = lambdas - mus - (lambda_slopes/lambdas)

# Fit PDR on grid
Ngrid = 10
age_grid = seq(from=0,to=root_age,length.out=Ngrid)
fit = fit_hbd_pdr_on_grid(tree,
                          age_grid  = age_grid,
                          min_PDR   = -100,
                          max_PDR   = +100,
                          condition = "crown",
                          Ntrials   = 10, # perform 10 fitting trials
                          Nthreads  = 2,  # use two CPUs
                          max_model_runtime = 1) # limit model evaluation to 1 second
if(!fit$success){
  cat(sprintf("ERROR: Fitting failed: %s\n",fit$error))
}else{
  cat(sprintf("Fitting succeeded:\nLoglikelihood=%g\n",fit$loglikelihood))
  # plot fitted & true PDR
  plot( x    = fit$age_grid,
        y    = fit$fitted_PDR,
        main = 'Fitted & true PDR',
        xlab = 'age',
        ylab = 'PDR',
        type = 'b',
        col  = 'red',
        xlim = c(root_age,0))
  # convert simulation times to ages before overlaying the true PDR
  lines(x = sim$final_time-time_grid, y = PDRs, type = 'l', col = 'blue');
  # get fitted PDR as a function of age
  PDR_fun = approxfun(x=fit$age_grid, y=fit$fitted_PDR)
}
## End(Not run)
```

[Package *castor* version 1.6.7]