fit_hbd_pdr_parametric {castor} | R Documentation |
Fit parameterized pulled diversification rates of birth-death models.
Description
Given an ultrametric timetree, estimate the pulled diversification rate (PDR) of homogenous birth-death (HBD) models that best explains the tree via maximum likelihood, assuming that the PDR is given as a parameterized function of time before present. Every HBD model is defined by some speciation and extinction rates (\lambda
and \mu
) over time, as well as the sampling fraction \rho
(fraction of extant species sampled). “Homogenous” refers to the assumption that, at any given moment in time, all lineages exhibit the same speciation/extinction rates. For any given HBD model there exists an infinite number of alternative HBD models that generate extant trees with the same probability distributions and yield the same likelihood for any given extant timetree; these “congruent” models cannot be distinguished from one another solely based on an extant timetree.
Each congruence class is uniquely described by its PDR, defined as PDR=\lambda-\mu+\lambda^{-1}d\lambda/d\tau
(where \tau
is time before present) as well as the product \rho\lambda_o
(where \lambda_o
is the present-day speciation rate). That is, two HBD models are congruent if and only if they have the same PDR and the same product \rho\lambda_o
. This function is designed to estimate the generating congruence class for the tree, by fitting a finite number of parameters defining the PDR and \rho\lambda_o
.
Usage
fit_hbd_pdr_parametric( tree,
param_values,
param_guess = NULL,
param_min = -Inf,
param_max = +Inf,
param_scale = NULL,
oldest_age = NULL,
age0 = 0,
PDR,
rholambda0,
age_grid = NULL,
condition = "auto",
relative_dt = 1e-3,
Ntrials = 1,
max_start_attempts = 1,
Nthreads = 1,
max_model_runtime = NULL,
fit_control = list())
Arguments
tree |
A rooted ultrametric timetree of class "phylo", representing the time-calibrated phylogeny of a set of extant sampled species. |
param_values |
Numeric vector, specifying fixed values for a some or all model parameters. For fitted (i.e., non-fixed) parameters, use |
param_guess |
Numeric vector of size NP, specifying a first guess for the value of each model parameter. For fixed parameters, guess values are ignored. Can be |
param_min |
Optional numeric vector of size NP, specifying lower bounds for model parameters. If of size 1, the same lower bound is applied to all parameters. Use |
param_max |
Optional numeric vector of size NP, specifying upper bounds for model parameters. If of size 1, the same upper bound is applied to all parameters. Use |
param_scale |
Optional numeric vector of size NP, specifying typical scales for model parameters. If of size 1, the same scale is assumed for all parameters. If |
oldest_age |
Strictly positive numeric, specifying the oldest time before present (“age”) to consider when calculating the likelihood. If this is equal to or greater than the root age, then |
age0 |
Non-negative numeric, specifying the youngest age (time before present) to consider for fitting, and with respect to which |
PDR |
Function specifying the pulled diversification rate at any given age (time before present) and for any given parameter values. This function must take exactly two arguments, the 1st one being a numeric vector (one or more ages) and the 2nd one being a numeric vector of size NP (parameter values), and return a numeric vector of the same size as the 1st argument. Can also be a single number (i.e., PDR is fixed). |
rholambda0 |
Function specifying the product |
age_grid |
Numeric vector, specifying ages at which the |
condition |
Character, either "crown", "stem", "auto", "stemN" or "crownN" (where N is an integer >=2), specifying on what to condition the likelihood. If "crown", the likelihood is conditioned on the survival of the two daughter lineages branching off at the root at that time. If "stem", the likelihood is conditioned on the survival of the stem lineage, with the process having started at |
relative_dt |
Strictly positive numeric (unitless), specifying the maximum relative time step allowed for integration over time, when calculating the likelihood. Smaller values increase integration accuracy but increase computation time. Typical values are 0.0001-0.001. The default is usually sufficient. |
Ntrials |
Integer, specifying the number of independent fitting trials to perform, each starting from a random choice of model parameters. Increasing |
max_start_attempts |
Integer, specifying the number of times to attempt finding a valid start point (per trial) before giving up on that trial. Randomly choosen extreme start parameters may occasionally result in Inf/undefined likelihoods, so this option allows the algorithm to keep looking for valid starting points. |
Nthreads |
Integer, specifying the number of parallel threads to use for performing multiple fitting trials simultaneously. This should generally not exceed the number of available CPUs on your machine. Parallel computing is not available on the Windows platform. |
max_model_runtime |
Optional numeric, specifying the maximum number of seconds to allow for each evaluation of the likelihood function. Use this to abort fitting trials leading to parameter regions where the likelihood takes a long time to evaluate (these are often unlikely parameter regions). |
fit_control |
Named list containing options for the |
Details
This function is designed to estimate a finite set of scalar parameters (p_1,..,p_n\in\R
) that determine the PDR and the product \rho\lambda_o
(sampling fraction times present-dat extinction rate), by maximizing the likelihood of observing a given timetree under the HBD model. For example, the investigator may assume that the PDR varies exponentially over time, i.e. can be described by PDR(t)=A\cdot e^{-B t}
(where A
and B
are unknown coefficients and t
is time before present), and that the product \rho\lambda_o
is unknown. In this case the model has 3 free parameters, p_1=A
, p_2=B
and p_3=\rho\lambda_o
, each of which may be fitted to the tree.
If age0>0
, the input tree is essentially trimmed at age0
(omitting anything younger than age0
), and the PDR and rholambda0
are fitted to this new (shorter) tree, with time shifted appropriately. The fitted rholambda0
is thus the product of the sampling fraction at age0
and the speciation rate at age0
. Note that the sampling fraction at age0
is simply the fraction of lineages extant at age0
that are represented in the timetree. Most users will typically want to leave age0=0
.
It is generally advised to provide as much information to the function fit_hbd_pdr_parametric
as possible, including reasonable lower and upper bounds (param_min
and param_max
), a reasonable parameter guess (param_guess
) and reasonable parameter scales param_scale
. If some model parameters can vary over multiple orders of magnitude, it is advised to transform them so that they vary across fewer orders of magnitude (e.g., via log-transformation). It is also important that the age_grid
is sufficiently fine to capture the variation of the PDR over time, since the likelihood is calculated under the assumption that both vary linearly between grid points.
Value
A list with the following elements:
success |
Logical, indicating whether model fitting succeeded. If |
objective_value |
The maximized fitting objective. Currently, only maximum-likelihood estimation is implemented, and hence this will always be the maximized log-likelihood. |
objective_name |
The name of the objective that was maximized during fitting. Currently, only maximum-likelihood estimation is implemented, and hence this will always be “loglikelihood”. |
param_fitted |
Numeric vector of size NP (number of model parameters), listing all fitted or fixed model parameters in their standard order (see details above). If |
param_guess |
Numeric vector of size NP, listing guessed or fixed values for all model parameters in their standard order. |
loglikelihood |
The log-likelihood of the fitted model for the given timetree. |
NFP |
Integer, number of fitted (i.e., non-fixed) model parameters. |
AIC |
The Akaike Information Criterion for the fitted model, defined as |
BIC |
The Bayesian information criterion for the fitted model, defined as |
converged |
Logical, specifying whether the maximum likelihood was reached after convergence of the optimization algorithm. Note that in some cases the maximum likelihood may have been achieved by an optimization path that did not yet converge (in which case it's advisable to increase |
Niterations |
Integer, specifying the number of iterations performed during the optimization path that yielded the maximum likelihood. |
Nevaluations |
Integer, specifying the number of likelihood evaluations performed during the optimization path that yielded the maximum likelihood. |
trial_start_objectives |
Numeric vector of size |
trial_objective_values |
Numeric vector of size |
trial_Nstart_attempts |
Integer vector of size |
trial_Niterations |
Integer vector of size |
trial_Nevaluations |
Integer vector of size |
Author(s)
Stilianos Louca
References
H. Morlon, T. L. Parsons, J. B. Plotkin (2011). Reconciling molecular phylogenies with the fossil record. Proceedings of the National Academy of Sciences. 108:16327-16332.
S. Louca et al. (2018). Bacterial diversification through geological time. Nature Ecology & Evolution. 2:1458-1467.
S. Louca and M. W. Pennell (2020). Extant timetrees are consistent with a myriad of diversification histories. Nature. 580:502-505.
See Also
Examples
## Not run:
# Generate a random tree with exponentially varying lambda & mu
Ntips = 10000
rho = 0.5 # sampling fraction
time_grid = seq(from=0, to=100, by=0.01)
lambdas = 2*exp(0.1*time_grid)
mus = 1.5*exp(0.09*time_grid)
tree = generate_random_tree( parameters = list(rarefaction=rho),
max_tips = Ntips/rho,
coalescent = TRUE,
added_rates_times = time_grid,
added_birth_rates_pc = lambdas,
added_death_rates_pc = mus)$tree
root_age = castor::get_tree_span(tree)$max_distance
cat(sprintf("Tree has %d tips, spans %g Myr\n",length(tree$tip.label),root_age))
# Define a parametric HBD congruence class, with exponentially varying PDR
# The model thus has 3 parameters
PDR_function = function(ages,params){
return(params['A']*exp(-params['B']*ages));
}
rholambda0_function = function(params){
return(params['rholambda0'])
}
# Define an age grid on which PDR_function shall be evaluated
# Should be sufficiently fine to capture the variation in the PDR
age_grid = seq(from=0,to=100,by=0.01)
# Perform fitting
cat(sprintf("Fitting class to tree..\n"))
fit = fit_hbd_pdr_parametric( tree,
param_values = c(A=NA, B=NA, rholambda0=NA),
param_guess = c(1,0,1),
param_min = c(-10,-10,0),
param_max = c(10,10,10),
param_scale = 1, # all params are in the order of 1
PDR = PDR_function,
rholambda0 = rholambda0_function,
age_grid = age_grid,
Ntrials = 10, # perform 10 fitting trials
Nthreads = 2, # use 2 CPUs
max_model_runtime = 1, # limit model evaluation to 1 second
fit_control = list(rel.tol=1e-6))
if(!fit$success){
cat(sprintf("ERROR: Fitting failed: %s\n",fit$error))
}else{
cat(sprintf("Fitting succeeded:\nLoglikelihood=%g\n",fit$loglikelihood))
print(fit)
}
## End(Not run)