generate_tree_hbd_reverse {castor}R Documentation

Generate a tree from a birth-death model in reverse time.

Description

Generate an ultrametric timetree (comprising only extant lineages) in reverse time (from present back to the root) based on the homogenous birth-death (HBD; Morlon et al., 2011) model, conditional on a specific number of extant species sampled and (optionally) conditional on the crown age or stem age.

The probability distribution of such trees only depends on the congruence class of birth-death models (e.g., as specified by the pulled speciation rate) but not on the precise model within a congruence class (Louca and Pennell, 2019). Hence, in addition to allowing specification of speciation and extinction rates, this function can alternatively simulate trees simply based on some pulled speciation rate (PSR), or based on some pulled diversification rate (PDR) and the product ρλ_o (present-day sampling fraction times present-day speciation rate).

This function can be used to generate bootstrap samples after fitting an HBD model or HBD congruence class to a real timetree.

Usage

generate_tree_hbd_reverse( Ntips,
                           stem_age         = NULL,
                           crown_age        = NULL,
                           age_grid         = NULL,
                           lambda           = NULL,
                           mu               = NULL,
                           rho              = NULL,
                           PSR              = NULL,
                           PDR              = NULL,
                           rholambda0       = NULL,
                           force_max_age    = Inf,
                           splines_degree   = 1,
                           relative_dt      = 1e-3,
                           Ntrees           = 1,
                           tip_basename     = "",
                           node_basename    = NULL,
                           edge_basename    = NULL)

Arguments

Ntips

Number of tips in the tree, i.e. number of extant species sampled at present day.

stem_age

Numeric, optional stem age on which to condition the tree. If NULL or <=0, the tree is not conditioned on the stem age.

crown_age

Numeric, optional crown age (aka. root age or MRCA age) on which to condition the tree. If NULL or <=0, the tree is not conditioned on the crown age. If both stem_age and crown_age are specified, only the crown age is used; in that case for consistency crown_age must not be greater than stem_age.

age_grid

Numeric vector, listing discrete ages (time before present) on which the PSR is specified. Listed ages must be strictly increasing, and should cover at least the present day (age 0) as well as a sufficient duration into the past. If conditioning on the stem or crown age, that age must also be covered by age_grid. When not conditioning on crown nor stem age, and the generated tree ends up extending beyond the last time point in age_grid, the PSR will be extrapolated as a constant (with value equal to the last value in PSR) as necessary. age_grid also be NULL or a vector of size 1, in which case the PSR is assumed to be time-independent.

lambda

Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing speciation rates (λ, in units 1/time) at the ages listed in age_grid. Speciation rates must be non-negative, and are assumed to vary as a spline between grid points (see argument splines_degree). Can also be NULL, in which case either PSR, or PDR and rholambda0, must be provided.

mu

Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing extinction rates (μ, in units 1/time) at the ages listed in age_grid. Extinction rates must be non-negative, and are assumed to vary as a spline between grid points (see argument splines_degree). Can also be NULL, in which case either PSR, or PDR and rholambda0, must be provided.

rho

Numeric, sampling fraction at present day (fraction of extant species included in the tree). Can also be NULL, in which case either PSR, or PDR and rholambda0, must be provided.

PSR

Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing pulled speciation rates (λ_p, in units 1/time) at the ages listed in age_grid. The PSR must be non-negative (and strictly positive almost everywhere), and is assumed to vary as a spline between grid points (see argument splines_degree). Can also be NULL, in which case either lambda and mu and rho, or PDR and rholambda0, must be provided.

PDR

Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing pulled diversification rates (r_p, in units 1/time) at the ages listed in age_grid. The PDR is assumed to vary polynomially between grid points (see argument splines_degree). Can also be NULL, in which case either lambda and mu and rho, or PSR, must be provided.

rholambda0

Strictly positive numeric, specifying the product ρλ_o (present-day species sampling fraction times present-day speciation rate). Can also be NULL, in which case PSR must be provided.

force_max_age

Numeric, specifying an optional maximum allowed age for the tree's root. If the tree ends up expanding past that age, all remaining lineages are forced to coalesce at that age. This is not statistically consistent with the provided HBD model (in fact it corresponds to a modified HBD model with a spike in the PSR at that time). This argument merely provides a way to prevent excessively large trees if the PSR is close to zero at older ages and when not conditioning on the stem nor crown age, while still keeping the original statistical properties at younger ages. To disable this feature set force_max_age to Inf.

splines_degree

Integer, either 0,1,2 or 3, specifying the polynomial degree of the provided rates PSR, PDR, lambda, mu and rho between grid points in age_grid. For example, if splines_degree==1, then the provided rates are interpreted as piecewise-linear curves; if splines_degree==2 the rates are interpreted as quadratic splines; if splines_degree==3 the rates are interpreted as cubic splines. The splines_degree influences the analytical properties of the curve, e.g. splines_degree==1 guarantees a continuous curve, splines_degree==2 guarantees a continuous curve and continuous derivative, and so on. If your age_grid is fine enough, then splines_degree=1 is usually sufficient.

relative_dt

Strictly positive numeric (unitless), specifying the maximum relative time step allowed for integration over time. Smaller values increase integration accuracy but increase computation time. Typical values are 0.0001-0.001. The default is usually sufficient.

Ntrees

Integer, number of trees to generate. The computation time per tree is lower if you generate multiple trees at once.

tip_basename

Character. Prefix to be used for tip labels (e.g. "tip."). If empty (""), then tip labels will be integers "1", "2" and so on.

node_basename

Character. Prefix to be used for node labels (e.g. "node."). If NULL, no node labels will be included in the tree.

edge_basename

Character. Prefix to be used for edge labels (e.g. "edge."). Edge labels (if included) are stored in the character vector edge.label. If NULL, no edge labels will be included in the tree.

Details

This function requires that the BD model, or the BD congruence class (Louca and Pennell, 2019), is specified using one of the following sets of arguments:

Concurrently using/combining more than one the above parameterization methods is not supported.

Either the PSR, or the PDR and rholambda0, provide sufficient information to fully describe the probability distribution of the tree (Louca and Pennell, 2019). For example, the probability distribution of generated trees only depends on the PSR, and not on the specific speciation rate λ or extinction rate μ (various combinations of λ and μ can yield the same PSR; Louca and Pennell, 2019). To calculate the PSR and PDR for any arbitrary λ, μ and ρ you can use the function simulate_deterministic_hbd.

When not conditioning on the crown age, the age of the root of the generated tree will be stochastic (i.e., non-fixed). This function then assumes a uniform prior distribution (in a sufficiently large time interval) for the origin of the forward HBD process that would have generated the tree, based on a generalization of the EBDP algorithm provided by (Stadler, 2011). When conditioning on stem or crown age, this function is based on the algorithm proposed by Hoehna (2013, Eq. 8).

Note that HBD trees can also be generated using the function generate_random_tree. That function, however, generates trees in forward time, and hence when conditioning on the final number of tips the total duration of the simulation is unpredictable; consequently, speciation and extinction rates cannot be specified as functions of "age" (time before present). The function presented here provides a means to generate trees with a fixed number of tips, while specifying λ, μ, λ_p or r_p as functions of age (time before present).

Value

A named list with the following elements:

success

Logical, indicating whether the simulation was successful. If FALSE, then the returned list includes an additional 'error' element (character) providing a description of the error; all other return variables may be undefined.

trees

A list of length Ntrees, listing the generated trees. Each tree will be an ultrametric timetree of class "phylo".

Author(s)

Stilianos Louca

References

H. Morlon, T. L. Parsons, J. B. Plotkin (2011). Reconciling molecular phylogenies with the fossil record. Proceedings of the National Academy of Sciences. 108:16327-16332.

T. Stadler (2011). Simulating trees with a fixed number of extant species. Systematic Biology. 60:676-684.

S. Hoehna (2013). Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. Bioinformatics. 29:1367-1374.

S. Louca and M. W. Pennell (in review as of 2019). Phylogenies of extant species are consistent with an infinite array of diversification histories.

See Also

loglikelihood_hbd, simulate_deterministic_hbd, generate_random_tree

Examples

# EXAMPLE 1: Generate trees based on some speciation and extinction rate
# In this example we assume an exponentially decreasing speciation rate
#   and a temporary mass extinction event

# define parameters
age_grid = seq(0,100,length.out=1000)
lambda   = 0.1 + exp(-0.5*age_grid)
mu       = 0.05 + exp(-(age_grid-5)^2)
rho      = 0.5 # species sampling fraction at present-day

# generate a tree with 100 tips and no specific crown or stem age
sim = generate_tree_hbd_reverse(Ntips       = 100, 
                                age_grid    = age_grid, 
                                lambda      = lambda,
                                mu          = mu,
                                rho         = rho)
if(!sim$success){
    cat(sprintf("Tree generation failed: %s\n",sim$error))
}else{
    cat(sprintf("Tree generation succeeded\n"))
    tree = sim$trees[[1]]
}


########################
# EXAMPLE 2: Generate trees based on the pulled speciation rate
# Here we condition the tree on some fixed crown (MRCA) age

# specify the PSR on a sufficiently fine and wide age grid
age_grid  = seq(0,1000,length.out=10000)
PSR       = 0.1+exp(-0.1*age_grid) # exponentially decreasing PSR

# generate a tree with 100 tips and MRCA age 10
sim = generate_tree_hbd_reverse(Ntips       = 100, 
                                age_grid    = age_grid, 
                                PSR         = PSR, 
                                crown_age   = 10)
if(!sim$success){
    cat(sprintf("Tree generation failed: %s\n",sim$error))
}else{
    cat(sprintf("Tree generation succeeded\n"))
    tree = sim$trees[[1]]
}

[Package castor version 1.6.8 Index]