simulate_deterministic_hbds {castor} R Documentation

## Simulate a deterministic homogenous birth-death-sampling model.

### Description

Given a homogenous birth-death-sampling (HBDS) model, i.e., with speciation rate λ, extinction rate μ, continuous (Poissonian) sampling rate ψ and retention probability κ, calculate various deterministic features of the model backwards in time, such as the total population size and the LTT over time. Continuously sampled lineages are kept in the pool of extant lineages at probability κ. The variables λ, μ, ψ and κ may depend on time. In addition, the model can include concentrated sampling attempts at a finite set of discrete time points t_1,..,t_m. “Homogenous” refers to the assumption that, at any given moment in time, all lineages exhibit the same speciation/extinction/sampling rates and retention proabbility. Every HBDS model is thus defined based on the values that λ, μ, ψ and κ take over time, as well as the sampling probabilities ψ_1,..,ψ_m and retention probabilities κ_1,..,κ_m during the concentrated sampling attempts. Special cases of this model are sometimes known as “birth-death-skyline plots” in the literature (Stadler 2013). In epidemiology, these models are often used to describe the phylogenies of viral strains sampled over the course of the epidemic. A “concentrated sampling attempt” is a brief but intensified sampling period that lasted much less than the typical timescales of speciation/extinction. “Deterministic” refers to the fact that all calculated properties are completely determined by the model's parameters (i.e. non-random), as if an infinitely large tree was generated (aka. “continuum limit”). The time-profiles of λ, μ, ψ and κ are specified as piecewise polynomial curves (splines), defined on a discrete grid of ages.

### Usage

simulate_deterministic_hbds(age_grid                        = NULL,
lambda                          = NULL,
mu                              = NULL,
psi                             = NULL,
kappa                           = NULL,
splines_degree                  = 1,
CSA_ages                        = NULL,
CSA_probs                       = NULL,
CSA_kappas                      = NULL,
requested_ages                  = NULL,
age0                            = 0,
N0                              = NULL,
LTT0                            = NULL,
ODE_relative_dt                 = 0.001,
ODE_relative_dy                 = 1e-4)


### Arguments

 age_grid Numeric vector, listing discrete ages (time before present) on which either λ and μ, or the PDR and μ, are specified. Listed ages must be strictly increasing, and must cover at least the full considered age interval (from age0 to oldest_age). Can also be NULL or a vector of size 1, in which case the speciation rate, extinction rate and PDR are assumed to be time-independent. lambda Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing speciation rates (λ, in units 1/time) at the ages listed in age_grid. Speciation rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree). mu Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing extinction rates (μ, in units 1/time) at the ages listed in age_grid. Extinction rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree). psi Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing the continuous (Poissonian) sampling rate at the ages listed in age_grid. Sampling rates should be non-negative, and are assumed to vary polynomially between grid points (see argument splines_degree). kappa Numeric vector, of the same size as age_grid (or size 1 if age_grid==NULL), listing the retention probabilities following Poissonian sampling events, at the ages listed in age_grid. The listed values must be true probabilities, i.e. between 0 and 1, and are assumed to vary polynomially between grid points (see argument splines_degree). The retention probability is the probability that a continuously sampled lineage remains in the pool of extant lineages. Note that many epidemiological models assume kappa to be zero. splines_degree Integer, either 0,1,2 or 3, specifying the polynomial degree of the provided lambda, mu, psi and kappa between grid points in age_grid. For example, if splines_degree==1, then the provided lambda, mu, psi and kappa are interpreted as piecewise-linear curves; if splines_degree==2 they are interpreted as quadratic splines; if splines_degree==3 they are interpreted as cubic splines. The splines_degree influences the analytical properties of the curve, e.g. splines_degree==1 guarantees a continuous curve, splines_degree==2 guarantees a continuous curve and continuous derivative, and so on. CSA_ages Optional numeric vector, listing the ages of concentrated sampling attempts, in ascending order. Concentrated sampling is performed in addition to any continuous (Poissonian) sampling specified by psi. CSA_probs Optional numeric vector of the same size as CSA_ages, listing sampling probabilities at each concentrated sampling attempt. Note that in contrast to the sampling rates psi, the CSA_probs are interpreted as probabilities and must thus be between 0 and 1. CSA_probs must be provided if and only if CSA_ages is provided. CSA_kappas Optional numeric vector of the same size as CSA_ages, listing retention probabilities at each concentrated sampling event, i.e. the probability at which a sampled lineage is kept in the pool of extant lineages. Note that the CSA_kappas are probabilities and must thus be between 0 and 1. CSA_kappas must be provided if and only if CSA_ages is provided. requested_ages Optional numeric vector, listing ages (in ascending order) at which the various model variables are requested. If NULL, it will be set to age_grid. age0 Non-negative numeric, specifying the age at which LTT0 and pop_size0 are specified. Typically this will be 0, i.e., corresponding to the present. N0 Positive numeric, specifying the number of extant species (sampled or not) at age0. Used to determine the "scaling factor" for the returned population sizes and LTT. Either pop_size0 or LTT0 must be provided, but not both. LTT0 Positive numeric, specifying the number of lineages present in the tree at age0. Used to determine the "scaling factor" for the returned population sizes and LTT. Either pop_size0 or LTT0 must be provided, but not both. ODE_relative_dt Positive unitless number, specifying the default relative time step for internally used ordinary differential equation solvers. Typical values are 0.01-0.001. ODE_relative_dy Positive unitless number, specifying the relative difference between subsequent simulated and interpolated values, in internally used ODE solvers. Typical values are 1e-2 to 1e-5. A smaller ODE_relative_dy increases interpolation accuracy, but also increases memory requirements and adds runtime (scaling with the tree's age span, not with Ntips).

### Details

The simulated LTT refers to a hypothetical tree sampled at age 0, i.e. LTT(t) will be the number of lineages extant at age t that survived and were sampled until by the present day. Note that if a concentrated sampling attempt occurs at age τ, then LTT(τ) is the number of lineages in the tree right before the occurrence of the sampling attempt, i.e., in the limit where τ is approached from above.

Note that in this function age always refers to time before present, i.e., present day age is 0, and age increases towards the root.

### Value

A named list with the following elements:

 success Logical, indicating whether the calculation was successful. If FALSE, then the returned list includes an additional 'error' element (character) providing a description of the error; all other return variables may be undefined. ages Numerical vector of size NG, listing discrete ages (time before present) on which all returned time-curves are specified. Will be equal to requested_ages, if the latter was provided. total_diversity Numerical vector of size NG, listing the predicted (deterministic) total diversity (number of extant species, denoted N) at the ages given in ages[]. LTT Numeric vector of size NG, listing the number of lineages represented in the tree at any given age, also known as “lineages-through-time” (LTT) curve. Note that LTT at age0 will be equal to LTT0 (if the latter was provided). nLTT Numeric vector of size NG, listing values of the normalized LTT at ages ages[]. The nLTT is calculated by dividing the LTT by its area-under-the-curve (AUC). The AUC is calculated by integrating the LTT over the time spanned by ages and using the trapezoid rule. Hence, the exact value of the AUC and of the nLTT depends on the span and resolution of ages[]. If you want the AUC to accurately refer to the entire area under the curve (i.e. across the full real axis), you should specify a sufficiently wide and sufficiently fine age grid (e.g., via requested_ages). Pmissing Numeric vector of size NG, listing the probability that a lineage, extant at a given age, will not be represented in the tree. lambda Numeric vector of size NG, listing the speciation rates at the ages given in ages[]. mu Numeric vector of size NG, listing the extinctions rates at the ages given in ages[]. psi Numeric vector of size NG, listing the Poissonian sampling rates at the ages given in ages[]. kappa Numeric vector of size NG, listing the retention probabilities (for continuously sampled lineages) at the ages given in ages[]. PDR Numeric vector of size NG, listing the pulled diversification rate (PDR, in units 1/time) at the ages given in ages[]. IPRP Numeric vector of size NG, listing the age-integrated pulled diversification rate at the ages given in ages[], i.e. IPDR(t)=\int_0^t PDR(s)ds. PSR Numeric vector of size NG, listing the “pulled speciation rate” (PSR, in units 1/time) at the ages given in ages[]. The PSR is defined as PSR=λ\cdot(1-Pmissing). PRP Numeric vector of size NG, listing the “pulled retention probability” (PRP) at the ages given in ages[]. The PRP is defined as PRP=κ\cdot(1-Pmissing). diversification_rate Numeric vector of size NG, listing the net diversification rate (in units 1/time) at the ages given in ages[]. branching_density Numeric vector of size NG, listing the deterministic branching density (PSR * nLTT, in units nodes/time) at the ages given in ages[]. sampling_density Numeric vector of size NG, listing the deterministic sampling density (ψ\cdot N/AUC, in units tips/time, where AUC is the area-under-the-curve calculated for the LTT) at the ages given in ages[]. lambda_psi Numeric vector of size NG, listing the product of the speciation rate and Poissonian sampling rate (in units 1/time^2) at the ages given in ages[]. kappa_psi Numeric vector of size NG, listing the product of the continuous sampling rate and the continuous retention probability (in units 1/time) at the ages given in ages[]. Reff Numeric vector of size NG, listing the effective reproduction ratio (R_e=λ/(μ+ψ(1-κ))) at the ages given in ages[]. removal_rate Numeric vector of size NG, listing the total removal rate (μ+ψ), also known as “become uninfectious rate”, at the ages given in ages[]. sampling_proportion Numeric vector of size NG, listing the instantaneous sampling proportion (ψ/(μ+ψ)) at the ages given in ages[]. CSA_pulled_probs Numeric vector of size NG, listing the pulled concentrated sampling probabilities, \tilde{ρ}_k=ρ_k/(1-E). CSA_psis Numeric vector of size NG, listing the continuous (Poissonian) sampling rates during the concentrated sampling attempts, ψ(t_1),..,ψ(t_m). CSA_PSRs Numeric vector of size NG, listing the pulled speciation rates during the concentrated sampling attempts.

Stilianos Louca

### References

T. Stadler, D. Kuehnert, S. Bonhoeffer, A. J. Drummond (2013). Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). PNAS. 110:228-233.

generate_tree_hbds, fit_hbds_model_parametric, simulate_deterministic_hbd

### Examples

# define an HBDS model with exponentially decreasing speciation/extinction rates
# and constant Poissonian sampling rate psi
oldest_age= 10
age_grid  = seq(from=0,to=oldest_age,by=0.1) # choose a sufficiently fine age grid
lambda    = 1*exp(0.01*age_grid) # define lambda on the age grid
mu        = 0.2*lambda # assume similarly shaped but smaller mu

# simulate deterministic HBD model
# scale LTT such that it is 100 at age 1
simulation = simulate_deterministic_hbds(age_grid   = age_grid,
lambda     = lambda,
mu         = mu,
psi        = 0.1,
age0       = 1,
LTT0       = 100)
# plot deterministic LTT
plot( x = simulation$ages, y = simulation$LTT, type='l',
main='dLTT', xlab='age', ylab='lineages', xlim=c(oldest_age,0))


[Package castor version 1.6.8 Index]