reconstruct_past_diversification {castor} R Documentation

## Reconstruct past diversification dynamics from a diversity time series.

### Description

Given a time series of past diversities (coalescent or not), this function estimates instantaneous birth (speciation) and death (extinction) rates that would lead to the observed diversity time series. The function is based on a deterministic model (or the continuum limit of a stochastic cladogenic model), in which instantaneous birth and death rates lead to a predictable growth of a tree (one new species per birth event). The reconstruction is non-parametric, i.e. does not rely on fitting a parameterized model. The reconstruction is only accurate in the deterministic limit, i.e. for high diversities where the stochastic nature of the cladogenic process diminishes. Of particular importance is the case where the time series is coalescent, i.e. represents the diversity (lineages-through-time) that would be represented in a coalescent tree with extinctions.

Note: This function is included for legacy reasons mainly. In most cases users should instead use the functions fit_hbd_model_on_grid and fit_hbd_model_parametric to fit birth-death models, or the functions fit_hbd_pdr_on_grid, fit_hbd_pdr_parametric and fit_hbd_psr_on_grid to fit BD model congruence classes (aka. “pulled variables”) to a tree.

### Usage

reconstruct_past_diversification( times,
diversities,
birth_rates_pc            = NULL,
rarefaction               = NULL,
discovery_fractions       = NULL,
discovery_fraction_slopes = NULL,
max_age                   = NULL,
coalescent                = FALSE,
smoothing_span            = 0,
smoothing_order           = 1)


### Arguments

 times Numeric vector, listing the times at which diversities are given. Values must be in ascending order. diversities Numeric vector of the same size as times, listing diversities (coalescent or not) at each time point. birth_rates_pc Numeric vector of the same size as times, listing known or assumed per-capita birth rates (speciation rates). Can also be of size 1, in which case the same per-capita birth rate is assumed throughout. Alternatively if coalescent==TRUE, then this vector can also be empty, in which case a constant per-capita birth rate is assumed and estimated from the slope of the coalescent diversities at the last time point. The last alternative is not available when coalescent==FALSE. rarefaction Numeric between 0 and 1. Optional rarefaction fraction assumed for the diversities at the very end. Set to 1 to assume no rarefaction was performed. discovery_fractions Numeric array of size Ntimes, listing the fractions of extant lineages represented in the tree over time. Hence, discovery_fraction[t] is the probability that a lineage at time times[t] with extant representatives will be represented in the tree. Can be used as an alternative to rarefaction, for example if discovery of extant species is non-random or phylogenetically biased. Experimental, so leave this NULL if you don't know what it means. discovery_fraction_slopes Numeric array of size Ntimes, listing the 1st derivative of discovery_fractions (w.r.t. time) over time. If NULL, this will be estimated from discovery_fractions via basic finite differences if needed. Experimental, so leave this NULL if you don't know what it means. max_age Numeric. Optional maximum distance from the end time to be considered. If NULL or <=0 or Inf, all provided time points are considered. coalescent Logical, indicating whether the provided diversities are from a coalescent tree (only including clades with extant representatives) or total diversities (extant species at each time point). smoothing_span Non-negative integer. Optional sliding window size (number of time points) for smoothening the diversities time series via Savitzky-Golay-filter. If <=2, no smoothing is done. Smoothening the time series can reduce the effects of noise on the reconstructed diversity dynamics. smoothing_order Integer between 1 and 4. Polynomial order of the Savitzky-Golay smoothing filter to be applied. Only relevant if smoothing_span>2. A value of 1 or 2 is typically recommended.

### Details

This function can be used to fit a birth-death model to a coalescent diversity time series N_c(τ) at various ages τ, also known as “lineages-through-time” curve. The reconstruction of the total diversity N(τ) is based on the following formulas:

E(τ)=1+\frac{ν(τ)}{β(τ)},\\

N(τ)=\frac{N_c}{1-E(τ)},

ν(τ)=\frac{1}{N_c(τ)}\frac{dN_c(τ)}{dτ}

where E(τ) is the probability that a clade of size 1 at age τ went extinct by the end of the time series and β is the per-capita birth rate. If the per-capita birth rate is not explicitly provided for each time point (see argument birth_rate_pc), the function assumes that the per-capita birth rate (speciation rate) is constant at all times. If birth_rates_pc==NULL and coalescent==TRUE, the constant speciation rate is estimated as

β = -\frac{ν(0)}{ρ},

where ρ is the fraction of species kept after rarefaction (see argument rarefaction).

Assuming a constant speciation rate may or may not result in accurate estimates of past total diversities and other quantities. If a time-varying speciation rate is suspected but not known, additional information on past diversification dynamics may be obtained using modified (“pulled”) quantities that partly resemble the classical extinction rate, diversification rate and total diversity. Such quantities are the “pulled diversification rate”:

η(τ) = δ(τ) - β(τ) + \frac{1}{β(τ)}\frac{dβ}{dτ},

the “pulled extinction rate”:

δ_p(τ) = δ(τ) + (β_o-β(τ)) - \frac{1}{β(τ)}\frac{dβ}{dτ},

and the “pulled total diversity”:

N_p(τ) = N(τ)\cdot\frac{β_o}{β(τ)},

where β_o is the provided or estimated (if not provided) speciation rate at the last time point. The advantage of these quantities is that they can be estimated from the coalescent diversities (lineages-through-time) without any assumptions on how β and δ varied over time. The disadvantage is that they differ from their “non-pulled” quantities (β-δ, δ and N), in cases where β varied over time.

### Value

A named list with the following elements:

 success Logical, specifying whether the reconstruction was successful. If FALSE, the remaining elements may not be defined. Ntimes Integer. Number of time points for which reconstruction is returned. total_diversities Numeric vector of the same size as times, listing the total diversity at each time point (number of extant lineages at each time point). If coalescent==FALSE, then these are the same as the diversities passed to the function. coalescent_diversities Numeric vector of the same size as times, listing the coalescent diversities at each time point (number of species with at least one extant descendant at the last time point). If coalescent==TRUE, then these are the same as the diversities passed to the function. birth_rates Numeric vector of the same size as times, listing the estimated birth rates (speciation events per time unit). death_rates Numeric vector of the same size as times, listing the estimated death rates (extinction events per time unit). Psurvival Numeric vector of the same size as times, listing the estimated fraction of lineages at each time point that eventually survive. Psurvival[i] is the probability that a clade of size 1 at time times[i] will be extant by the end of the time series. May be NULL in some cases. Pdiscovery Numeric vector of the same size as times, listing the estimated fraction of lineages at each time point that are eventually discovered, provided that they survive. Pdiscovery[i] is the probability that a clade of size 1 at time times[i] that is extant by the end of the time series, will be discovered. May be NULL in some cases. Prepresentation Numeric vector of the same size as times, listing the estimated fraction of lineages at each time point that eventually survive and are discovered. Prepresentation[i] is the probability that a clade of size 1 at time times[i] will be extant by the end of the time series and visible in the coalescent tree after rarefaction. Note that Prepresentation = Psurvival * Pdiscovery. May be NULL in some cases. total_births Numeric, giving the estimated total number of birth events that occurred between times T-max_age and T, where T is the last time point of the time series. total_deaths Numeric, giving the estimated total number of death events that occurred between times T-max_age and T, where T is the last time point of the time series. last_birth_rate_pc The provided or estimated (if not provided) speciation rate at the last time point. This corresponds to the birth rate divided by the estimated true diversity (prior to rarefaction) at the last time point. last_death_rate_pc The estimated extinction rate at the last time point. This corresponds to the death rate divided by the estimated true diversity (prior to rarefaction) at the last time point. pulled_diversification_rates Numeric vector of the same size as times, listing the estimated pulled diversification rates. pulled_extinction_rates Numeric vector of the same size as times, listing the estimated pulled extinction rates. pulled_total_diversities Numeric vector of the same size as times, listing the estimated pulled total diversities.

Stilianos Louca

### References

Louca et al (2018). Bacterial diversification through geological time. Nature Ecology & Evolution. 2:1458-1467.

generate_random_tree, fit_tree_model, count_lineages_through_time, fit_hbd_model_parametric, fit_hbd_model_on_grid

### Examples

#####################################################
# EXAMPLE 1

# Generate a coalescent tree
params = list(birth_rate_intercept  = 0,
birth_rate_factor     = 1,
birth_rate_exponent   = 1,
death_rate_intercept  = 0,
death_rate_factor     = 0.05,
death_rate_exponent   = 1.3,
rarefaction           = 1)
simulation = generate_random_tree(params,max_time_eq=1,coalescent=TRUE)
tree = simulation$tree time_span = simulation$final_time - simulation$root_time cat(sprintf("Generated tree has %d tips, spans %g time units\n",length(tree$tip.label),time_span))

# Calculate diversity time series from the tree
counter = count_lineages_through_time(tree, times=seq(0,0.99*time_span,length.out=100))

# print coalescent diversities
print(counter$lineages) # reconstruct diversification dynamics based on diversity time series results = reconstruct_past_diversification( counter$times,
counter$lineages, coalescent = TRUE, smoothing_span = 3, smoothing_order = 1) # print reconstructed total diversities print(results$total_diversities)

# plot coalescent and reconstructed true diversities
matplot(x     = counter$times, y = matrix(c(counter$lineages,results$total_diversities), ncol=2, byrow=FALSE), type = "b", xlab = "time", ylab = "# clades", lty = c(1,2), pch = c(1,0), col = c("red","blue")) legend( "topleft", legend = c("coalescent (simulated)","true (reconstructed)"), col = c("red","blue"), lty = c(1,2), pch = c(1,0)); ##################################################### # EXAMPLE 2 # Generate a non-coalescent tree params = list(birth_rate_intercept = 0, birth_rate_factor = 1, birth_rate_exponent = 1, death_rate_intercept = 0, death_rate_factor = 0.05, death_rate_exponent = 1.3, rarefaction = 1) simulation = generate_random_tree(params,max_time_eq=1,coalescent=FALSE) tree = simulation$tree
time_span = simulation$final_time - simulation$root_time
cat(sprintf("Generated tree has %d tips, spans %g time units\n",length(tree$tip.label),time_span)) # Calculate diversity time series from the tree counter = count_lineages_through_time(tree, times=seq(0,0.99*time_span,length.out=100)) # print true diversities print(counter$lineages)

# reconstruct diversification dynamics based on diversity time series
results = reconstruct_past_diversification( counter$times, counter$lineages,
birth_rates_pc  = params$birth_rate_factor, coalescent = FALSE, smoothing_span = 3, smoothing_order = 1) # print coalescent diversities print(results$coalescent_diversities)

# plot coalescent and reconstructed true diversities
matplot(x     = counter$times, y = matrix(c(results$coalescent_diversities,counter\$lineages), ncol=2, byrow=FALSE),
type  = "b",
xlab  = "time",