R: Phylogenetic autocorrelation function of a numeric trait.

get_trait_acf {castor}

R Documentation

Phylogenetic autocorrelation function of a numeric trait.

Description

Given a rooted phylogenetic tree and a numeric (typically continuous) trait with known value (state) on each tip, calculate the phylogenetic autocorrelation function (ACF) of the trait. The ACF is a function of phylogenetic distance x, where ACF(x) is the Pearson autocorrelation of the trait between two tips, provided that the tips have phylogenetic ("patristic") distance x. The function get_trait_acf also calculates the mean absolute difference and the mean relative difference of the trait between any two random tips at phylogenetic distance x (see details below).

Usage

get_trait_acf(tree, 
              tip_states,
              Npairs            = 10000,
              Nbins             = NULL,
              min_phylodistance = 0,
              max_phylodistance = NULL,
              uniform_grid      = FALSE,
              phylodistance_grid= NULL)

Arguments

`tree`	A rooted tree of class "phylo". The root is assumed to be the unique node with no incoming edge.
`tip_states`	A numeric vector of size Ntips, specifying the value of the trait at each tip in the tree. Note that tip_states[i] (where i is an integer index) must correspond to the i-th tip in the tree.
`Npairs`	Total number of random tip pairs to draw. A greater number of tip pairs will improve the accuracy of the estimated ACF within each distance bin. Tip pairs are drawn randomly with replacement. If `Npairs<=0`, then every tip pair is included exactly once.
`Nbins`	Number of distance bins to consider within the range of phylogenetic distances encountered between tip pairs in the tree. A greater number of bins will increase the resolution of the ACF as a function of phylogenetic distance, but will decrease the number of tip pairs falling within each bin (which reduces the accuracy of the estimated ACF). If `NULL`, then `Nbins` is chosen automatically based on the size of the input tree.
`min_phylodistance`	Numeric, minimum phylogenetic distance to consider. Only relevant if `phylodistance_grid` is `NULL`.
`max_phylodistance`	Numeric, optional maximum phylogenetic distance to consider. If `NULL`, this is automatically set to the maximum phylodistance between any two tips.
`uniform_grid`	Logical, specifying whether the phylodistance grid should be uniform, i.e., with equally sized phylodistance bins. If `FALSE`, then the grid is chosen non-uniformly (i.e., each bin has different size) such that each bin roughly contains the same number of tip pairs. This helps equalize the estimation error across bins. Only relevant if `phylodistance_grid` is `NULL`.
`phylodistance_grid`	Numeric vector, optional explicitly specified phylodistance bins (left boundaries thereof) on which to evaluate the ACF. Must contain non-negative numbers in strictly ascending order. Hence, the first bin will range from `phylodistance_grid[1]` to `phylodistance_grid[2]`, while the last bin will range from `tail(phylodistance_grid,1)` to `max_phylodistance`. Can be used as an alternative to `Nbins`. If non-`NULL`, then `Nbins`, `min_phylodistance` and `uniform_grid` are irrelevant.
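
To make the bin convention for `phylodistance_grid` concrete, the following base R sketch derives the left and right boundary of each bin from a grid of left boundaries and an upper limit. The numeric values are purely illustrative, and the variable names `left_boundaries`, `right_boundaries` and `bin_centers` are hypothetical, i.e. not part of castor.

```r
# Hypothetical grid of left bin boundaries and an upper limit (illustrative values).
phylodistance_grid = c(0, 0.5, 1, 2, 4)
max_phylodistance  = 10

# Each bin starts at a grid point and ends at the next one;
# the last bin ends at max_phylodistance.
left_boundaries  = phylodistance_grid
right_boundaries = c(phylodistance_grid[-1], max_phylodistance)
bin_centers      = 0.5*(left_boundaries + right_boundaries)

print(data.frame(left=left_boundaries, right=right_boundaries, center=bin_centers))
```

With these inputs the last bin ranges from 4 to 10, matching the description of `tail(phylodistance_grid,1)` and `max_phylodistance` above.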

Details

The phylogenetic autocorrelation function (ACF) of a trait can give insight into the evolutionary processes shaping its distribution across clades. An ACF that decays slowly with increasing phylogenetic distance indicates a strong phylogenetic conservatism of the trait, whereas a rapidly decaying ACF indicates weak phylogenetic conservatism. Similarly, if the mean absolute difference in trait value between two random tips increases with phylogenetic distance, this indicates a phylogenetic autocorrelation of the trait (Zaneveld and Thurber 2014). Here, phylogenetic distance between tips refers to their patristic distance, i.e. the minimum cumulative edge length required to connect the two tips.

Since the phylogenetic distances between all possible tip pairs do not cover a continuum (as there is only a finite number of tips), this function randomly draws tip pairs from the tree, maps them onto a finite set of distance bins and then estimates the ACF for the center of each distance bin based on the tip pairs in that bin. In practice, as a next step one would usually plot the estimated ACF (returned vector autocorrelations) over the centers of the distance bins (returned vector phylodistances).
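
The procedure described above (draw random tip pairs, bin them by distance, estimate a correlation per bin) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not castor's actual implementation: pairwise distances are taken from points on a line rather than from a tree, bins are equally sized, and the per-bin Pearson correlation is computed with `cor`. All variable names are hypothetical.

```r
set.seed(1)

# Hypothetical inputs: a symmetric matrix of pairwise distances and one trait
# value per tip. Points on a line stand in for the tips of a real tree.
Ntips      = 200
positions  = sort(runif(Ntips, 0, 10))
distances  = as.matrix(dist(positions))
tip_states = sin(positions) + rnorm(Ntips, sd=0.2)

# Draw random tip pairs with replacement, discarding pairs of a tip with itself.
Npairs = 5000
tip1 = sample.int(Ntips, Npairs, replace=TRUE)
tip2 = sample.int(Ntips, Npairs, replace=TRUE)
keep = (tip1 != tip2)
tip1 = tip1[keep]
tip2 = tip2[keep]
pair_distances = distances[cbind(tip1, tip2)]

# Assign each pair to one of Nbins equally sized distance bins.
Nbins  = 10
breaks = seq(0, max(pair_distances), length.out=Nbins+1)
bin    = cut(pair_distances, breaks=breaks, include.lowest=TRUE, labels=FALSE)

# Estimate the Pearson correlation between the two tips of a pair, per bin.
# Bins with fewer than 3 pairs are reported as NA.
acf_estimate = sapply(seq_len(Nbins), function(b){
    in_bin = which(bin == b)
    if(length(in_bin) < 3) return(NA_real_)
    cor(tip_states[tip1[in_bin]], tip_states[tip2[in_bin]])
})

# Mean absolute trait difference between the two tips of a pair, per bin.
mean_abs_diff = sapply(seq_len(Nbins), function(b){
    in_bin = which(bin == b)
    mean(abs(tip_states[tip1[in_bin]] - tip_states[tip2[in_bin]]))
})

# Bin centers, over which the estimates would typically be plotted.
bin_centers = 0.5*(breaks[-1] + breaks[-(Nbins+1)])
```

Plotting `acf_estimate` over `bin_centers` then gives the kind of curve discussed above.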

Phylogenetic distance bins can be specified in two alternative ways: Either a set of bins (phylodistance grid) is automatically calculated based on the provided Nbins, min_phylodistance, max_phylodistance and uniform_grid, or a phylodistance grid is explicitly provided via phylodistance_grid and max_phylodistance.
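
The difference between a uniform and a non-uniform grid can be illustrated in base R. The sketch below assumes that a non-uniform grid is built from quantiles of the sampled pairwise distances; this is one plausible way to obtain bins with roughly equal pair counts and is not necessarily how castor constructs its grid. The distances are simulated and all names are hypothetical.

```r
set.seed(1)

# Hypothetical sample of pairwise phylogenetic distances (illustrative only).
pair_distances = rexp(10000, rate=1)
Nbins = 5

# Uniform grid: equally sized bins between the smallest and largest distance.
uniform_breaks = seq(min(pair_distances), max(pair_distances), length.out=Nbins+1)

# Non-uniform grid (assumption: quantile-based), so that each bin
# holds roughly the same number of pairs.
quantile_breaks = quantile(pair_distances,
                           probs = seq(0, 1, length.out=Nbins+1),
                           names = FALSE)

# Count the pairs falling into each bin under both grids.
uniform_counts  = table(cut(pair_distances, uniform_breaks, include.lowest=TRUE))
quantile_counts = table(cut(pair_distances, quantile_breaks, include.lowest=TRUE))
print(uniform_counts)
print(quantile_counts)
```

Under the uniform grid most pairs crowd into the first bins and the last bins are sparsely populated, whereas the quantile-based grid spreads the pairs evenly, which is the motivation given for `uniform_grid=FALSE`.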

The tree may include multi-furcations (i.e. nodes with more than 2 children) as well as mono-furcations (i.e. nodes with only one child). If tree$edge.length is missing, then every edge is assumed to have length 1. The input tree must be rooted at some node for technical reasons (see function root_at_node), but the choice of the root node does not influence the result.

This function assumes that each tip is assigned exactly one trait value. This might be problematic in situations where each tip covers a range of trait values, for example if tips are species and multiple individuals were sampled from each species. In that case, one might consider representing each individual as a separate tip in the tree, so that each tip has exactly one trait value.

Value

A list with the following elements:

`phylodistances`	Numeric vector of size Nbins, storing the center of each phylodistance bin in increasing order. This is equal to `0.5*(left_phylodistances+right_phylodistances)`. Typically, you will want to plot `autocorrelations` over `phylodistances`.
`left_phylodistances`	Numeric vector of size Nbins, storing the left boundary of each phylodistance bin in increasing order.
`right_phylodistances`	Numeric vector of size Nbins, storing the right boundary of each phylodistance bin in increasing order.
`autocorrelations`	Numeric vector of size Nbins, storing the estimated Pearson autocorrelation of the trait for each distance bin.
`mean_abs_differences`	Numeric vector of size Nbins, storing the mean absolute difference of the trait between tip pairs in each distance bin.
`mean_rel_differences`	Numeric vector of size Nbins, storing the mean relative difference of the trait between tip pairs in each distance bin. The relative difference between two values `X` and `Y` is 0 if `X==Y`, and equal to `|X-Y| / (0.5*(|X|+|Y|))` otherwise.
`Npairs_per_distance`	Integer vector of size Nbins, storing the number of random tip pairs associated with each phylodistance bin.
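
The relative difference used for `mean_rel_differences` can be written as a small vectorized R function. The helper name `relative_difference` is hypothetical and not part of castor; the sketch only restates the definition given above.

```r
# Relative difference between X and Y: 0 if X==Y,
# otherwise |X-Y| divided by the mean of |X| and |Y|.
relative_difference = function(X, Y){
    denominator = 0.5*(abs(X) + abs(Y))
    # ifelse picks 0 wherever X==Y, which also covers the 0/0 case X==Y==0
    ifelse(X == Y, 0, abs(X - Y)/denominator)
}

# Example with four pairs of values
relative_difference(c(1, 2, 0, -1), c(1, 4, 0, 1))
```

For two values of opposite sign and equal magnitude, such as -1 and 1, the relative difference reaches its maximum value of 2.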

Author(s)

Stilianos Louca

References

J. R. Zaneveld and R. L. V. Thurber (2014). Hidden state prediction: A modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses. Frontiers in Microbiology. 5:431.

Examples

# generate a random tree
tree = generate_random_tree(list(birth_rate_factor=0.1),max_tips=1000)$tree

# simulate continuous trait evolution on the tree
tip_states = simulate_ou_model(tree, 
                               stationary_mean  = 0,
                               stationary_std   = 1,
                               decay_rate       = 0.01)$tip_states

# calculate autocorrelation function
ACF = get_trait_acf(tree, tip_states, Nbins=10, uniform_grid=TRUE)

# plot ACF (autocorrelation vs phylogenetic distance)
plot(ACF$phylodistances, ACF$autocorrelations, type="l", xlab="distance", ylab="ACF")

[Package castor version 1.8.2 Index]