variogram {ctpm}    R Documentation

Calculate an empirical variogram from phylogenetic data.

Description

This function calculates the empirical variogram of phylogenetic data for visualizing stationary (time-averaged) autocorrelation structure. One of two weighting algorithms can be used.

Usage

variogram(data, phylo, weights = "IID", complete = FALSE, time.units = "Ma", 
          trait.units = NULL,  progress = TRUE, algorithm = "GMM")

Arguments

data

A vector of continuous species trait data. This vector must have the same length and the same order as phylo$tip.label.

phylo

An object of class 'phylo'.

weights

The weights to apply when calculating the semi-variances. Can be one of "IID" or "BM". Defaults to "IID".

time.units

A character string defining the units of the branch lengths. Defaults to "Ma".

trait.units

A character string defining the units of the trait being analysed. Defaults to NULL (unitless).

complete

A logical value indicating whether the semi-variance is to be calculated across all possible lags. Defaults to FALSE.

algorithm

A character string defining the algorithm to apply when calculating the time-lag bins. Can be one of "kmeans" or "GMM". Defaults to "GMM".

progress

A logical value indicating whether to display a progress bar. Defaults to TRUE.
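
The ordering requirement on data is easy to get wrong when traits are stored separately from the tree. A minimal sketch of one way to align them, using base R only; the species names, the trait values, and the list standing in for a 'phylo' object are all hypothetical:

```r
# Hypothetical stand-in for a 'phylo' object: only tip.label is used here.
phylo <- list(tip.label = c("sp_c", "sp_a", "sp_b"))

# Hypothetical trait values, named by species and stored in a different order.
traits <- c(sp_a = 1.2, sp_b = 0.7, sp_c = 2.4)

# Reorder the trait vector so that element i belongs to phylo$tip.label[i].
data <- traits[match(phylo$tip.label, names(traits))]

# Checks: same length, same order, no species missing from the trait vector.
stopifnot(
  length(data) == length(phylo$tip.label),
  identical(names(data), phylo$tip.label),
  !anyNA(data)
)
```

With a real tree, phylo would be the 'phylo' object itself and the same match() call applies.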

Details

weights

The weights for the semi-variance are calculated based on an assumption about the form of the correlation matrix. If the phylogenetic process is Independent and Identically Distributed (IID), then it is sufficient to consider a correlation matrix between species pairs where the diagonal is 1 and the off-diagonal is 1/4 if species pairs (i,j) and (k,l) share one species in common and 0 otherwise.
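
The IID correlation structure described above can be written out explicitly for a small case. The following is an illustrative sketch in base R, not the package's internal code; the four-species setup and all variable names are assumptions, and how the package converts this matrix into weights is not shown:

```r
# Four hypothetical species; each unordered species pair (i, j) is one row.
n_species <- 4
pairs <- t(combn(n_species, 2))
n_pairs <- nrow(pairs)            # choose(4, 2) = 6 pairs

# Count the species shared between every two species pairs.
shared <- matrix(0L, n_pairs, n_pairs)
for (a in seq_len(n_pairs)) {
  for (b in seq_len(n_pairs)) {
    shared[a, b] <- length(intersect(pairs[a, ], pairs[b, ]))
  }
}

# 1 on the diagonal, 1/4 where exactly one species is shared, 0 otherwise.
R <- matrix(0, n_pairs, n_pairs)
R[shared == 1L] <- 1 / 4
diag(R) <- 1

print(R)
```

For example, pairs (1,2) and (1,3) share species 1 and get an entry of 1/4, while pairs (1,2) and (3,4) share nothing and get 0.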

If the phylogenetic process is Brownian motion (BM), then it is sufficient to consider a correlation matrix where the diagonal is 1 and the off-diagonal is the squared proportion of the time lag \tau during which the backward-in-time-forward-in-time tip-branch-tip trajectories correspond to the same species.

complete

If calculating the semi-variance at all pairwise phylogenetic distances results in a highly irregular series (which is usually the case), it is more useful to coarsen the variogram. This is what happens when complete = FALSE: species pairs are binned across lags, with the lag bins estimated using either k-means or Gaussian Mixture Modelling (GMM) clustering with n classes = \sqrt{N}.
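
The coarsening step can be sketched as follows. This is a rough illustration under stated assumptions, not the package's implementation: it uses base R kmeans() rather than the clustering routines the package calls, simulated values in place of a tree, the standard semi-variance contribution of half the squared trait difference, and it takes N to be the number of pairwise lags, which the text above does not specify. The column n_pairs is a hypothetical name.

```r
set.seed(1)

# Hypothetical inputs: 10 species with random traits; distances between
# random points on a line stand in for pairwise phylogenetic distances.
n_species <- 10
trait <- rnorm(n_species)
pos <- runif(n_species, 0, 50)
dist_mat <- as.matrix(dist(pos))

# One entry per species pair: its lag and its semi-variance contribution.
idx <- which(upper.tri(dist_mat), arr.ind = TRUE)
lag <- dist_mat[idx]
gamma <- 0.5 * (trait[idx[, 1]] - trait[idx[, 2]])^2

# Assumption: N is the number of pairwise lags, so n classes = sqrt(N).
n_classes <- round(sqrt(length(lag)))

# Cluster the lags into bins (base kmeans as a stand-in).
bins <- kmeans(lag, centers = n_classes, nstart = 10)$cluster

# Coarsened variogram: mean lag and mean semi-variance within each bin.
coarse <- data.frame(
  lag = as.numeric(tapply(lag, bins, mean)),
  SVF = as.numeric(tapply(gamma, bins, mean)),
  n_pairs = as.numeric(table(bins))
)
coarse <- coarse[order(coarse$lag), ]
print(coarse)
```

Averaging within bins trades lag resolution for more pairs per estimate, which is why the coarsened curve is smoother than the one computed at every distinct lag.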

Value

Returns a variogram object (class variogram), which is a data frame containing the time lag (lag), the semi-variance estimate at that lag (SVF), and the degrees of freedom of the estimated semi-variance (DOF).

Note

Can be slow on very large phylogenies.

Author(s)

M. J. Noonan, C. H. Fleming.

References

Noonan, M. J., Fagan, W. F., and Fleming, C. H. (2021) “A semi-variance approach to visualising phylogenetic autocorrelation”. Methods in Ecology and Evolution, in press.

See Also

vignette("variogram", package = "ctpm"), plot.variogram, %#%, KMeans_rcpp, GMM.

Examples

#Load package and data
library(ctpm)
data("moid_traits")
data("musteloids")

#Extract the trait of interest from the full dataset
SSD <- moid_traits$SSD

#Calculate variogram
SVF <- variogram(SSD, musteloids)

#Plot the variogram
plot(SVF)

[Package ctpm version 1.0.1 Index]