unifrac {abdiv} | R Documentation
UniFrac distance
Description
The UniFrac distance is a phylogenetically-weighted distance between two communities of organisms. The measure has been extended a number of times to include abundance-weighted and variance-adjusted versions.
Usage
unweighted_unifrac(x, y, tree, xy_labels = NULL)
weighted_unifrac(x, y, tree, xy_labels = NULL)
weighted_normalized_unifrac(x, y, tree, xy_labels = NULL)
variance_adjusted_unifrac(x, y, tree, xy_labels = NULL)
generalized_unifrac(x, y, tree, alpha = 0.5, xy_labels = NULL)
information_unifrac(x, y, tree, xy_labels = NULL)
phylosor(x, y, tree, xy_labels = NULL)
Arguments
x, y | Numeric vectors of species counts or proportions.
tree | A phylogenetic tree object.
xy_labels | A character vector of species labels for x and y.
alpha | Generalized UniFrac parameter.
Details
These functions compute different variations of the UniFrac distance between
communities described by the vectors x and y. If the vectors are named, the
names will be automatically used to match the vectors with the tree. Missing
names are filled in with zero counts. If the vectors are not named and
xy_labels is provided, these labels will be used to match the vectors with
the tree. If the vectors are not named and xy_labels is not provided, it is
assumed that the vectors are already in the correct order, and we simply
check that their length matches the number of tips in the tree.
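The matching rules for named vectors can be illustrated with a small base-R sketch. The helper align_to_tips and the four tip labels are hypothetical and are not part of abdiv; the sketch only mirrors the documented behavior (match by name, fill absent species with zero). It also assumes that a name absent from the tree should raise an error, which the package may handle differently.

```r
# Hypothetical helper (not an abdiv function): align a named count vector
# to the tip labels of a tree, filling absent species with zero counts.
align_to_tips <- function(v, tip_labels) {
  unknown <- setdiff(names(v), tip_labels)
  if (length(unknown) > 0) {
    # Assumption of this sketch: names not found in the tree are an error.
    stop("Labels not found in tree: ", paste(unknown, collapse = ", "))
  }
  out <- setNames(numeric(length(tip_labels)), tip_labels)
  out[names(v)] <- v
  out
}

tips <- c("A", "B", "C", "D")    # illustrative tip labels
x <- c(C = 3, A = 1)             # named counts, in arbitrary order
aligned <- align_to_tips(x, tips)
print(aligned)                   # counts in tip order: A, B, C, D
```

Species B and D are not named in x, so they receive zero counts in the aligned vector.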
unweighted_unifrac gives the original UniFrac distance from Lozupone and
Knight (2005), which is the fraction of total branch length leading to
community x or community y, but not both. It is based on species
presence/absence.
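As a rough illustration of this definition, the sketch below computes the fraction by hand on a toy four-tip tree, using base R only. The tree encoding (the matrix below and the vector len) and the function toy_unweighted_unifrac are hypothetical constructs for this example, not abdiv's implementation.

```r
# Toy rooted tree ((A:1,B:1):2,(C:1,D:1):2), encoded without any packages.
# Each row of `below` is a branch; a 1 marks a tip that descends from it.
tips <- c("A", "B", "C", "D")
below <- rbind(
  c(1, 0, 0, 0),  # branch to tip A
  c(0, 1, 0, 0),  # branch to tip B
  c(0, 0, 1, 0),  # branch to tip C
  c(0, 0, 0, 1),  # branch to tip D
  c(1, 1, 0, 0),  # branch to clade (A, B)
  c(0, 0, 1, 1)   # branch to clade (C, D)
)
colnames(below) <- tips
len <- c(1, 1, 1, 1, 2, 2)  # branch lengths, one per row of `below`

# Hypothetical function illustrating the definition (not abdiv's code).
toy_unweighted_unifrac <- function(x, y, below, len) {
  in_x <- as.vector(below %*% as.numeric(x > 0)) > 0  # branches leading to x
  in_y <- as.vector(below %*% as.numeric(y > 0)) > 0  # branches leading to y
  # Length leading to exactly one community, over length leading to either.
  sum(len[xor(in_x, in_y)]) / sum(len[in_x | in_y])
}

x <- c(1, 1, 0, 0)  # species A and B present
y <- c(0, 1, 1, 0)  # species B and C present
d_unweighted <- toy_unweighted_unifrac(x, y, below, len)
print(d_unweighted)  # 4 / 7
```

Here the branches to A, to C, and to the clade (C, D) lead to only one community (total length 4), while the branches reached by either community have total length 7.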
weighted_unifrac gives the abundance-weighted version of UniFrac proposed by
Lozupone et al. (2007). In this measure, the branch lengths of the tree are
multiplied by the absolute difference in species abundances below each
branch.
weighted_normalized_unifrac provides a normalized version of
weighted_unifrac, so the distance is between 0 and 1.
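The weighted and normalized forms can be sketched on the same kind of toy tree. The code below is a hypothetical base-R illustration, not abdiv's implementation. It assumes that counts are first converted to proportions and that normalization divides by the sum of branch length times the combined proportion below each branch.

```r
# Toy rooted tree ((A:1,B:1):2,(C:1,D:1):2); rows of `below` are branches.
below <- rbind(
  c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1),  # tip branches
  c(1, 1, 0, 0), c(0, 0, 1, 1)                                 # clade branches
)
len <- c(1, 1, 1, 1, 2, 2)  # branch lengths, one per row of `below`

# Hypothetical function (not abdiv's code); assumes proportions are used.
toy_weighted_unifrac <- function(x, y, below, len, normalized = FALSE) {
  px <- as.vector(below %*% (x / sum(x)))  # proportion of x below each branch
  py <- as.vector(below %*% (y / sum(y)))  # proportion of y below each branch
  raw <- sum(len * abs(px - py))
  if (normalized) raw / sum(len * (px + py)) else raw
}

x <- c(3, 1, 0, 0)  # counts for species A, B, C, D
y <- c(0, 1, 1, 0)
d_weighted <- toy_weighted_unifrac(x, y, below, len)
d_normalized <- toy_weighted_unifrac(x, y, below, len, normalized = TRUE)
print(d_weighted)    # 3.5
print(d_normalized)  # 3.5 / 6
```

Because the absolute difference below a branch can never exceed the sum, the normalized value in this sketch always falls between 0 and 1.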
variance_adjusted_unifrac was proposed by Chang et al. (2011) to adjust for
the variation of weights in weighted UniFrac under random sampling.
generalized_unifrac was proposed by Chen et al. (2012) to provide a unified
mathematical framework for weighted and unweighted UniFrac distance. It
includes a parameter, alpha, which can be used to adjust the
abundance-weighting in the distance. A value of alpha = 1 corresponds to
weighted normalized UniFrac. A value of alpha = 0 corresponds to unweighted
UniFrac if presence/absence vectors are provided. The authors suggest a
value of alpha = 0.5 as a compromise between weighted and unweighted
distances.
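To make the role of alpha concrete, the sketch below evaluates a generalized distance on a toy tree in base R. It is a hypothetical illustration, not abdiv's code. It assumes the form in which each branch is weighted by its length times (px + py)^alpha and contributes |px - py| / (px + py), where px and py are the proportions of each community below the branch. Branches reached by neither community are skipped.

```r
# Toy rooted tree ((A:1,B:1):2,(C:1,D:1):2); rows of `below` are branches.
below <- rbind(
  c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1),  # tip branches
  c(1, 1, 0, 0), c(0, 0, 1, 1)                                 # clade branches
)
len <- c(1, 1, 1, 1, 2, 2)  # branch lengths, one per row of `below`

# Hypothetical function (not abdiv's code) for the assumed formula.
toy_generalized_unifrac <- function(x, y, below, len, alpha = 0.5) {
  px <- as.vector(below %*% (x / sum(x)))
  py <- as.vector(below %*% (y / sum(y)))
  s <- px + py
  keep <- s > 0                      # skip branches with no observations
  w <- len[keep] * s[keep]^alpha     # branch weights
  sum(w * abs(px[keep] - py[keep]) / s[keep]) / sum(w)
}

x <- c(3, 1, 0, 0)  # counts for species A, B, C, D
y <- c(0, 1, 1, 0)
d_alpha_1 <- toy_generalized_unifrac(x, y, below, len, alpha = 1)
d_alpha_half <- toy_generalized_unifrac(x, y, below, len, alpha = 0.5)
print(d_alpha_1)     # 3.5 / 6, the weighted normalized value for these data
print(d_alpha_half)
```

With alpha = 1 the (px + py) factors cancel, which is why the result matches the weighted normalized form on the same data.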
information_unifrac was proposed by Wong et al. (2016) to connect UniFrac
distance with compositional data analysis. They also proposed a "ratio
UniFrac" distance, which is not yet implemented.
phylosor, proposed by Bryant et al. (2008), is closely related to unweighted
UniFrac distance. If unweighted UniFrac distance is the analogue of Jaccard
distance using branches on a phylogenetic tree, PhyloSor is the analogue of
Sørensen dissimilarity.
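The Jaccard and Sørensen analogy can be checked numerically on a toy tree. The base-R sketch below is hypothetical and is not abdiv's implementation. It assumes PhyloSor is expressed as a dissimilarity, that is, the branch length leading to exactly one community divided by the sum of the branch lengths leading to each community.

```r
# Toy rooted tree ((A:1,B:1):2,(C:1,D:1):2); rows of `below` are branches.
below <- rbind(
  c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 0, 1, 0), c(0, 0, 0, 1),  # tip branches
  c(1, 1, 0, 0), c(0, 0, 1, 1)                                 # clade branches
)
len <- c(1, 1, 1, 1, 2, 2)  # branch lengths, one per row of `below`

# Hypothetical function (not abdiv's code) returning both analogues.
toy_branch_dissimilarities <- function(x, y, below, len) {
  in_x <- as.vector(below %*% as.numeric(x > 0)) > 0
  in_y <- as.vector(below %*% as.numeric(y > 0)) > 0
  unique_len <- sum(len[xor(in_x, in_y)])  # leads to exactly one community
  c(
    jaccard_like  = unique_len / sum(len[in_x | in_y]),
    sorensen_like = unique_len / (sum(len[in_x]) + sum(len[in_y]))
  )
}

x <- c(1, 1, 0, 0)  # species A and B present
y <- c(0, 1, 1, 0)  # species B and C present
d <- toy_branch_dissimilarities(x, y, below, len)
print(d)  # jaccard_like = 4/7, sorensen_like = 0.4
```

As with the set-based measures, the two values in this sketch are related by S = J / (2 - J).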
Value
The UniFrac distance between communities x and y. The distance is not
defined if either x or y has all zero elements. We return NaN if this is the
case.
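The rule for empty communities can be mimicked with a small guard, shown here as a base-R sketch. The wrapper checked_distance and the stand-in manhattan function are hypothetical and exist only to make the example runnable without a tree; they are not part of abdiv.

```r
# Hypothetical wrapper (not part of abdiv) mirroring the documented rule:
# a community with no observations has no defined distance.
checked_distance <- function(x, y, dist_fun, ...) {
  if (all(x == 0) || all(y == 0)) {
    return(NaN)
  }
  dist_fun(x, y, ...)
}

# Stand-in distance, used only so the sketch runs without a tree.
manhattan <- function(x, y) sum(abs(x - y))

d_empty <- checked_distance(c(0, 0, 0), c(1, 2, 0), manhattan)
d_ok <- checked_distance(c(1, 0, 0), c(1, 2, 0), manhattan)
print(is.nan(d_empty))  # TRUE
print(d_ok)             # 2
```

Downstream code can test results with is.nan() before averaging or clustering, since NaN values propagate through most arithmetic.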
References
Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228–8235. 10.1128/AEM.71.12.8228-8235.2005
Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Applied and environmental microbiology. 2007;73(5):1576–1585. 10.1128/AEM.01996-06
Chang Q, et al. Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics. 2011;12:118. 10.1186/1471-2105-12-118
Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics. 2012;28(16):2106–2113. 10.1093/bioinformatics/bts342
Wong RG, Wu JR, Gloor GB. Expanding the UniFrac Toolbox. PLOS ONE. 2016;11(9):1–20. 10.1371/journal.pone.0161196
Bryant JA, Lamanna C, Morlon H, Kerkhoff AJ, Enquist BJ, Green JL. Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proc Natl Acad Sci U S A. 2008;105 Suppl 1:11505-11. 10.1073/pnas.0801920105
Examples
# From Lozupone and Knight (2005), Figure 1.
# Panel A
x1 <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1)
x2 <- c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
unweighted_unifrac(x1, x2, lozupone_tree)
# Panel B
x3 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
x4 <- c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
unweighted_unifrac(x3, x4, lozupone_tree)
# Can use named vectors to specify species
weighted_normalized_unifrac(
  c(A=1, C=1, D=1, F=1, I=1, L=1, N=1),
  c(B=1, E=1, G=1, H=1, J=1, K=1, M=1),
  lozupone_tree)
weighted_normalized_unifrac(x1, x2, lozupone_tree)
# Generalized UniFrac is equal to weighted normalized UniFrac when alpha = 1
generalized_unifrac(x1, x2, lozupone_tree, alpha=1)
generalized_unifrac(x1, x2, lozupone_tree, alpha=0.5)