dist_between_centroids {usedist} | R Documentation |
Compute the distance between group centroids
Description
Compute the distance between group centroids
Usage
dist_between_centroids(d, idx1, idx2, squared = FALSE)
Arguments
d |
A distance matrix object of class dist. |
idx1 |
A vector of items in group 1. |
idx2 |
A vector of items in group 2. |
squared |
If TRUE, return the squared distance between centroids. |
Details
If you have a distance matrix, and the objects are partitioned into groups, you might like to know the distance between the group centroids. The centroid of each group is simply the center of mass for the group.
It is possible to infer the distance between group centroids directly from the distances between items in each group. The adonis test in the ecology package vegan takes advantage of this approach to carry out an ANOVA-like test on distances.
The approach rests on the assumption that the objects occupy some high-dimensional Euclidean space. However, we do not have to actually create the space to find the distance between centroids. Based on the assumption that such a space exists, we can use an algebraic formula to perform the computation.
The formulas for this were presented by Apostol and Mnatsakanian in 2003, though we need to re-arrange equation 28 in their paper to get the value we want:
| c_1 - c_2 | = \sqrt{
\frac{1}{n_1 n_2} \sum_{(1,2)} -
\frac{1}{n_1^2} \sum_{(1)} -
\frac{1}{n_2^2} \sum_{(2)}},
where n_1 and n_2 are the numbers of samples in groups 1 and 2, \sum_{(1)} is the sum of squared distances between items in group 1 (each pair counted once), \sum_{(2)} is the same sum for group 2, and \sum_{(1,2)} is the sum of squared distances between items in group 1 and those in group 2.
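As a quick check on the formula, the following base R sketch computes the centroid distance twice for a small set of 2-D points: once from the distance matrix alone and once from explicit coordinates. The helper name centroid_dist_sketch and the sample points are illustrative assumptions; this is not the package's own implementation.

```r
# Hypothetical helper (not the usedist implementation): applies the
# formula above to a "dist" object using base R only.
centroid_dist_sketch <- function(d, idx1, idx2, squared = FALSE) {
  m <- as.matrix(d)
  n1 <- length(idx1)
  n2 <- length(idx2)
  # The full symmetric matrix holds every within-group pair twice,
  # so halve the total to count each pair once.
  sum1 <- sum(m[idx1, idx1, drop = FALSE]^2) / 2
  sum2 <- sum(m[idx2, idx2, drop = FALSE]^2) / 2
  sum12 <- sum(m[idx1, idx2, drop = FALSE]^2)
  res <- sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2
  if (squared) res else sqrt(res)
}

# Made-up points in the plane: rows 1-3 form group 1, rows 4-5 group 2.
pts <- rbind(c(0, 0), c(2, 0), c(0, 2), c(5, 5), c(7, 5))
d <- dist(pts)

from_distances <- centroid_dist_sketch(d, 1:3, 4:5)

# Reference value from explicit centroids, possible here only because
# the coordinates are known.
c1 <- colMeans(pts[1:3, , drop = FALSE])
c2 <- colMeans(pts[4:5, , drop = FALSE])
from_coords <- sqrt(sum((c1 - c2)^2))

# The two routes should agree up to floating-point tolerance.
isTRUE(all.equal(from_distances, from_coords))
```

Because the points here really do live in a Euclidean plane, the value under the square root is positive and the two results should match.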
Sometimes, the distance between centroids is not a real number, because it is not possible to create a space where this distance exists. Mathematically, we get a negative number underneath the square root in the equation above. If this happens, the function returns NaN. If you'd like to have access to the negative value itself, you can set squared = TRUE to return the squared distance between centroids. In this case, you will never get NaN, but you might receive negative numbers in your result.
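To illustrate the non-Euclidean case, here is a small base R sketch with made-up distances that violate the triangle inequality. It evaluates the formula by hand rather than calling the package function; the matrix and variable names are assumptions chosen for illustration.

```r
# Three items: a and b are far apart, yet both are close to c.
# No Euclidean arrangement of points can produce these distances.
items <- c("a", "b", "c")
m <- matrix(c( 0, 10, 1,
              10,  0, 1,
               1,  1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(items, items))

# Group 1 is {a, b}; group 2 is {c}.
n1 <- 2
n2 <- 1
sum12 <- m["a", "c"]^2 + m["b", "c"]^2  # between-group squared distances
sum1 <- m["a", "b"]^2                   # the single pair within group 1
sum2 <- 0                               # group 2 has no within-group pairs

# Squared distance between centroids: negative for this matrix.
sq <- sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2

# Taking the square root of a negative number yields NaN (R also
# emits a warning, suppressed here).
dist_val <- suppressWarnings(sqrt(sq))

sq < 0
is.nan(dist_val)

# With the package loaded, the corresponding call following the Usage
# section would be:
#   dist_between_centroids(as.dist(m), c("a", "b"), "c", squared = TRUE)
```

The negative squared value is the quantity that squared = TRUE exposes; it can serve as a signal that the distances are not Euclidean.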
Value
The distance between group centroids (see details).
References
Apostol, T.M. and Mnatsakanian, M.A. Sums of squares of distances in m-space. Math. Assoc. Am. Monthly 110, 516 (2003).