dist_between_centroids {usedist}    R Documentation

Compute the distance between group centroids

Description

Compute the distance between group centroids

Usage

dist_between_centroids(d, idx1, idx2, squared = FALSE)

Arguments

d

A distance matrix object of class dist.

idx1

A vector of items in group 1.

idx2

A vector of items in group 2.

squared

If TRUE, return the squared distance between centroids.

Details

If you have a distance matrix, and the objects are partitioned into groups, you might like to know the distance between the group centroids. The centroid of each group is simply the center of mass for the group.

It is possible to infer the distance between group centroids directly from the pairwise distances between items, both within each group and between the two groups. The adonis test in the ecology package vegan takes advantage of this approach to carry out an ANOVA-like test on distances.

The approach rests on the assumption that the objects occupy some high-dimensional Euclidean space. However, we do not have to actually create the space to find the distance between centroids. Based on the assumption that such a space exists, we can use an algebraic formula to perform the computation.

The formulas for this were presented by Apostol and Mnatsakanian in 2003, though we need to re-arrange equation 28 in their paper to get the value we want:

| c_1 - c_2 | = \sqrt{ \frac{1}{n_1 n_2} \sum_{(1,2)} - \frac{1}{n_1^2} \sum_{(1)} - \frac{1}{n_2^2} \sum_{(2)}},

where n_1 and n_2 are the numbers of items in groups 1 and 2, \sum_{(1)} and \sum_{(2)} are the sums of squared distances between items within group 1 and within group 2 (each pair counted once), and \sum_{(1,2)} is the sum of squared distances between items in group 1 and those in group 2.
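As an illustration, the formula can be checked against centroids computed from known coordinates. The following is a minimal sketch in base R, not the package's implementation: the helper name centroid_dist_sq_sketch and the example points are made up for this illustration, and the groups are assumed to be given as integer positions.

```r
# Hypothetical helper (not part of usedist): evaluates the formula above
# using only base R. idx1 and idx2 are assumed to be integer positions.
centroid_dist_sq_sketch <- function(d, idx1, idx2) {
  m2 <- as.matrix(d)^2                 # squared distances, full symmetric matrix
  n1 <- length(idx1)
  n2 <- length(idx2)
  sum12 <- sum(m2[idx1, idx2])         # every between-group pair, counted once
  sum1 <- sum(m2[idx1, idx1]) / 2      # within-group pairs; halve the symmetric block
  sum2 <- sum(m2[idx2, idx2]) / 2
  sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2
}

# Made-up 2-D points: rows 1-3 form group 1, rows 4-5 form group 2.
pts <- rbind(c(0, 0), c(2, 0), c(1, 3), c(6, 1), c(8, 5))
d <- dist(pts)

# Distance between centroids inferred from the distance matrix alone.
from_formula <- sqrt(centroid_dist_sq_sketch(d, 1:3, 4:5))

# Reference value computed from the coordinates themselves.
c1 <- colMeans(pts[1:3, ])
c2 <- colMeans(pts[4:5, ])
from_coords <- sqrt(sum((c1 - c2)^2))

all.equal(from_formula, from_coords)
```

Because these points do live in a Euclidean space, the two values agree up to floating-point error. A call such as dist_between_centroids(d, 1:3, 4:5) should give the same result.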

Sometimes the distance between centroids is not a real number, because no Euclidean space can reproduce the given distances (for example, when they violate the triangle inequality). Mathematically, we get a negative number underneath the square root in the equation above. If this happens, the function returns NaN. To inspect the value under the square root, set squared = TRUE to return the squared distance between centroids. In this case, you will never get NaN, but you might receive negative numbers in your result.
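The negative case is easy to reproduce by hand. The sketch below uses made-up dissimilarities among three items and writes out each term of the formula in base R; it does not call the package function, and the item names a, b and c are purely illustrative.

```r
# Made-up dissimilarities among three items a, b, c that violate the
# triangle inequality: a and b are each 1 unit from c but 10 units apart.
m <- matrix(
  c( 0, 10, 1,
    10,  0, 1,
     1,  1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)
d <- as.dist(m)

# Group 1 = {a, b}, group 2 = {c}; terms of the formula written out by hand.
sq <- as.matrix(d)^2
n1 <- 2
n2 <- 1
sum12 <- sq["a", "c"] + sq["b", "c"]   # between-group squared distances
sum1 <- sq["a", "b"]                   # the single within-group pair of group 1
sum2 <- 0                              # group 2 has one item, so no pairs
squared_dist <- sum12 / (n1 * n2) - sum1 / n1^2 - sum2 / n2^2

squared_dist                           # negative: 2/2 - 100/4 = -24
suppressWarnings(sqrt(squared_dist))   # NaN, since the square root is undefined
```

Here the squared value of -24 is what squared = TRUE is meant to expose, while taking the square root yields NaN.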

Value

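The distance between the group centroids, or the squared distance if squared = TRUE. The result is NaN if squared = FALSE and the squared distance is negative (see Details).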

References

Apostol, T.M. and Mnatsakanian, M.A. Sums of squares of distances in m-space. Am. Math. Monthly 110, 516 (2003).


[Package usedist version 0.4.0 Index]