dist.binary {ade4}    R Documentation

Computation of Distance Matrices for Binary Data

Description

Computes distance matrices for binary data.

Usage

dist.binary(df, method = NULL, diag = FALSE, upper = FALSE)

Arguments

df

a matrix or a data frame with non-negative numeric values (positive or zero). It is binarized before use with as.matrix(1 * (df > 0)).

method

an integer between 1 and 10. If NULL, the choice is made interactively through a console message. See Details.

diag

a logical value indicating whether the diagonal of the distance matrix should be printed by ‘print.dist’

upper

a logical value indicating whether the upper triangle of the distance matrix should be printed by ‘print.dist’

Details

Let the contingency table of the binary data for a pair of rows be such that $n_{11} = a$, $n_{10} = b$, $n_{01} = c$ and $n_{00} = d$. All these distances are of type $d = \sqrt{1 - s}$, with $s$ a similarity coefficient.
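To make the notation concrete, here is a minimal base R sketch that counts the four cells for one pair of rows and turns a similarity into a distance. The vectors x and y and the name abcd are hypothetical illustrations, not ade4 objects, and the code follows the formulas as written on this page rather than the package's internal implementation.

```r
# Two hypothetical abundance rows (not taken from any ade4 data set)
x <- c(3, 0, 1, 0, 2, 0)
y <- c(1, 1, 0, 0, 4, 0)

# Binarize as described for the df argument: positive values become 1
bx <- 1 * (x > 0)
by <- 1 * (y > 0)

# Cell counts of the 2 x 2 contingency table
abcd <- c(
  a = sum(bx == 1 & by == 1),  # n11: present in both rows
  b = sum(bx == 1 & by == 0),  # n10: present in x only
  c = sum(bx == 0 & by == 1),  # n01: present in y only
  d = sum(bx == 0 & by == 0)   # n00: absent from both rows
)

# Jaccard similarity (method 1) and the associated distance sqrt(1 - s)
s1 <- unname(abcd["a"] / (abcd["a"] + abcd["b"] + abcd["c"]))
d1 <- sqrt(1 - s1)
```

With these values the table is a = 2, b = 1, c = 1, d = 2, so s1 = 0.5 and d1 = sqrt(0.5).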

1 = Jaccard index (1901)

S3 coefficient of Gower & Legendre: $s_1 = \frac{a}{a+b+c}$

2 = Simple matching coefficient of Sokal & Michener (1958)

S4 coefficient of Gower & Legendre: $s_2 = \frac{a+d}{a+b+c+d}$

3 = Sokal & Sneath (1963)

S5 coefficient of Gower & Legendre: $s_3 = \frac{a}{a+2(b+c)}$

4 = Rogers & Tanimoto (1960)

S6 coefficient of Gower & Legendre: $s_4 = \frac{a+d}{a+2(b+c)+d}$

5 = Dice (1945) or Sorensen (1948)

S7 coefficient of Gower & Legendre: $s_5 = \frac{2a}{2a+b+c}$

6 = Hamann coefficient

S9 coefficient of Gower & Legendre: $s_6 = \frac{a-(b+c)+d}{a+b+c+d}$

7 = Ochiai (1957)

S12 coefficient of Gower & Legendre: $s_7 = \frac{a}{\sqrt{(a+b)(a+c)}}$

8 = Sokal & Sneath (1963)

S13 coefficient of Gower & Legendre: $s_8 = \frac{ad}{\sqrt{(a+b)(a+c)(d+b)(d+c)}}$

9 = Phi of Pearson

S14 coefficient of Gower & Legendre: $s_9 = \frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(d+c)}}$

10 = S2 coefficient of Gower & Legendre

$s_{10} = \frac{a}{a+b+c+d}$
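The ten coefficients can be compared side by side with a short base R sketch. The helper binary_similarity is hypothetical (it is not an ade4 function); it transcribes the formulas listed here and makes no attempt to reproduce how the package handles edge cases such as zero denominators.

```r
# Hypothetical helper (not an ade4 function): similarity s for methods 1 to 10,
# written directly from the formulas in the Details section
binary_similarity <- function(a, b, c, d, method) {
  switch(as.character(method),
    "1"  = a / (a + b + c),
    "2"  = (a + d) / (a + b + c + d),
    "3"  = a / (a + 2 * (b + c)),
    "4"  = (a + d) / (a + 2 * (b + c) + d),
    "5"  = 2 * a / (2 * a + b + c),
    "6"  = (a - (b + c) + d) / (a + b + c + d),
    "7"  = a / sqrt((a + b) * (a + c)),
    "8"  = a * d / sqrt((a + b) * (a + c) * (d + b) * (d + c)),
    "9"  = (a * d - b * c) / sqrt((a + b) * (a + c) * (b + d) * (d + c)),
    "10" = a / (a + b + c + d),
    stop("method must be an integer between 1 and 10")
  )
}

# Example counts: a = 2, b = 1, c = 1, d = 2
s <- sapply(1:10, function(m) binary_similarity(2, 1, 1, 2, m))

# Every distance has the form sqrt(1 - s)
dists <- sqrt(1 - s)
```

For these counts the similarities are 1/2, 2/3, 1/3, 1/2, 2/3, 1/3, 2/3, 4/9, 1/3 and 1/3, which shows how strongly the choice of coefficient can change the resulting distance for the same pair of rows.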

Value

Returns a distance matrix of class dist between the rows of the data frame.
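To show what an object of class dist looks like without relying on ade4, the sketch below uses only base R. It assumes that stats::dist with method = "binary" returns 1 minus the Jaccard similarity between rows, so that its square root has the sqrt(1 - s) form described for method 1. The matrix m is hypothetical, and agreement with dist.binary(m, method = 1) is expected but not checked here.

```r
# Hypothetical 3 x 4 abundance table (rows are the units being compared)
m <- matrix(c(2, 0, 1, 0,
              0, 3, 1, 0,
              1, 1, 0, 5), nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), NULL))

# stats::dist treats non-zero entries as "on"; the binary method gives
# (b + c) / (a + b + c), i.e. 1 - Jaccard, so sqrt() yields sqrt(1 - s)
dj <- sqrt(dist(m, method = "binary"))

class(dj)              # "dist"
full <- as.matrix(dj)  # full symmetric matrix with a zero diagonal
```

A dist object stores only the lower triangle; as.matrix() expands it when a full square matrix is needed, and the diag and upper arguments only affect how it is printed.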

Author(s)

Daniel Chessel
Stéphane Dray stephane.dray@univ-lyon1.fr

References

Gower, J.C. and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48.

Examples

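library(ade4)
data(aviurba)
# For each of the 10 methods, print the method name and whether
# the resulting distance matrix is Euclidean
for (i in 1:10) {
    d <- dist.binary(aviurba$fau, method = i)
    cat(attr(d, "method"), is.euclid(d), "\n")
}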

[Package ade4 version 1.7-22 Index]