gowdis {FD} | R Documentation |
Gower Dissimilarity
Description
gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.
Usage
gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))
Arguments
x |
matrix or data frame containing the variables. Variables can be |
w |
vector listing the weights for the variables in |
asym.bin |
vector listing the asymmetric binary variables in |
ord |
character string specifying the method to be used for ordinal variables (i.e. |
Details
gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient by using D = 1 - S. It integrates variable weights as described by Legendre and Legendre (1998).
Let \mathbf{X} = \{x_{ij}\} be a matrix containing n objects (rows) and m columns (variables). The similarity G_{jk} between objects j and k is computed as
G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},
where w_{ijk} is the weight of variable i for the j-k pair, s_{ijk} is the partial similarity of variable i for the j-k pair, and w_{ijk} = 0 if objects j and k cannot be compared because x_{ij} or x_{ik} is unknown (i.e. NA).
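The weighted average above can be sketched in base R for a single pair of objects. This is an illustration under stated assumptions, not the package's C implementation: gower_pair is a hypothetical helper, and the partial similarities and weights are made-up values.

```r
# Hypothetical helper (not part of FD): weighted Gower similarity for one
# pair of objects, given one partial similarity per variable.
#   s: partial similarities s_ijk (NA where the pair cannot be compared)
#   w: variable weights
gower_pair <- function(s, w = rep(1, length(s))) {
  missing <- is.na(s)
  w[missing] <- 0   # w_ijk = 0 when x_ij or x_ik is NA
  s[missing] <- 0   # placeholder value; it has zero weight
  sum(w * s) / sum(w)
}

s <- c(1, 0.5, NA, 0)   # made-up partial similarities for four variables
w <- c(2, 1, 3, 1)      # made-up weights
G <- gower_pair(s, w)   # (2*1 + 1*0.5 + 1*0) / (2 + 1 + 1) = 0.625
D <- 1 - G              # dissimilarity: 0.375
```

The third variable drops out of both sums, so its large weight has no effect on this pair.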
For binary variables, s_{ijk} = 0 if x_{ij} \neq x_{ik}, and s_{ijk} = 1 if x_{ij} = x_{ik} = 1 or if x_{ij} = x_{ik} = 0.
For asymmetric binary variables, the same rules apply except that w_{ijk} = 0 if x_{ij} = x_{ik} = 0.
For nominal variables, s_{ijk} = 0 if x_{ij} \neq x_{ik} and s_{ijk} = 1 if x_{ij} = x_{ik}.
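The matching rules for binary, asymmetric binary and nominal variables can be sketched as follows. match_sim is a hypothetical helper written for illustration only; it returns the partial similarity together with a 0/1 comparability factor that would multiply the variable's weight.

```r
# Hypothetical helper (not part of FD): partial similarity s and
# comparability factor w for one pair of values of a binary or nominal variable.
match_sim <- function(a, b, asym = FALSE) {
  if (is.na(a) || is.na(b)) return(c(s = NA, w = 0))  # pair cannot be compared
  s <- as.numeric(a == b)                              # 1 on a match, 0 otherwise
  w <- if (asym && a == 0 && b == 0) 0 else 1          # joint absence ignored if asymmetric
  c(s = s, w = w)
}

sym_absent  <- match_sim(0, 0)               # s = 1, w = 1
asym_absent <- match_sim(0, 0, asym = TRUE)  # s = 1, w = 0: pair dropped from the sums
nominal     <- match_sim("red", "blue")      # s = 0, w = 1
```

With asym = TRUE a shared absence carries no information, so the variable is excluded for that pair rather than counted as a match.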
For continuous variables,
s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|}{x_{i.max} - x_{i.min}}
where x_{i.max} and x_{i.min} are the maximum and minimum values of variable i, respectively.
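As a minimal sketch of the range-normalized similarity for one continuous variable, the base R code below computes all pairwise values at once. cont_sim is a hypothetical helper, and it assumes the variable is not constant (otherwise the range is zero).

```r
# Hypothetical helper (not part of FD): pairwise partial similarities
# for a single continuous variable x (one value per object).
cont_sim <- function(x) {
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)  # x_i.max - x_i.min
  1 - abs(outer(x, x, "-")) / rng
}

x <- c(2, 4, 10)   # made-up trait values for three objects
S <- cont_sim(x)
S[1, 2]            # 1 - |2 - 4| / 8  = 0.75
S[1, 3]            # 1 - |2 - 10| / 8 = 0
```

The two most extreme objects always get a similarity of 0, and every object has a similarity of 1 with itself.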
For ordinal variables, when ord = "podani" or ord = "metric", all x_{ij} are replaced by their ranks r_{ij} determined over all objects (such that ties are also considered), and then:
If ord = "podani", s_{ijk} = 1 if r_{ij} = r_{ik}, otherwise
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2}{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min} - 1)/2}
where T_{ij} is the number of objects which have the same rank score for variable i as object j (including j itself), T_{ik} is the number of objects which have the same rank score for variable i as object k (including k itself), r_{i.max} and r_{i.min} are the maximum and minimum ranks for variable i, respectively, T_{i.max} is the number of objects with the maximum rank, and T_{i.min} is the number of objects with the minimum rank.
If ord = "metric",
s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}}
When ord = "classic", ordinal variables are simply treated as continuous variables.
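The two rank-based formulas can be illustrated with a short base R sketch. ord_sim is a hypothetical helper, not the package's code; it assumes ties receive average ranks (the default of rank()), that there are no missing values, and that the variable takes at least two distinct values.

```r
# Hypothetical helper (not part of FD): pairwise partial similarities for
# one ordinal variable, following the "podani" and "metric" formulas.
ord_sim <- function(x, method = c("podani", "metric")) {
  method <- match.arg(method)
  r  <- rank(x)                  # average ranks, so tied objects share a rank
  Tn <- ave(r, r, FUN = length)  # T_ij: number of objects sharing each object's rank
  d  <- abs(outer(r, r, "-"))    # |r_ij - r_ik|
  if (method == "metric") {
    return(1 - d / (max(r) - min(r)))
  }
  corr  <- (Tn - 1) / 2
  num   <- d - outer(corr, corr, "+")
  denom <- max(r) - min(r) -
    (Tn[which.max(r)] - 1) / 2 - (Tn[which.min(r)] - 1) / 2
  s <- 1 - num / denom
  s[d == 0] <- 1                 # s_ijk = 1 when the ranks are equal
  s
}

x <- c(1, 1, 2, 3)               # made-up ordinal scores; ranks are 1.5, 1.5, 3, 4
P <- ord_sim(x, "podani")
M <- ord_sim(x, "metric")
P[1, 3]                          # 1 - (1.5 - 0.5 - 0) / 2 = 0.5
M[1, 3]                          # 1 - 1.5 / 2.5 = 0.4
```

The tie correction in the "podani" variant shrinks both the numerator and the denominator, which is why the tied pair of objects ends up closer to object 3 than under the "metric" variant.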
Value
an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, Metric.
Author(s)
Etienne Laliberté etiennelaliberte@gmail.com https://www.elaliberte.info/, with some help from Philippe Casgrain for the C interface.
References
Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.
Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.
Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.
See Also
daisy is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.
Examples
library(FD)  # provides gowdis and the dummy and tussock example data
ex1 <- gowdis(dummy$trait)
ex1
# check attributes
attributes(ex1)
# to include weights
w <- c(4,3,5,1,2,8,3,6)
ex2 <- gowdis(dummy$trait, w)
ex2
# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3
# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)