gowdis {FD}    R Documentation

Gower Dissimilarity

Description

gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.

Usage

gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))

Arguments

x

matrix or data frame containing the variables. Variables can be numeric, ordered, or factor. Symmetric or asymmetric binary variables should be numeric and only contain 0 and 1. character variables will be converted to factor. NAs are tolerated.

w

vector listing the weights for the variables in x. Can be missing, in which case all variables have equal weights.

asym.bin

vector listing the asymmetric binary variables in x.

ord

character string specifying the method to be used for ordinal variables (i.e. ordered). "podani" refers to Eqs. 2a-b of Podani (1999), while "metric" refers to his Eq. 3 (see ‘details’); both options convert ordinal variables to ranks. "classic" simply treats ordinal variables as continuous variables. Can be abbreviated.
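The abbreviation rule for ord can be illustrated with base R's match.arg. This is a minimal sketch that assumes gowdis resolves ord this way, which the usage signature suggests but this page does not state; resolve_ord is a hypothetical wrapper, not part of FD.

```r
# Hypothetical wrapper (not FD code): shows how partial matching resolves
# an abbreviated `ord` value against the three documented choices.
resolve_ord <- function(ord = c("podani", "metric", "classic")) {
  match.arg(ord)
}

abbrev  <- resolve_ord("m")   # unique prefix of "metric"
default <- resolve_ord()      # no value supplied: first choice is used

print(abbrev)
print(default)
```

Under this assumption, any unique prefix such as "m" or "cl" selects the corresponding method, and omitting ord selects "podani".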

Details

gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient by using $D = 1 - S$. It integrates variable weights as described by Legendre and Legendre (1998).

Let $\mathbf{X} = \{x_{ij}\}$ be a matrix containing $n$ objects (rows) and $m$ columns (variables), where $x_{ij}$ is the value of variable $i$ for object $j$. The similarity $G_{jk}$ between objects $j$ and $k$ is computed as

$$G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},$$

where $w_{ijk}$ is the weight of variable $i$ for the $j$-$k$ pair, and $s_{ijk}$ is the partial similarity of variable $i$ for the $j$-$k$ pair,

and where $w_{ijk} = 0$ if objects $j$ and $k$ cannot be compared because $x_{ij}$ or $x_{ik}$ is unknown (i.e. NA).
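The weighted average above can be sketched in base R for a single pair of objects. This is an illustration under stated assumptions, not the FD implementation: gower_pair is a hypothetical helper, and the partial similarities are assumed to have been computed already.

```r
# Hypothetical helper (not part of FD): combine partial similarities for one
# pair of objects (j, k) into a Gower similarity, then return D = 1 - S.
gower_pair <- function(s, w, xj, xk) {
  # A variable that is NA in either object gets weight 0 for this pair.
  w_eff <- ifelse(is.na(xj) | is.na(xk), 0, w)
  # Partial similarities with zero weight do not matter; set them to 0 so
  # that NA values do not propagate through the sums.
  s[w_eff == 0] <- 0
  G <- sum(w_eff * s) / sum(w_eff)
  1 - G
}

# Three variables with weights 1, 2, 1; the third is missing in object k.
xj <- c(0.2, 1, 5)
xk <- c(0.4, 1, NA)
s  <- c(0.8, 1, NA)   # partial similarities, assumed already computed
d  <- gower_pair(s, w = c(1, 2, 1), xj, xk)
print(d)
```

Here only the first two variables contribute, so the similarity is (1 * 0.8 + 2 * 1) / (1 + 2) and the dissimilarity is one minus that value. The sketch does not handle the case where every weight is zero for a pair.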

For binary variables, $s_{ijk} = 0$ if $x_{ij} \neq x_{ik}$, and $s_{ijk} = 1$ if $x_{ij} = x_{ik} = 1$ or if $x_{ij} = x_{ik} = 0$.

For asymmetric binary variables, the same rules apply, except that $w_{ijk} = 0$ if $x_{ij} = x_{ik} = 0$.

For nominal variables, $s_{ijk} = 0$ if $x_{ij} \neq x_{ik}$ and $s_{ijk} = 1$ if $x_{ij} = x_{ik}$.
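The matching rules for binary, asymmetric binary and nominal variables can be sketched as one small base R function. partial_match is a hypothetical helper, not part of FD, and it assumes a unit weight before any user-supplied variable weight is applied.

```r
# Hypothetical helper (not part of FD): partial similarity `s` and pair
# weight `w` for one binary or nominal variable and one pair of values.
partial_match <- function(xij, xik, asym = FALSE) {
  # A missing value makes the pair non-comparable for this variable.
  if (is.na(xij) || is.na(xik)) return(c(s = NA, w = 0))
  # Values either match (1) or differ (0).
  s <- as.numeric(xij == xik)
  # For asymmetric binary variables, a joint absence (0, 0) is ignored.
  w <- if (asym && xij == 0 && xik == 0) 0 else 1
  c(s = s, w = w)
}

sym_00  <- partial_match(0, 0)               # symmetric: joint absence counts
asym_00 <- partial_match(0, 0, asym = TRUE)  # asymmetric: weight drops to 0
nom_ab  <- partial_match("a", "b")           # nominal mismatch
print(rbind(sym_00, asym_00, nom_ab))
```

The only difference between the symmetric and asymmetric cases is the weight of a (0, 0) pair, which is why asym.bin changes the denominator of the similarity rather than the partial similarity itself.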

For continuous variables,

$$s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|}{x_{i.max} - x_{i.min}}$$

where $x_{i.max}$ and $x_{i.min}$ are the maximum and minimum values of variable $i$, respectively.
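The range-standardized similarity for a continuous variable can be sketched in base R as follows. partial_continuous is a hypothetical helper, not part of FD, and it assumes the variable is not constant (a zero range would give a division by zero).

```r
# Hypothetical helper (not part of FD): matrix of partial similarities for
# one continuous variable, over all pairs of objects.
partial_continuous <- function(x) {
  # Range of the variable over all objects, ignoring missing values.
  rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  # Absolute pairwise differences, scaled by the range.
  1 - abs(outer(x, x, "-")) / rng
}

x <- c(2, 4, 10)
S <- partial_continuous(x)
print(S)
```

With a range of 8, objects 1 and 2 differ by 2 and get a partial similarity of 0.75, while the two extreme objects get 0. Pairs involving an NA value come out as NA in this sketch.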

For ordinal variables, when ord = "podani" or ord = "metric", all $x_{ij}$ are replaced by their ranks $r_{ij}$ determined over all objects (such that ties are also considered), and then

if ord = "podani"

$s_{ijk} = 1$ if $r_{ij} = r_{ik}$, otherwise

$$s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2}{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min} - 1)/2}$$

where $T_{ij}$ is the number of objects which have the same rank score for variable $i$ as object $j$ (including $j$ itself), $T_{ik}$ is the number of objects which have the same rank score for variable $i$ as object $k$ (including $k$ itself), $r_{i.max}$ and $r_{i.min}$ are the maximum and minimum ranks for variable $i$, respectively, $T_{i.max}$ is the number of objects with the maximum rank, and $T_{i.min}$ is the number of objects with the minimum rank.
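The tie-corrected formula for ord = "podani" can be sketched in base R. partial_podani is a hypothetical helper, not part of FD; it assumes the ordinal variable is supplied as numeric codes without missing values, and that ties receive average ranks.

```r
# Hypothetical helper (not part of FD): partial similarities for one ordinal
# variable using the tie-corrected rank formula.
partial_podani <- function(x) {
  r  <- rank(x)                       # average ranks; tied values share a rank
  Tn <- ave(r, r, FUN = length)       # T_ij: number of objects sharing each rank
  r_max <- max(r)
  r_min <- min(r)
  T_max <- sum(r == r_max)            # objects holding the maximum rank
  T_min <- sum(r == r_min)            # objects holding the minimum rank
  # Numerator: rank difference minus the tie corrections of both objects.
  num <- abs(outer(r, r, "-")) - outer((Tn - 1) / 2, (Tn - 1) / 2, "+")
  # Denominator: rank range minus the tie corrections at both extremes.
  den <- r_max - r_min - (T_max - 1) / 2 - (T_min - 1) / 2
  S <- 1 - num / den
  S[outer(r, r, "==")] <- 1           # equal ranks are fully similar by definition
  S
}

# Four objects; the two middle ones are tied and share rank 2.5.
S <- partial_podani(c(1, 2, 2, 3))
print(S)
```

For this input the ranks are 1, 2.5, 2.5 and 4, so the denominator is 3. Objects 1 and 2 have a corrected rank difference of 1.5 - 0.5 = 1, giving a partial similarity of 2/3, while the tied objects 2 and 3 get 1.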

if ord = "metric"

$$s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}}$$
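The ord = "metric" formula is the continuous formula applied to ranks, which can be sketched in base R. partial_metric is a hypothetical helper, not part of FD; it assumes numeric codes without missing values and average ranks for ties.

```r
# Hypothetical helper (not part of FD): partial similarities for one ordinal
# variable using range-standardized rank differences, without tie correction.
partial_metric <- function(x) {
  r <- rank(x)                                  # average ranks for ties
  1 - abs(outer(r, r, "-")) / (max(r) - min(r))
}

# Four objects; the two middle ones are tied and share rank 2.5.
S <- partial_metric(c(1, 2, 2, 3))
print(S)
```

With ranks 1, 2.5, 2.5 and 4, objects 1 and 2 get 1 - 1.5/3 = 0.5, the tied objects get 1, and the two extreme objects get 0.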

When ord = "classic", ordinal variables are simply treated as continuous variables.

Value

an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, Metric.

Author(s)

Etienne Laliberté etiennelaliberte@gmail.com https://www.elaliberte.info/, with some help from Philippe Casgrain for the C interface.

References

Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.

Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.

Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.

See Also

daisy is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.

Examples

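# load the package that provides gowdis and the example data sets
library(FD)

# default call: all variables have equal weights
ex1 <- gowdis(dummy$trait)
ex1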

# check attributes
attributes(ex1)

# to include weights (one per variable in x)
w <- c(4, 3, 5, 1, 2, 8, 3, 6)
ex2 <- gowdis(dummy$trait, w)
ex2

# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3

# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)

[Package FD version 1.0-12.3 Index]