dgower {ICGE}

R Documentation

Gower Distance for Mixed Variables

Description

dgower computes and returns the Gower distance matrix for mixed variables.

Usage

dgower(x, type = list())

Arguments

`x`	data matrix, with objects in rows and variables in columns.
`type`	a list with components `cuant`, `bin` and `nom`. Each component gives the column positions of the quantitative, binary and nominal variables, respectively.
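As a minimal sketch of how a `type` list might be assembled and sanity-checked before calling dgower, the snippet below assumes a 6-column data matrix laid out as in the Examples section. The check that every column is listed exactly once is an assumption about typical use, not a documented requirement of dgower.

```r
# Column positions for each variable kind (same layout as in the Examples)
type <- list(cuant = 1:2, bin = 3:5, nom = 6)

# Assumed number of columns of the data matrix x
p <- 6

# Flatten the positions and check that there are no overlaps or gaps
idx <- unlist(type, use.names = FALSE)
no_overlap <- !anyDuplicated(idx)
covers_all <- setequal(idx, seq_len(p))
```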

Details

The distance between two objects i and j is obtained as \sqrt{2(1-s_{ij})}, where s_{ij} is Gower's similarity coefficient for mixed data. The function accepts missing values (coded as NA) and therefore computes distances from Gower's weighted similarity coefficient.
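The calculation described above can be sketched in base R for a single pair of objects. The helper `gower_sim_pair` and the `ranges` vector are hypothetical and not part of ICGE. The sketch assumes Gower's (1971) original conventions: quantitative variables contribute 1 - |x_ik - x_jk| / R_k, joint absences on binary variables receive weight 0, and any comparison involving NA receives weight 0. Whether dgower handles every case in exactly this way is not confirmed here.

```r
# Hypothetical helper (not part of ICGE): Gower similarity between two objects
gower_sim_pair <- function(xi, xj, ranges, type) {
  num <- 0  # weighted sum of per-variable similarities
  den <- 0  # sum of weights, i.e. number of valid comparisons
  for (k in seq_along(xi)) {
    if (is.na(xi[k]) || is.na(xj[k])) next        # missing value: weight 0
    if (k %in% type$cuant) {
      num <- num + 1 - abs(xi[k] - xj[k]) / ranges[k]
      den <- den + 1
    } else if (k %in% type$bin) {
      if (xi[k] == 0 && xj[k] == 0) next          # joint absence: weight 0 (assumption)
      num <- num + as.numeric(xi[k] == xj[k])
      den <- den + 1
    } else if (k %in% type$nom) {
      num <- num + as.numeric(xi[k] == xj[k])
      den <- den + 1
    }
  }
  num / den  # NaN if no variable could be compared
}

type   <- list(cuant = 1:2, bin = 3:4, nom = 5)
ranges <- c(10, 4, NA, NA, NA)   # assumed ranges of the quantitative columns
xi <- c(2, 1, 1, 0, 3)
xj <- c(7, 3, 1, 0, 2)

s <- gower_sim_pair(xi, xj, ranges, type)  # (0.5 + 0.5 + 1 + 0) / 4 = 0.5
d <- sqrt(2 * (1 - s))                     # 1

# With a missing nominal value the comparison is dropped from both sums
xj_na <- c(7, 3, 1, 0, NA)
s_na <- gower_sim_pair(xi, xj_na, ranges, type)  # (0.5 + 0.5 + 1) / 3
```

Dividing by the sum of weights rather than by the number of variables is what lets the coefficient stay within [0, 1] when some comparisons are unavailable.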

Value

A dist object with distance information.
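A dist object stores only the lower triangle of the distance matrix. The sketch below shows common ways to work with such an object; it uses base R's dist() on toy data as a stand-in for the result of dgower(), so the numbers are Euclidean distances and serve only to illustrate the object's structure.

```r
set.seed(1)
toy <- matrix(rnorm(12), nrow = 4)   # 4 objects, 3 variables (toy data)

d_toy <- dist(toy)                   # stand-in for a dgower() result
n_obj <- attr(d_toy, "Size")         # number of objects
m     <- as.matrix(d_toy)            # full symmetric matrix with zero diagonal

# A dist object can be passed directly to clustering functions
hc <- hclust(d_toy, method = "average")
```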

Note

The function daisy() in the cluster package can also compute the Gower distance for mixed variables. The difference is that daisy() calculates the distance as d(i,j)=1-s_{ij}, whereas dgower() calculates it as d_{ij}=\sqrt{1-s_{ij}}.
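To compare the scales involved, the sketch below tabulates the transformations for a few illustrative similarity values. It includes both square-root forms that appear on this page (the one in Details and the one in this note) next to the 1-s_{ij} form attributed to daisy(). The similarity values and column names are made up for illustration.

```r
s <- c(1, 0.75, 0.5, 0)   # illustrative similarity values

tab <- data.frame(
  s          = s,
  one_minus  = 1 - s,              # form attributed to daisy()
  sqrt_form  = sqrt(1 - s),        # form given in this note
  sqrt2_form = sqrt(2 * (1 - s))   # form given in Details
)
tab
```

All three are decreasing functions of s, so they rank pairs of objects identically; they differ in scale and in how they stretch small dissimilarities.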

Author(s)

Itziar Irigoien itziar.irigoien@ehu.eus; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV/EHU), Donostia, Spain.

Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.

References

Gower, J.C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27, 857–871.

Examples

library(ICGE)

# Generate 10 objects in dimension 6
# Quantitative variables
mu <- sample(1:10, 2, replace=TRUE)
xc <- matrix(rnorm(10*2, mean = mu, sd = 1), ncol=2, byrow=TRUE)

# Binary variables
xb <- cbind(rbinom(10, 1, 0.1), rbinom(10, 1, 0.5), rbinom(10, 1, 0.9))

# Nominal variables
xn <- matrix(sample(1:3, 10, replace=TRUE), ncol=1)

x <- cbind(xc, xb, xn)

# Distances
d <- dgower(x, type=list(cuant=1:2, bin=3:5, nom=6))

[Package ICGE version 0.4.2 Index]