ve {nomclust}    R Documentation

Variable Entropy (VE) Measure

Description

The function calculates a dissimilarity matrix based on the VE similarity measure.

Usage

ve(data, var.weights = NULL)

Arguments

data

A data.frame or a matrix with cases in rows and variables in columns.

var.weights

A numeric vector of weights for the variables used in the calculation. Each weight is a real number between zero and one.

Details

The Variable Entropy similarity measure was introduced by Sulc and Rezankova (2019). It bases the similarity between two categories on the within-cluster variability expressed by the normalized entropy. The measure assigns higher weights to rare categories.
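The normalized entropy mentioned above can be illustrated with a minimal base R sketch. It assumes the usual definition (Shannon entropy of the category frequencies divided by the logarithm of the number of observed categories). The helper normalized_entropy is hypothetical and is not part of nomclust; it does not reproduce the internal computation of ve().

```r
# Hypothetical helper (not part of nomclust): normalized entropy of one
# categorical variable, assuming the usual Shannon-based definition.
normalized_entropy <- function(x) {
  counts <- table(x)
  counts <- counts[counts > 0]        # drop unused factor levels
  k <- length(counts)                 # number of observed categories
  if (k < 2) return(0)                # a constant variable has no variability
  p <- counts / sum(counts)           # relative frequencies
  as.numeric(-sum(p * log(p)) / log(k))  # entropy scaled to the range [0, 1]
}

# Two equally frequent categories give the maximum value of 1
normalized_entropy(c("a", "b", "a", "b"))

# A skewed distribution gives a lower value (about 0.81 here)
normalized_entropy(c("a", "a", "a", "b"))
```

Values close to 1 indicate that the categories are spread evenly; values close to 0 indicate that one category dominates.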

Value

The function returns an object of the class "dist".
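Because the result is a "dist" object, it can be passed to base R tools that accept dissimilarities. The sketch below uses a small made-up dissimilarity matrix as a stand-in for the output of ve(), so that it runs without the package; the values and the case names c1 to c3 are assumptions chosen only for illustration.

```r
# Made-up symmetric dissimilarities for three cases (stand-in for ve() output)
m <- matrix(c(0,    0.5,  1.0,
              0.5,  0,    0.25,
              1.0,  0.25, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("c1", "c2", "c3"), c("c1", "c2", "c3")))

d <- as.dist(m)                       # object of class "dist"
class(d)

full <- as.matrix(d)                  # expand back to a full square matrix

hc <- hclust(d, method = "average")   # hierarchical clustering on the dissimilarities
groups <- cutree(hc, k = 2)           # assign the cases to two clusters
groups
```

In this toy example the two closest cases, c2 and c3, end up in the same cluster, and c1 forms a cluster of its own.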

Author(s)

Zdenek Sulc.
Contact: zdenek.sulc@vse.cz

References

Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation. In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.

Sulc Z. and Rezankova H. (2019). Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering. Journal of Classification, 36(1), p. 58-72. DOI: 10.1007/s00357-019-09317-5.

See Also

anderberg, burnaby, eskin, gambaryan, goodall1, goodall2, goodall3, goodall4, iof, lin, lin1, of, sm, smirnov, vm.

Examples

# sample data
data(data20)

# dissimilarity matrix calculation
prox.ve <- ve(data20)

# dissimilarity matrix calculation with variable weighting
prox.ve.2 <- ve(data20, var.weights = c(1, 0.8, 0.6, 0.4, 0.2))

# dissimilarity matrix calculation with a different set of variable weights (the last variable has zero weight)
weights.ve <- ve(data20, var.weights = c(0.7, 1, 0.9, 0.5, 0))


[Package nomclust version 2.8.0 Index]