ve {nomclust} | R Documentation |
Variable Entropy (VE) Measure
Description
The function calculates a dissimilarity matrix based on the VE similarity measure.
Usage
ve(data, var.weights = NULL)
Arguments
data |
A data.frame or a matrix with cases in rows and variables in columns. |
var.weights |
A numeric vector of weights for the variables used. Each weight is a real number from zero to one. |
Details
The Variable Entropy similarity measure was introduced by Sulc and Rezankova (2019). It determines the similarity between two categories from the within-cluster variability, expressed by the normalized entropy. The measure assigns higher weights to rare categories.
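The normalized entropy mentioned above can be illustrated with a minimal base R sketch. It assumes that the variability is the entropy of a variable's relative category frequencies divided by the logarithm of the number of observed categories, and that a match on a variable contributes this value while a mismatch contributes zero. The helpers norm_entropy and ve_sim_sketch are hypothetical names and not part of nomclust; the package's actual implementation may differ.

```r
# Hypothetical helper (not part of nomclust): normalized entropy of one
# categorical variable, assuming entropy / log(number of observed categories).
norm_entropy <- function(x) {
  counts <- table(x)
  p <- counts[counts > 0] / sum(counts)   # relative frequencies of observed categories
  k <- length(p)
  if (k < 2) return(0)                    # a constant variable has no variability
  -sum(p * log(p)) / log(k)
}

# Hypothetical per-variable similarity under the assumption stated above:
# a match scores the variable's normalized entropy, a mismatch scores 0.
ve_sim_sketch <- function(a, b, x) {
  if (a == b) norm_entropy(x) else 0
}

balanced <- c("a", "b", "a", "b")   # evenly spread categories
skewed   <- c("a", "a", "a", "b")   # one dominant category

h_balanced <- norm_entropy(balanced)
h_skewed   <- norm_entropy(skewed)
match_sim    <- ve_sim_sketch("a", "a", skewed)
mismatch_sim <- ve_sim_sketch("a", "b", skewed)
```

Under this assumption, a match on the evenly spread variable counts for more than a match on the skewed one, because the normalized entropy is largest when all categories are equally frequent.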
Value
The function returns an object of the class "dist".
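Because the result is a "dist" object, it can be used wherever base R expects one. The sketch below uses a small hand-made dissimilarity matrix as a stand-in for the output of ve(); the values and object names are made up for illustration only.

```r
# Made-up symmetric dissimilarities standing in for the output of ve()
ids <- c("o1", "o2", "o3")
m <- matrix(c(0.0, 0.2, 0.9,
              0.2, 0.0, 0.7,
              0.9, 0.7, 0.0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

d <- as.dist(m)                       # object of class "dist"
full <- as.matrix(d)                  # back to a full square matrix for inspection
hc <- hclust(d, method = "average")   # hierarchical clustering on the dissimilarities
groups <- cutree(hc, k = 2)           # cluster membership for a two-cluster solution
```

Here o1 and o2 are the closest pair, so they end up in the same cluster and o3 forms the other one.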
Author(s)
Zdenek Sulc.
Contact: zdenek.sulc@vse.cz
References
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation.
In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.
Sulc Z. and Rezankova H. (2019). Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering. Journal of Classification, 36(1), p. 58-72. DOI: 10.1007/s00357-019-09317-5.
See Also
anderberg, burnaby, eskin, gambaryan, goodall1, goodall2, goodall3, goodall4, iof, lin, lin1, of, sm, smirnov, vm.
Examples
# sample data
data(data20)
# dissimilarity matrix calculation
prox.ve <- ve(data20)
# dissimilarity matrix calculation with variable weighting
prox.ve.2 <- ve(data20, var.weights = c(1, 0.8, 0.6, 0.4, 0.2))
# dissimilarity matrix calculation with a different set of variable weights
weights.ve <- ve(data20, var.weights = c(0.7, 1, 0.9, 0.5, 0))