of {nomclust} | R Documentation
Occurrence Frequency (OF) Measure
Description
The function calculates a dissimilarity matrix based on the OF similarity measure.
Usage
of(data, var.weights = NULL)
Arguments
data
A data.frame or a matrix with cases in rows and variables in columns.
var.weights
A numeric vector of weights for the variables used. Each weight must be a real number between zero and one.
Details
The OF (Occurrence Frequency) measure was originally constructed for text mining tasks (Sparck-Jones, 1972) and was later adapted to categorical variables (Boriah et al., 2008). It assigns higher weight, that is, greater dissimilarity, to mismatches on less frequent values and lower weight to mismatches on more frequent values.
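For reference, the OF similarity as defined in Boriah et al. (2008) can be written as below. Here N is the number of cases, f_k(x) is the number of cases taking category x in the k-th variable, and d is the number of variables. The equal weights 1/d are the default in that paper. How of() applies var.weights and converts the similarity into the returned dissimilarity is not stated on this page, so the formula should be read as the underlying similarity only.

```latex
S_k(X_k, Y_k) =
\begin{cases}
1 & \text{if } X_k = Y_k, \\[6pt]
\dfrac{1}{1 + \log\dfrac{N}{f_k(X_k)} \cdot \log\dfrac{N}{f_k(Y_k)}} & \text{otherwise,}
\end{cases}
\qquad
S(X, Y) = \frac{1}{d} \sum_{k=1}^{d} S_k(X_k, Y_k)
```

Because both logarithms grow as the frequencies shrink, a mismatch between two rare categories yields a smaller similarity than a mismatch between two common ones.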
Value
The function returns an object of the class "dist".
Author(s)
Zdenek Sulc.
Contact: zdenek.sulc@vse.cz
References
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation.
In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.
Sparck-Jones K. (1972). A statistical interpretation of term specificity and its application in retrieval.
In: Journal of Documentation, 28(1), p. 11-21. Reprinted in: Journal of Documentation, 60(5) (2004), p. 493-502.
See Also
anderberg, burnaby, eskin, gambaryan, goodall1, goodall2, goodall3, goodall4, iof, lin, lin1, sm, smirnov, ve, vm.
Examples
# sample data
data(data20)
# dissimilarity matrix calculation
prox.of <- of(data20)
# dissimilarity matrix calculation with variable weights
weights.of <- of(data20, var.weights = c(0.7, 1, 0.9, 0.5, 0))
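The following base R sketch illustrates how an OF similarity between two cases could be computed by hand, following the formula given in Boriah et al. (2008). The helper of_similarity_sketch and the toy data frame are hypothetical and are not part of nomclust. The sketch uses equal variable weights and returns a similarity, so its values are not expected to equal the dissimilarities returned by of().

```r
# Hypothetical helper (not part of nomclust): OF similarity between
# rows i and j of a categorical data set, per Boriah et al. (2008).
of_similarity_sketch <- function(data, i, j) {
  data <- as.data.frame(data)
  n <- nrow(data)  # number of cases

  per_variable <- vapply(seq_along(data), function(k) {
    column <- as.character(data[[k]])
    x <- column[i]
    y <- column[j]

    # A match always scores 1.
    if (x == y) {
      return(1)
    }

    # Absolute frequencies of the two mismatching categories.
    fx <- sum(column == x)
    fy <- sum(column == y)

    # Rarer categories give larger logs and thus a smaller similarity.
    1 / (1 + log(n / fx) * log(n / fy))
  }, numeric(1))

  # Assumption: equal variable weights 1/d.
  mean(per_variable)
}

# Toy data: "red" and "square" are frequent, "blue" and "green" are rare.
toy <- data.frame(
  colour = c("red", "red", "red", "blue", "green"),
  shape  = c("round", "round", "square", "square", "square")
)

of_similarity_sketch(toy, 1, 2)  # identical rows
of_similarity_sketch(toy, 1, 4)  # mismatches on both variables
of_similarity_sketch(toy, 4, 5)  # rare-vs-rare mismatch on colour, match on shape
```

Comparing the colour mismatch in rows 4 and 5 (two categories that each occur once) with the one in rows 1 and 4 (one frequent, one rare category) shows the behaviour described in Details: the rarer the mismatching values, the lower the per-variable similarity.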