dsClustCompare {semiArtificial} | R Documentation |
Evaluate clustering similarity of two data sets
Description
The similarity of two data sets is evaluated using any of the following clustering comparison metrics: Adjusted Rand Index (ARI), Fowlkes-Mallows index (FM), Jaccard Index (J), or Variation of Information index (VI).
Usage
dsClustCompare(data1, data2)
Arguments
data1 |
A |
data2 |
A |
Details
The function compares the data stored in data1 with data2 by first performing partitioning around medoids (PAM) clustering on data1. Instances from data2 are then assigned to the cluster with the closest medoid.
In the second step, PAM clustering is performed on data2, and instances from data1 are assigned to the clusters with the closest medoids.
The procedure gives us two clusterings of the same instances, which we can compare using any of ARI, FM, J, or VI.
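The two-step procedure described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's own implementation: it assumes purely numeric columns, Euclidean distance, and a number of clusters k supplied by the caller. The helper names assign_to_medoids, cross_cluster and adjusted_rand_index are hypothetical; only pam() from the cluster package is an existing function.

```r
library(cluster)  # provides pam()

# Assign each row of x to the nearest medoid (squared Euclidean distance).
assign_to_medoids <- function(x, medoids) {
  d <- sapply(seq_len(nrow(medoids)), function(j) {
    rowSums(sweep(x, 2, medoids[j, ], "-")^2)
  })
  max.col(-d, ties.method = "first")  # column index of the smallest distance
}

# Cluster x1 and x2 separately with PAM, then label ALL instances
# once by the medoids of x1 and once by the medoids of x2.
cross_cluster <- function(x1, x2, k) {
  both <- rbind(x1, x2)
  fit1 <- pam(x1, k)
  fit2 <- pam(x2, k)
  list(
    by_data1 = assign_to_medoids(both, fit1$medoids),
    by_data2 = assign_to_medoids(both, fit2$medoids)
  )
}

# Adjusted Rand Index computed from the contingency table of two labelings.
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  together_both <- sum(choose(tab, 2))
  together_a <- sum(choose(rowSums(tab), 2))
  together_b <- sum(choose(colSums(tab), 2))
  expected <- together_a * together_b / choose(n, 2)
  max_index <- (together_a + together_b) / 2
  (together_both - expected) / (max_index - expected)
}

# Usage: split the numeric iris columns into two halves and compare them.
x <- as.matrix(iris[, 1:4])
odd <- x[seq(1, nrow(x), by = 2), ]
even <- x[seq(2, nrow(x), by = 2), ]
res <- cross_cluster(odd, even, k = 3)
adjusted_rand_index(res$by_data1, res$by_data2)
```

Because both labelings cover the same combined set of instances, any partition comparison metric can be applied to res$by_data1 and res$by_data2 directly.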
The higher the value of ARI/FM/J, the more similar the two data sets are; the reverse is true for VI, where two perfectly matching partitions produce a score of 0.
For random clustering, ARI returns a value around zero (negative values are possible), and for perfectly matching clusterings ARI is 1.
FM and J values always lie in [0, 1].
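To make the value ranges concrete, the following base R sketch computes FM, J and VI from the contingency table of two labelings, using the standard pair-counting and entropy-based definitions. The function name partition_metrics is hypothetical and the code is an illustration only; it is not taken from the package, and the natural logarithm used for VI is an assumption.

```r
# FM, Jaccard and Variation of Information for two labelings a and b.
partition_metrics <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  together_both <- sum(choose(tab, 2))        # pairs joined in both labelings
  together_a <- sum(choose(rowSums(tab), 2))  # pairs joined in a
  together_b <- sum(choose(colSums(tab), 2))  # pairs joined in b
  # Shannon entropy (natural log) of a vector of counts
  entropy <- function(counts) {
    p <- counts[counts > 0] / n
    -sum(p * log(p))
  }
  c(
    FM = together_both / sqrt(together_a * together_b),
    J = together_both / (together_a + together_b - together_both),
    VI = 2 * entropy(tab) - entropy(rowSums(tab)) - entropy(colSums(tab))
  )
}

# Identical partitions up to relabeling: FM and J are 1, VI is 0.
same <- partition_metrics(c(1, 1, 2, 2), c(2, 2, 1, 1))

# Partitions that share no pair: FM and J are 0, VI is 2 * log(2).
crossed <- partition_metrics(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

The two calls illustrate the extremes described above: matching partitions give the maximum of FM and J together with a VI of zero, while partitions that agree on no pair of instances drive FM and J to zero and VI above zero.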
Value
The method returns a list containing ARI and/or FM, depending on the parameters.
Author(s)
Marko Robnik-Sikonja
See Also
Examples
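# load the package that provides rbfDataGen, newdata and dsClustCompare
library(semiArtificial)
# use iris data set
# create RBF generator
irisGenerator <- rbfDataGen(Species ~ ., iris)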
# use the generator to create new data
irisNew <- newdata(irisGenerator, size=200)
# compare the clusterings obtained from the original and the new data
dsClustCompare(iris, irisNew)