std.ext {clv}

R Documentation

Standard External Measures: Rand index, Jaccard coefficient etc.

Description

Group of functions that compute standard external measures for comparing two partitionings of the same data set, such as the Rand statistic, the Jaccard coefficient and the Fowlkes and Mallows index.

Usage

std.ext(clust1, clust2)
clv.Rand(external.ind)
clv.Jaccard(external.ind)
clv.Folkes.Mallows(external.ind)
clv.Phi(external.ind)
clv.Russel.Rao(external.ind)

Arguments

`clust1`	integer `vector` giving, for each object, the id of the cluster it is assigned to in the first partitioning. If the vector is not of integer type, it is coerced to integer with a warning.
`clust2`	integer `vector` of the same form for the second partitioning of the same objects. If the vector is not of integer type, it is coerced to integer with a warning.
`external.ind`	`vector` or `list` with the four values SS, SD, DS, DD, as returned by `std.ext`.
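Cluster ids are only compared for equality, so any relabeling that maps equal labels to equal ids gives the same pair counts. The following is a minimal base-R sketch for turning arbitrary labels (character strings, factors, numbers) into integer ids before calling `std.ext`; the helper name `to.cluster.ids` is hypothetical and not part of clv. Going through `factor()` matters for character labels, because `as.integer("setosa")` returns `NA` rather than an id.

```r
# Hypothetical helper (not part of clv): map arbitrary cluster labels to
# integer ids 1..k. The labels become factor levels (sorted), and the
# integer codes of those levels are used as cluster ids.
to.cluster.ids <- function(labels) {
  as.integer(factor(labels))
}

labels1 <- c("setosa", "setosa", "virginica", "versicolor")
labels2 <- c(2.0, 2.0, 1.0, 1.0)

ids1 <- to.cluster.ids(labels1)
ids2 <- to.cluster.ids(labels2)
print(ids1)  # 1 1 3 2
print(ids2)  # 2 2 1 1
```

The resulting vectors are already of integer type, so they can be passed to `std.ext` without triggering the coercion warning.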

Details

The two input vectors describe two different partitionings (say P and P') of the same data set X. Each pair of distinct points (xi, xj), i != j, from the data set falls into exactly one of the following categories:

`SS`	- number of pairs whose points belong to the same cluster in both P and P',
`SD`	- number of pairs whose points belong to the same cluster in P but to different clusters in P',
`DS`	- number of pairs whose points belong to different clusters in P but to the same cluster in P',
`DD`	- number of pairs whose points belong to different clusters in both P and P'.
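The four counts can be illustrated with a brute-force sketch in base R that visits every pair of distinct objects once. The helper `pair.counts` is hypothetical, not clv's implementation, and it assumes unordered pairs (i < j), so the counts sum to n(n-1)/2; this page does not state whether `std.ext` counts ordered or unordered pairs. The Rand, Jaccard, Fowlkes and Mallows, and Russell and Rao values would be identical either way, since counting ordered pairs simply doubles every count.

```r
# Hypothetical helper (not part of clv): count SS, SD, DS, DD by visiting
# every unordered pair (i, j) with i < j. Assumes p1 and p2 describe the
# same objects in the same order.
pair.counts <- function(p1, p2) {
  stopifnot(length(p1) == length(p2))
  n <- length(p1)
  counts <- c(SS = 0, SD = 0, DS = 0, DD = 0)
  if (n < 2) return(as.list(counts))
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      same1 <- p1[i] == p1[j]  # same cluster in P?
      same2 <- p2[i] == p2[j]  # same cluster in P'?
      key <- paste0(if (same1) "S" else "D", if (same2) "S" else "D")
      counts[key] <- counts[key] + 1
    }
  }
  as.list(counts)
}

P  <- c(1L, 1L, 2L, 2L, 3L)
P2 <- c(1L, 1L, 2L, 3L, 3L)
res <- pair.counts(P, P2)
str(res)  # SS = 1, SD = 1, DS = 1, DD = 7
```

The double loop takes O(n^2) time, which is fine for small illustrations but slow for large data sets.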

These counts are used to compute the following indices, where M = SS + SD + DS + DD is the total number of pairs:

Rand statistic	R = (SS + DD)/M
Jaccard coefficient	J = SS/(SS + SD + DS)
Fowlkes and Mallows index	FM = sqrt(SS/(SS + SD))*sqrt(SS/(SS + DS))
Russell and Rao index	RR = SS/M
Phi index	Ph = (SS*DD - SD*DS)/((SS + SD)*(SS + DS)*(SD + DD)*(DS + DD))
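A small worked example of the formulas above, written as a hypothetical base-R helper `external.indices` (not clv's code) that takes a list with elements SS, SD, DS, DD. The counts SS = 1, SD = 1, DS = 1, DD = 7 correspond to the partitionings P = (1, 1, 2, 2, 3) and P' = (1, 1, 2, 3, 3) of five objects, i.e. 10 unordered pairs. The Phi line follows the formula as given on this page; the textbook phi coefficient divides by the square root of the same product, which here would give 6/16 = 0.375 instead of 6/256.

```r
# Hypothetical helper (not part of clv): evaluate the documented formulas
# for counts stored in a list with elements SS, SD, DS, DD.
external.indices <- function(ext) {
  SS <- ext$SS; SD <- ext$SD; DS <- ext$DS; DD <- ext$DD
  M <- SS + SD + DS + DD
  c(
    R  = (SS + DD) / M,
    J  = SS / (SS + SD + DS),
    FM = sqrt(SS / (SS + SD)) * sqrt(SS / (SS + DS)),
    RR = SS / M,
    Ph = (SS * DD - SD * DS) / ((SS + SD) * (SS + DS) * (SD + DD) * (DS + DD))
  )
}

# counts for P = (1,1,2,2,3) and P' = (1,1,2,3,3)
ext <- list(SS = 1, SD = 1, DS = 1, DD = 7)
v <- external.indices(ext)
round(v, 4)  # R = 0.8, J = 0.3333, FM = 0.5, RR = 0.1, Ph = 0.0234
```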

Value

std.ext returns a list containing four values: SS, SD, DS, DD.

clv.Rand returns R value.

clv.Jaccard returns J value.

clv.Folkes.Mallows returns FM value.

clv.Phi returns Ph value.

clv.Russel.Rao returns RR value.

Author(s)

Lukasz Nieweglowski

References

G. Saporta and G. Youness. Comparing two partitions: Some Proposals and Experiments. http://cedric.cnam.fr/PUBLIS/RC405.pdf

Examples

# load and prepare data
library(clv)
library(cluster) # provides pam()
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,3) # create three clusters
v.pred <- as.integer(pam.mod$clustering) # cluster id assigned to each object by pam
v.real <- as.integer(iris$Species) # true class labels as integer cluster ids

# compare the true partition with the one found by the algorithm
# 1. efficient solution:

# call std.ext only once
std <- std.ext(v.pred, v.real)
# and compute three indices from its result
rand1 <- clv.Rand(std)
jaccard1 <- clv.Jaccard(std)
folk.mal1 <- clv.Folkes.Mallows(std)

# 2. functional solution:

# define a set of functions that compare two clusterings
Rand <- function(clust1,clust2) clv.Rand(std.ext(clust1,clust2))
Jaccard <- function(clust1,clust2) clv.Jaccard(std.ext(clust1,clust2))
Folkes.Mallows <- function(clust1,clust2) clv.Folkes.Mallows(std.ext(clust1,clust2))

# compute the indices
rand2 <- Rand(v.pred,v.real)
jaccard2 <- Jaccard(v.pred,v.real)
folk.mal2 <- Folkes.Mallows(v.pred,v.real)

[Package clv version 0.3-2.4 Index]