integration {varclust}    R Documentation
Computes integration and acontamination of the clustering
Description
Integration and acontamination are measures of the quality of a clustering with respect to a true partition. Let X = (x_1, \ldots, x_p) be the data set, A be a partition into clusters A_1, \ldots, A_n (the true partition) and B be a partition into clusters B_1, \ldots, B_m.
Then for a cluster A_j the integration is equal to:
Int(A_j) = \frac{\max_{k = 1, \ldots, m} \# \{ i \in \{ 1, \ldots, p \} : x_i \in A_j \wedge x_i \in B_k \}}{\# A_j}
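As an illustration only, Int(A_j) can be read off a contingency table of the two label vectors. Row j counts how the elements of A_j are spread over B_1, \ldots, B_m, so the numerator is the row maximum and the denominator is the row sum. The sketch below assumes that both partitions are given as label vectors of equal length, as in the Usage section. The helper name int_per_cluster is hypothetical and is not part of varclust. Ties for the maximum are broken by taking the first B_k (which.max), which is also an assumption.

```r
# Hypothetical helper, not part of varclust: computes Int(A_j) for every
# reference cluster A_j, following the formula above.
# group      : cluster labels of the found partition B
# true_group : cluster labels of the reference partition A
int_per_cluster <- function(group, true_group) {
  # Contingency table: rows are the A_j, columns are the B_k,
  # entry [j, k] = #{i : x_i in A_j and x_i in B_k}
  tab <- table(true_group, group)
  # Numerator: largest overlap of A_j with any B_k; denominator: #A_j
  int_j <- apply(tab, 1, max) / rowSums(tab)
  # Label of the B_k attaining the maximum (first one on ties, by assumption)
  best_k <- colnames(tab)[apply(tab, 1, which.max)]
  list(int = int_j, best_k = best_k)
}

# Toy example: A_1 = {1,2,3}, A_2 = {4,5,6}; B_a = {1,2}, B_b = {3,4,5,6}
res <- int_per_cluster(group = c("a", "a", "b", "b", "b", "b"),
                       true_group = c(1, 1, 1, 2, 2, 2))
res$int     # 2/3 for A_1, 1 for A_2
res$best_k  # "a" "b"
```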
The B_k that attains the maximum in Int(A_j) is called the integrating cluster of A_j. The integration of the whole clustering is then Int(A,B) = \frac{1}{n} \sum_{j=1}^n Int(A_j).
The acontamination is defined by:
Acont(A_j) = \frac{\# \{ i \in \{ 1, \ldots, p \} : x_i \in A_j \wedge x_i \in B_k \}}{\# B_k}
where B_k is the integrating cluster of A_j. The acontamination of the whole clustering is then Acont(A,B) = \frac{1}{n} \sum_{j=1}^n Acont(A_j).
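A compact sketch of both whole-clustering measures follows. Like the per-cluster case, it is built on a contingency table of the two label vectors. The function name integration_sketch is hypothetical. It follows the formulas above but is not the varclust implementation of integration(). This page does not say how ties between several maximizing B_k are resolved, so the sketch assumes the first one (which.max).

```r
# Hypothetical re-implementation of the definitions above; not varclust's code.
# group      : labels of the found partition B
# true_group : labels of the reference partition A
integration_sketch <- function(group, true_group) {
  tab <- table(true_group, group)                    # tab[j, k] = #(A_j and B_k)
  k_star <- apply(tab, 1, which.max)                 # integrating cluster of each A_j (first on ties)
  overlap <- tab[cbind(seq_len(nrow(tab)), k_star)]  # #(A_j and B_{k*})
  int_j <- overlap / rowSums(tab)                    # Int(A_j)   = overlap / #A_j
  acont_j <- overlap / colSums(tab)[k_star]          # Acont(A_j) = overlap / #B_{k*}
  c(integration = mean(int_j), acontamination = mean(acont_j))
}

true_group <- c(1, 1, 1, 2, 2, 2)
integration_sketch(c("a", "a", "b", "b", "b", "b"), true_group)  # 5/6 and 7/8
integration_sketch(rep("a", 6), true_group)                      # 1 and 1/2
```

The second call shows why both numbers are reported. If every element is placed in a single cluster, then Int(A_j) = 1 for all j, while Acont(A_j) = \# A_j / p, so Acont(A,B) = 1/n.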
Usage
integration(group, true_group)
Arguments
group
A vector, the first partition.
true_group
A vector, the second (reference) partition.
Value
An array containing values of integration and acontamination.
References
M. Sołtys. Metody analizy skupień. Master’s thesis, Wrocław University of Technology, 2010
Examples
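library(varclust)
# Simulate data: K = 2 clusters with numb.vars = 50 variables each (100 variables)
sim.data <- data.simulation(n = 20, SNR = 1, K = 2, numb.vars = 50, max.dim = 2)
# Reference partition used here: first 50 variables in cluster 1, next 50 in cluster 2
true_segmentation <- rep(1:2, each = 50)
# Cluster the variables into 2 clusters
mlcc.fit <- mlcc.reps(sim.data$X, numb.clusters = 2, max.dim = 2, numb.cores = 1)
# Compare the found segmentation with the reference partition
integration(mlcc.fit$segmentation, true_segmentation)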