SCLOP {ldaPrototype} | R Documentation
Similarity/Stability of multiple sets of Objects using Clustering with Local Pruning
Description
The function SCLOP calculates the S-CLOP value for the best possible local pruning state of a dendrogram from dendTopics. The function pruneSCLOP supplies the corresponding pruning state itself.
To get the S-CLOP scores for all pairs of LDA runs, the function SCLOP.pairwise can be used. It returns a matrix of the pairwise S-CLOP scores.
All three functions use the function disparitySum to calculate the least possible sum of disparities (on the best possible local pruning state) on a given dendrogram.
Usage
SCLOP(dend)
disparitySum(dend)
SCLOP.pairwise(sims)
Arguments
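dend | Dendrogram, as returned by dendTopics.
sims | Pairwise topic similarities, as returned by jaccardTopics, or the similarity matrix extracted from that result with getSimilarity (see Examples).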
Details
For one specific cluster g and R LDA runs, the disparity is calculated by
U(g) := \frac{1}{R} \sum_{r=1}^R \vert t_r^{(g)} - 1 \vert \cdot \sum_{r=1}^R t_r^{(g)},
where \bm t^{(g)} = (t_1^{(g)}, ..., t_R^{(g)})^T contains, for each LDA run r, the number of topics from that run that occur in cluster g.
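The disparity of a single cluster can be checked by hand with a few lines of base R. This is a minimal sketch, not the package's internal code: the helper name disparity is hypothetical, and its argument t is assumed to be the count vector of one cluster (one entry per LDA run).

```r
# Hypothetical helper (not part of ldaPrototype): disparity U(g) of one cluster.
# t: integer vector of length R; t[r] = number of topics from run r in the cluster.
disparity <- function(t) {
  R <- length(t)
  sum(abs(t - 1)) * sum(t) / R
}

disparity(c(1, 1, 1, 1))  # every run contributes exactly one topic
disparity(c(2, 0, 1, 1))  # one run contributes twice, another not at all
disparity(c(1, 0, 0, 0))  # a cluster consisting of a single topic
```

A cluster that contains exactly one topic from every run has disparity 0. Each missing or surplus topic increases the first sum, and the second sum weights the result by the cluster size.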
The function disparitySum returns the least possible sum of disparities U_{\Sigma}(G^*) for the best possible pruning state G^* with U_{\Sigma}(G) = \sum_{g \in G} U(g) \to \min.
The value of U_{\Sigma}(G^*) is bounded above by
U_{\Sigma,\textsf{max}} := \sum_{g \in \tilde{G}} U(g) = N \cdot \frac{R-1}{R},
where \tilde{G} denotes the corresponding worst-case pruning state. This worst-case scenario is used to normalize the S-CLOP scores.
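The closed form of the upper bound can be reproduced under one reading that matches the formula term by term. It assumes, which this page does not state explicitly, that the worst-case state \tilde{G} puts every topic into its own cluster and that N is the total number of topics over all R runs. Each such cluster then has one count equal to 1 and R-1 counts equal to 0:

```latex
% Assumption: \tilde{G} consists of N single-topic clusters, N = total number of topics.
U(g) = \frac{1}{R} \cdot \bigl( \vert 1 - 1 \vert + (R-1) \cdot \vert 0 - 1 \vert \bigr) \cdot 1 = \frac{R-1}{R}
\quad \Longrightarrow \quad
\sum_{g \in \tilde{G}} U(g) = N \cdot \frac{R-1}{R}
```

Because the single-topic state is always an admissible pruning state, the minimum U_{\Sigma}(G^*) cannot exceed this sum.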
The function SCLOP then calculates the value
\textsf{S-CLOP}(G^*) := 1 - \frac{1}{U_{\Sigma,\textsf{max}}} \cdot \sum_{g \in G^*} U(g) ~\in [0,1],
where \sum\limits_{g \in G^*} U(g) = U_{\Sigma}(G^*).
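The normalization can be illustrated for a pruning state that is already given. The following base R sketch does not search for the best pruning state (SCLOP and disparitySum do that on a dendrogram). The helper names disparity and sclop_for_state are hypothetical, a state is assumed to be a list of per-cluster count vectors, and N is assumed to be the total number of topics.

```r
# Hypothetical helpers (not part of ldaPrototype).
# Disparity U(g) of one cluster with count vector t (one entry per run).
disparity <- function(t) sum(abs(t - 1)) * sum(t) / length(t)

# S-CLOP of a *given* pruning state; no optimisation over states is done here.
# clusters: list of count vectors, all of length R.
sclop_for_state <- function(clusters) {
  R <- length(clusters[[1]])
  N <- sum(unlist(clusters))   # assumption: N = total number of topics
  u_max <- N * (R - 1) / R     # upper bound used for normalization
  u_sum <- sum(vapply(clusters, disparity, numeric(1)))
  1 - u_sum / u_max
}

# Two runs with two topics each (R = 2, N = 4).
perfect    <- list(c(1, 1), c(1, 1))                    # each cluster pairs one topic per run
singletons <- list(c(1, 0), c(1, 0), c(0, 1), c(0, 1))  # every topic on its own
sclop_for_state(perfect)
sclop_for_state(singletons)
```

In this toy setting the perfectly matched state yields 1 and the single-topic state yields 0, the two ends of the [0,1] range.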
Value
SCLOP
[0,1] value specifying the S-CLOP for the best possible local pruning state of the given dendrogram.
disparitySum
[numeric(1)] value specifying the least possible sum of disparities on the given dendrogram.
SCLOP.pairwise
[symmetrical named matrix] with all pairwise S-CLOP scores of the given LDA runs.
See Also
Other SCLOP functions: pruneSCLOP()
Other workflow functions: LDARep(), dendTopics(), getPrototype(), jaccardTopics(), mergeTopics()
Examples
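# Fit n = 4 LDA runs with K = 10 topics each on the Reuters example data
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# Merge the topics of all runs
topics = mergeTopics(res, vocab = reuters_vocab)
# Pairwise Jaccard similarities of the topics
jacc = jaccardTopics(topics, atLeast = 2)
# Cluster the topics into a dendrogram
dend = dendTopics(jacc)

# S-CLOP value and least possible sum of disparities
SCLOP(dend)
disparitySum(dend)

# Pairwise S-CLOP scores, from the similarity object or its matrix
SCLOP.pairwise(jacc)
SCLOP.pairwise(getSimilarity(jacc))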