rboTopics {ldaPrototype} | R Documentation |
Pairwise RBO Similarities
Description
Calculates the similarity of all pairwise topic combinations using the rank-biased overlap (RBO) Similarity.
Usage
rboTopics(topics, k, p, progress = TRUE, pm.backend, ncpus)
Arguments
topics |
[ |
k |
[ |
p |
[0,1] |
progress |
[ |
pm.backend |
[ |
ncpus |
[ |
Details
The RBO Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by
RBO(\bm z_{i}, \bm z_{j} \mid k, p) = 2p^k\frac{\left|Z_{i}^{(k)} \cap Z_{j}^{(k)}\right|}{\left|Z_{i}^{(k)}\right| + \left|Z_{j}^{(k)}\right|} + \frac{1-p}{p} \sum_{d=1}^k 2 p^d\frac{\left|Z_{i}^{(d)} \cap Z_{j}^{(d)}\right|}{\left|Z_{i}^{(d)}\right| + \left|Z_{j}^{(d)}\right|}
where Z_{i}^{(d)} is the vocabulary set of topic \bm z_{i} down to rank d. Ties in ranks are resolved by taking the minimum.
The value wordsconsidered describes the number of words per topic ranked at rank k or above.
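The formula above can be sketched in a few lines of base R for a single pair of topics. This is a minimal illustration under stated assumptions, not the package's internal implementation: the helper name rbo_pair and the toy count vectors x and y are hypothetical, both vectors are assumed to be aligned on the same vocabulary, and the handling of zero counts in rboTopics itself may differ.

```r
# Hypothetical helper: RBO similarity of two word-count vectors x and y
# (same vocabulary, same order), evaluated down to rank k with weight p.
rbo_pair = function(x, y, k, p) {
  # Rank 1 is the most frequent word; tied words share the minimum rank.
  rx = rank(-x, ties.method = "min")
  ry = rank(-y, ties.method = "min")
  # Agreement at depth d: 2 |Z_i^(d) n Z_j^(d)| / (|Z_i^(d)| + |Z_j^(d)|)
  agreement = vapply(seq_len(k), function(d) {
    zx = rx <= d
    zy = ry <= d
    2 * sum(zx & zy) / (sum(zx) + sum(zy))
  }, numeric(1))
  # First term weights the agreement at depth k, second term sums over depths.
  p^k * agreement[k] + (1 - p) / p * sum(p^seq_len(k) * agreement)
}

# Toy counts for five words; y ranks the words in reverse order of x.
x = c(a = 5, b = 4, c = 3, d = 2, e = 1)
y = c(a = 1, b = 2, c = 3, d = 4, e = 5)
same = rbo_pair(x, x, k = 3, p = 0.9)
reversed = rbo_pair(x, y, k = 3, p = 0.9)
```

With identical rankings the two terms add up to exactly 1 for any k and p, because the geometric weights sum to 1 - p^k and the first term contributes the remaining p^k. Two topics whose top-k word sets are disjoint obtain a similarity of 0.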
Value
[named list] with entries
sims
[lower triangular named matrix] with all pairwise similarities of the given topics.
wordslimit
[integer] = vocabulary size. See jaccardTopics for original purpose.
wordsconsidered
[integer] = vocabulary size. See jaccardTopics for original purpose.
param
[named list] with parameter type [character(1)] = "RBO Similarity", k [integer(1)] and p [0,1]. See above for explanation.
References
Webber, William, Alistair Moffat and Justin Zobel (2010). "A similarity measure for indefinite rankings". In: ACM Transactions on Information Systems 28(4), pp. 20:1-20:38, DOI 10.1145/1852102.1852106, URL https://doi.acm.org/10.1145/1852102.1852106
See Also
Other TopicSimilarity functions: cosineTopics(), dendTopics(), getSimilarity(), jaccardTopics(), jsTopics()
Examples
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
topics = mergeTopics(res, vocab = reuters_vocab)
rbo = rboTopics(topics, k = 12, p = 0.9)
rbo
sim = getSimilarity(rbo)
dim(sim)