cosineTopics {ldaPrototype}    R Documentation
Pairwise Cosine Similarities
Description
Calculates the similarity of all pairwise topic combinations using the Cosine Similarity.
Usage
cosineTopics(topics, progress = TRUE, pm.backend, ncpus)
Arguments
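topics | [named matrix] The counts of vocabularies/words (row wise) in topics (column wise).
progress | [logical(1)] Should a progress bar be shown? Turning it off could lead to significantly faster calculation. Default is TRUE. If pm.backend is set, parallelization will take place and hence no progress bar will be shown.
pm.backend | [character(1)] One of "multicore", "socket" or "mpi". If pm.backend is set, parallelStart is called before computation is started and parallelStop is called after.
ncpus | [integer(1)] Number of (physical) CPUs to use. If pm.backend is passed, the default is determined by detectCores.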
Details
The Cosine Similarity for two topics \bm z_{i} and \bm z_{j} is calculated by

\cos(\theta | \bm z_{i}, \bm z_{j}) = \frac{ \sum_{v=1}^{V}{n_{i}^{(v)} n_{j}^{(v)}} }{ \sqrt{\sum_{v=1}^{V}{\left(n_{i}^{(v)}\right)^2}} \sqrt{\sum_{v=1}^{V}{\left(n_{j}^{(v)}\right)^2}} }

with \theta denoting the angle between the corresponding count vectors \bm z_{i} and \bm z_{j}, V the vocabulary size and n_k^{(v)} the count of assignments of the v-th word to the k-th topic.
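The formula above can be checked by hand with a small base-R sketch. The helper cosine_sim() is hypothetical and not part of ldaPrototype; it takes the count vectors of two topics over the same vocabulary and evaluates the numerator and the two Euclidean norms directly.

```r
# Hypothetical helper, not part of ldaPrototype: cosine similarity of two
# topics given their word-count vectors n_i and n_j over the same vocabulary.
cosine_sim = function(n_i, n_j) {
  stopifnot(length(n_i) == length(n_j))
  # numerator: sum over v of n_i^(v) * n_j^(v)
  num = sum(n_i * n_j)
  # denominator: product of the Euclidean norms of both count vectors
  den = sqrt(sum(n_i^2)) * sqrt(sum(n_j^2))
  num / den
}

# Two topics over a vocabulary of V = 3 words
z1 = c(3, 0, 4)
z2 = c(3, 4, 0)
cosine_sim(z1, z2)  # 9 / (5 * 5) = 0.36
```

Because topic counts are non-negative, the similarity lies between 0 (no shared words) and 1 (proportional count vectors, e.g. z1 and 2 * z1). A topic without any assignments would give 0/0, i.e. NaN in R.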
Value
[named list] with entries

sims | [lower triangular named matrix] with all pairwise similarities of the given topics.
wordslimit | [integer] = vocabulary size. See jaccardTopics for original purpose.
wordsconsidered | [integer] = vocabulary size. See jaccardTopics for original purpose.
param | [named list] with parameter type [character(1)] = "Cosine Similarity".
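For a whole set of topics, all pairwise dot products can be obtained at once with crossprod(). The following is a hypothetical sketch, not the package's implementation: cosine_topics_sketch() assumes a count matrix with words in rows and topics in columns and mimics the shape of the list described above. Filling the diagonal and upper triangle with NA is an assumption about how the lower triangular matrix is stored.

```r
# Hypothetical sketch, not the ldaPrototype implementation: all pairwise
# cosine similarities for a count matrix (words in rows, topics in columns),
# returned in the shape described under Value.
cosine_topics_sketch = function(topics) {
  # crossprod(topics)[i, j] = sum over v of n_i^(v) * n_j^(v)
  cp = crossprod(topics)
  # Euclidean norm of each topic's count vector (square root of the diagonal)
  norms = sqrt(diag(cp))
  sims = cp / outer(norms, norms)
  # keep only the strictly lower triangle; using NA elsewhere is an assumption
  sims[upper.tri(sims, diag = TRUE)] = NA
  V = nrow(topics)
  list(sims = sims,
       wordslimit = V,
       wordsconsidered = V,
       param = list(type = "Cosine Similarity"))
}

# Toy example: three topics over three words (matrix() fills column-wise)
topics = matrix(c(3, 0, 4,
                  3, 4, 0,
                  6, 0, 8), nrow = 3,
                dimnames = list(c("w1", "w2", "w3"), c("T1", "T2", "T3")))
res = cosine_topics_sketch(topics)
res$sims
```

Computing all dot products in one matrix multiplication avoids an explicit loop over the K(K-1)/2 topic pairs; here T3 is exactly twice T1, so their similarity is 1.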
See Also
Other TopicSimilarity functions: dendTopics(), getSimilarity(), jaccardTopics(), jsTopics(), rboTopics()
Examples
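library(ldaPrototype)
# fit 4 LDA replications with K = 10 topics on the Reuters example corpus
res = LDARep(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 10, num.iterations = 30)
# merge the topic-word counts of all replications into one matrix
topics = mergeTopics(res, vocab = reuters_vocab)
cosine = cosineTopics(topics)
cosine
# extract the lower triangular similarity matrix (one row and column per merged topic)
sim = getSimilarity(cosine)
dim(sim)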