topic_coherence {topicdoc} | R Documentation
Calculate the topic coherence for each topic in a topic model
Description
Using the N highest-probability tokens for each topic, calculate the topic coherence for each topic.
Usage
topic_coherence(topic_model, dtm_data, top_n_tokens = 10, smoothing_beta = 1)
Arguments
topic_model
  a fitted topic model object from one of the following:

dtm_data
  a document-term matrix of token counts coercible to

top_n_tokens
  an integer indicating the number of top words to consider; the default is 10

smoothing_beta
  a numeric indicating the value used to smooth the document frequencies in order to avoid log-zero issues (see Details); the default is 1
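Details

Topic coherence here refers to the measure described in Mimno et al. (2011) (see References). As a sketch, and assuming that smoothing_beta plays the role of the +1 smoothing term in that paper, the score for a topic whose top M tokens v_1, ..., v_M are ordered by decreasing probability is

C(t) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m, v_l) + \beta}{D(v_l)}

where D(v) is the number of documents containing token v at least once, D(v, v') is the number of documents containing both v and v', and \beta is smoothing_beta. The minimal R sketch below illustrates this calculation for a single topic; coherence_one_topic, counts, and top_tokens are names introduced here for illustration, and the code assumes a plain counts matrix (documents x tokens) with token column names rather than the package's internal representation.

coherence_one_topic <- function(counts, top_tokens, smoothing_beta = 1) {
  # which documents contain each of the top tokens at least once
  present <- counts[, top_tokens, drop = FALSE] > 0
  score <- 0
  for (m in 2:length(top_tokens)) {
    for (l in seq_len(m - 1)) {
      co_doc_freq <- sum(present[, m] & present[, l])  # documents containing both tokens
      doc_freq <- sum(present[, l])                    # documents containing the higher-ranked token
      score <- score + log((co_doc_freq + smoothing_beta) / doc_freq)
    }
  }
  score
}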
Value
A vector of topic coherence scores with length equal to the number of topics in the fitted model
References
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
McCallum, A. K. (2002). "MALLET: A Machine Learning for Language Toolkit." https://mallet.cs.umass.edu
See Also
Examples
# Using the example from the LDA function
library(topicmodels)
library(topicdoc)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
topic_coherence(lda, AssociatedPress[1:20,])
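
# A possible follow-up, not part of the original example: store the scores
# and compare them with each topic's top terms ("coherence" is a name
# introduced here for illustration)
coherence <- topic_coherence(lda, AssociatedPress[1:20,])
terms(lda, 5)
coherence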