dist_from_corpus {topicdoc} | R Documentation |
Calculate the distance of each topic from the overall corpus token distribution
Description
Calculates the Hellinger distance between each topic's token probabilities (the betas) and the overall probability of each token in the corpus.
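The calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's actual implementation: `hellinger()` is a hypothetical helper, the toy `beta` and `dtm` matrices are made up, the corpus distribution is assumed to be the per-term token count divided by the total token count, and the distance is assumed to use the common normalization that bounds it between 0 and 1.

```r
# Hypothetical helper, not part of topicdoc: Hellinger distance between
# two discrete probability vectors p and q of equal length.
# Assumes the normalized form, which lies in [0, 1].
hellinger <- function(p, q) {
  sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)
}

# Toy inputs, made up for illustration.
# beta: topics x terms matrix of token probabilities; each row sums to 1.
beta <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.3, 0.6)
)
# dtm: documents x terms matrix of token counts.
dtm <- rbind(
  c(4, 1, 0),
  c(0, 2, 3)
)

# Assumed corpus-wide token distribution: count per term / total count.
corpus_dist <- colSums(dtm) / sum(dtm)

# One distance per topic (one per row of beta).
dists <- apply(beta, 1, hellinger, q = corpus_dist)
dists
```

A topic whose token distribution matches the corpus distribution exactly would get a distance of 0 under this sketch; larger values indicate a topic that departs further from the corpus as a whole.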
Usage
dist_from_corpus(topic_model, dtm_data)
Arguments
topic_model |
a fitted topic model object from one of the following:
|
dtm_data |
a document-term matrix of token counts coercible to |
Value
A numeric vector of Hellinger distances, one per topic in the fitted model.
References
Jordan Boyd-Graber, David Mimno, and David Newman, 2014. Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements. CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, Florida.
Examples
# Using the example from the LDA function
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
lda <- LDA(AssociatedPress[1:20,], control = list(alpha = 0.1), k = 2)
dist_from_corpus(lda, AssociatedPress[1:20,])
[Package topicdoc version 0.1.1 Index]