topWords {sentopics} | R Documentation |
Extract the most representative words from topics
Description
Extract the top words in each topic/sentiment from a sentopicmodel.
Usage
topWords(
  x,
  nWords = 10,
  method = c("frequency", "probability", "term-score", "FREX"),
  output = c("data.frame", "plot", "matrix"),
  subset,
  w = 0.5
)

plot_topWords(
  x,
  nWords = 10,
  method = c("frequency", "probability", "term-score", "FREX"),
  subset,
  w = 0.5
)
Arguments
x |
|
nWords |
the number of top words to extract |
method |
specify if a re-ranking function should be applied before returning the top words. See Details for a description of each method. |
output |
determines the output of the function |
subset |
allows to subset using a logical expression, as in |
w |
only used when method = "FREX"; see Details |
Details
"frequency" ranks top words according to their frequency within a topic. This method also reports the overall frequency of each word. When returning a plot, the overall frequency is represented with a grey bar.
"probability" uses the estimated topic-word mixture \phi to rank top words.
"term-score" implements the re-ranking method from Blei and Lafferty (2009). This method down-weights terms that have high probability in all topics using the following score:
\text{term-score}_{k,v} = \phi_{k,v}\log\left(\frac{\phi_{k,v}}{\left(\prod^K_{j=1}\phi_{j,v}\right)^{\frac{1}{K}}}\right),
for topic k, vocabulary word v and number of topics K.
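The term-score formula can be checked by hand on a small matrix. The following is a minimal base R sketch, not the package's internal code: the matrix `phi` and its word labels are made-up illustration values, with topics in rows and vocabulary words in columns.

```r
# Hypothetical topic-word matrix: K = 2 topics (rows), V = 4 words (columns).
# Values are invented for illustration; each row sums to 1.
phi <- matrix(
  c(0.40, 0.30, 0.10, 0.20,
    0.50, 0.05, 0.30, 0.15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "rate", "growth", "bank"))
)

# Geometric mean of each word's probability across the K topics
geo_mean <- exp(colMeans(log(phi)))

# term-score_{k,v} = phi_{k,v} * log(phi_{k,v} / geometric mean of phi_{.,v})
term_score <- phi * log(sweep(phi, 2, geo_mean, "/"))

# Highest-scoring word in each topic
top_term <- apply(term_score, 1, function(s) names(which.max(s)))
```

In this toy example "the" has the largest probability in both topics, yet the highest term-scores go to "rate" and "growth", because the logarithm penalizes words whose probability is close to their geometric mean across topics.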
"FREX" implements the re-ranking method from Bischof and Airoldi (2012). This method uses the weight w to balance topic exclusivity against topic-word probability, combining the two as a weighted harmonic mean:
\text{FREX}_{k,v}=\left(\frac{w}{\text{ECDF}\left(\frac{\phi_{k,v}}{\sum_{j=1}^K\phi_{j,v}}\right)} + \frac{1-w}{\text{ECDF}\left(\phi_{k,v}\right)}\right)^{-1},
for topic k, vocabulary word v, number of topics K and weight w, where \text{ECDF} is the empirical cumulative distribution function.
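The FREX score can be illustrated the same way. This is a minimal base R sketch rather than the package's implementation: `phi` is a made-up matrix, `ecdf_rows` is a hypothetical helper, and the ECDF is assumed to be computed across the words of each topic, following Bischof and Airoldi (2012).

```r
# Hypothetical topic-word matrix: K = 2 topics (rows), V = 4 words (columns).
# Values are invented for illustration; each row sums to 1.
phi <- matrix(
  c(0.40, 0.30, 0.10, 0.20,
    0.50, 0.05, 0.30, 0.15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("the", "rate", "growth", "bank"))
)
w <- 0.5  # weight on exclusivity, matching the default of the w argument

# Exclusivity: share of word v's total probability that belongs to topic k
exclusivity <- sweep(phi, 2, colSums(phi), "/")

# Hypothetical helper: ECDF of each value within its own row (topic).
# Assumption: the ECDF is taken over the words of a topic.
ecdf_rows <- function(m) t(apply(m, 1, function(r) ecdf(r)(r)))

# Weighted harmonic mean of the two ECDF-transformed quantities
frex <- 1 / (w / ecdf_rows(exclusivity) + (1 - w) / ecdf_rows(phi))
dimnames(frex) <- dimnames(phi)

# Compare the top word of topic 1 under both rankings
top_by_probability <- names(which.max(phi["topic1", ]))
top_by_frex <- names(which.max(frex["topic1", ]))
```

Here the most probable word of topic 1 is "the", but it is shared with topic 2, so FREX ranks the more exclusive "rate" first. Raising w pushes the ranking further toward exclusivity, while lowering it moves the ranking toward plain probability.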
Value
The top words of the topic model. Depending on the output chosen, the result is a long-format data.frame, a ggplot2 object, or a matrix.
Author(s)
Olivier Delmarcelle
References
Blei, DM. and Lafferty, JD. (2009). Topic models. In Text Mining, chapter 4, 101–124.
Bischof, JM. and Airoldi, EM. (2012). Summarizing Topical Content with Word Frequency and Exclusivity. In Proceedings of the 29th International Conference on Machine Learning, ICML'12, 9–16.
See Also
melt.sentopicmodel() for extracting estimated mixtures
Examples
model <- LDA(ECB_press_conferences_tokens)
model <- fit(model, 10)
topWords(model)
topWords(model, output = "matrix")
topWords(model, method = "FREX")
plot_topWords(model)
plot_topWords(model, subset = topic %in% 1:2)
jst <- JST(ECB_press_conferences_tokens)
jst <- fit(jst, 10)
plot_topWords(jst)
plot_topWords(jst, subset = topic %in% 1:2 & sentiment == 3)