mallet.word.freqs {mallet}    R Documentation
Descriptive statistics of word frequencies
Description
This function returns a data frame with one row for each unique vocabulary word and three columns: the word as a character value, the total number of tokens of that word type, and the number of documents that contain that word at least once. These statistics can be useful for identifying candidate stopwords.
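The two counts treat repeated occurrences differently. word.freq counts every token, while doc.freq counts each document at most once. The following is a minimal base-R sketch of those definitions, using a small made-up list of tokenized documents. It does not call mallet, the helper word_freq_table() is hypothetical, and its row order is not meant to match mallet's vocabulary order.

```r
# Hypothetical helper (not part of mallet): compute per-word token and
# document counts from a list of character vectors, one vector per document.
word_freq_table <- function(docs) {
  all_tokens <- unlist(docs, use.names = FALSE)
  vocab <- unique(all_tokens)
  # Total number of tokens of each word type across all documents
  word.freq <- as.integer(table(factor(all_tokens, levels = vocab)))
  # Number of documents containing each word at least once:
  # deduplicate tokens within each document before counting
  doc.freq <- as.integer(table(factor(unlist(lapply(docs, unique)),
                                      levels = vocab)))
  data.frame(word = vocab, word.freq = word.freq, doc.freq = doc.freq,
             stringsAsFactors = FALSE)
}

# Made-up example documents
docs <- list(
  c("the", "union", "is", "strong", "the"),
  c("the", "people", "and", "the", "union"),
  c("the", "congress", "and", "the", "people")
)
freqs <- word_freq_table(docs)
freqs[freqs$word == "the", ]
```

For "the", this gives word.freq = 6 but doc.freq = 3, because the word occurs twice in each of the three documents.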
Usage
mallet.word.freqs(topic.model)
Arguments
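topic.model    A topic model object, such as one created with MalletLDA() with documents loaded via loadDocuments(), as in the Examples.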
Value
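A data.frame with the word type (word), the word frequency, i.e. the total number of tokens of that type (word.freq), and the document frequency, i.e. the number of documents containing the word at least once (doc.freq).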
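Dividing doc.freq by the number of documents gives the share of documents that contain each word. Words that appear in nearly every document are natural stopword candidates. The sketch below applies that idea to a hand-made data frame with the same column names. The values and the 0.9 cutoff are hypothetical. Because the returned data frame has no document count, n.docs must be supplied separately, for example from the number of texts passed to mallet.import().

```r
# Made-up values with the same columns as the mallet.word.freqs() result
word_freqs <- data.frame(
  word      = c("the", "of", "union", "tariff", "congress"),
  word.freq = c(120L, 85L, 14L, 3L, 30L),
  doc.freq  = c(10L, 10L, 6L, 1L, 8L),
  stringsAsFactors = FALSE
)
n.docs <- 10  # assumed number of documents loaded into the model

# Share of documents containing each word
word_freqs$doc.share <- word_freqs$doc.freq / n.docs

# Keep words found in at least 90% of documents (hypothetical cutoff),
# then list them from most to least frequent
candidates <- word_freqs[word_freqs$doc.share >= 0.9, ]
stopword_candidates <- candidates$word[order(-candidates$word.freq)]
stopword_candidates
```

Only "the" and "of" reach the cutoff here. The counts point to candidates, and whether to remove a word is still a judgment call.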
Examples
## Not run:
library(mallet)

# Read in sotu example data
data(sotu)
sotu.instances <-
  mallet.import(id.array = row.names(sotu),
                text.array = sotu[["text"]],
                stoplist = mallet_stoplist_file_path("en"),
                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
# Create topic model
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)
# Get word frequencies
word_freqs <- mallet.word.freqs(topic.model)
## End(Not run)
[Package mallet version 1.3.0]