R: Descriptive statistics of word frequencies

mallet.word.freqs {mallet}

R Documentation

Descriptive statistics of word frequencies

Description

This function returns a data frame with one row per unique vocabulary word and three columns: the word as a character value (word), the total number of tokens of that word type (word.freq), and the number of documents that contain the word at least once (doc.freq). This information can help identify candidate stopwords.

Usage

mallet.word.freqs(topic.model)

Arguments

topic.model

A cc.mallet.topics.RTopicModel object created by MalletLDA, with documents loaded via topic.model$loadDocuments() as in the example below.

Value

A data.frame with three columns: the word type (word), the word frequency (word.freq), and the document frequency (doc.freq).

Examples

## Not run: 
# Load the package and read in the sotu example data
library(mallet)
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create the topic model and load the documents
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Get word frequencies: one row per word, with word.freq and doc.freq columns
word_freqs <- mallet.word.freqs(topic.model)
head(word_freqs)

## End(Not run)
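
The Description notes that these counts can help identify candidate stopwords. One possible way to do that is sketched below. It is an illustration only, not part of the package: the data frame is a hypothetical stand-in with the same three columns as the return value, the counts are invented toy values, and n_docs, doc.share, and the 0.9 threshold are assumptions chosen for the example.

```r
# Hypothetical stand-in for the output of mallet.word.freqs(). The column
# names follow the Value section; the counts are toy values, not real data.
word_freqs <- data.frame(
  word      = c("congress", "tariff", "government", "treaty", "united"),
  word.freq = c(410L, 12L, 520L, 45L, 380L),
  doc.freq  = c(95L, 5L, 98L, 20L, 92L),
  stringsAsFactors = FALSE
)

# Assumed corpus size. With real data, use the number of imported documents.
n_docs <- 100L

# Share of documents that contain each word at least once.
word_freqs$doc.share <- word_freqs$doc.freq / n_docs

# Illustrative threshold (an assumption, not a package default): flag words
# that occur in at least 90% of documents as stopword candidates.
threshold <- 0.9
candidates <- word_freqs[word_freqs$doc.share >= threshold, ]

# List the most widespread words first.
candidates <- candidates[order(-candidates$doc.freq), ]
print(candidates$word)
```

With real output, the flagged words would still need manual review before being added to a stoplist, since a word that appears in most documents may still matter for the analysis.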