R: Get strings containing the most probable words for each topic

mallet.topic.labels {mallet}

R Documentation

Get strings containing the most probable words for each topic

Description

This function returns a vector of strings, one for each topic, with the most probable words in that topic separated by spaces.

Usage

mallet.topic.labels(topic.model, topic.words = NULL, num.top.words = 3, ...)

Arguments

`topic.model`	A `cc.mallet.topics.RTopicModel` object created by `MalletLDA`.
`topic.words`	The matrix of topic-word weights returned by `mallet.topic.words` Default (NULL) is to use the `topic.model` to extract the `topic.words`.
`num.top.words`	The number of words to include for each topic. Defaults to 3.
`...`	Further arguments supplied to `mallet.topic.words`.

Value

a character vector with one element per topic

Examples

## Not run: 
# Read in sotu example data
data(sotu)
sotu.instances <-
   mallet.import(id.array = row.names(sotu),
                 text.array = sotu[["text"]],
                 stoplist = mallet_stoplist_file_path("en"),
                 token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

# Create topic model
topic.model <- MalletLDA(num.topics=10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)

# Train topic model
topic.model$train(200)

# Create hiearchical clusters of topics
doc_topics <- mallet.doc.topics(topic.model, smoothed=TRUE, normalized=TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed=TRUE, normalized=TRUE)
topic_labels <- mallet.topic.labels(topic.model)
plot(mallet.topic.hclust(doc_topics, topic_words, balance = 0.3), labels=topic_labels)

## End(Not run)