MalletLDA {mallet} | R Documentation |
Create a Mallet topic model trainer
Description
This function creates a Java cc.mallet.topics.RTopicModel object that wraps a
Mallet topic model trainer Java object, cc.mallet.topics.ParallelTopicModel.
Note that you can call any of the methods of this Java object as properties.
For example, calling
topic.model$setAlphaOptimization(20, 50)
invokes the Java method directly,
which passes this update to the model itself.
Usage
MalletLDA(num.topics = 10, alpha.sum = 5, beta = 0.01)
Arguments
num.topics |
The number of topics to use. If not specified, this defaults to 10. |
alpha.sum |
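This is the magnitude of the Dirichlet prior over the topic distribution of a document.
The default value is 5.0. With 10 topics, this setting leads to a Dirichlet with
parameter alpha_k = 0.5 for each topic (5.0 divided evenly across the 10 topics). This is an initial value, which may change during training if hyperparameter optimization is enabled. |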
beta |
This is the per-word weight of the Dirichlet prior over topic-word distributions. The magnitude of the distribution (the sum over all words of this parameter) is therefore beta multiplied by the number of words in the vocabulary. Again, this value may change due to hyperparameter optimization. |
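The arithmetic behind these two priors can be sketched in a few lines of base R. This is an illustration only: the variable names and the vocabulary size of 5000 are assumptions made for the example, and only the three defaults come from the Usage line above.

```r
# Defaults from MalletLDA(num.topics = 10, alpha.sum = 5, beta = 0.01)
num.topics <- 10
alpha.sum  <- 5
beta       <- 0.01

# Assumed vocabulary size, chosen for illustration (not from a real corpus)
vocab.size <- 5000

# Symmetric document-topic prior: alpha.sum is split evenly across topics
alpha.k <- alpha.sum / num.topics

# Total weight of the topic-word prior: beta summed over every word type
beta.sum <- beta * vocab.size

print(alpha.k)
print(beta.sum)
```

With a larger vocabulary the same beta gives a heavier total prior, which is why beta is specified per word rather than as a sum.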
Value
A cc.mallet.topics.RTopicModel object.
Examples
## Not run:
# Read in sotu example data
data(sotu)
sotu.instances <-
  mallet.import(id.array = row.names(sotu),
                text.array = sotu[["text"]],
                stoplist = mallet_stoplist_file_path("en"),
                token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")
# Create topic model
topic.model <- MalletLDA(num.topics = 10, alpha.sum = 1, beta = 0.1)
topic.model$loadDocuments(sotu.instances)
# Train topic model
topic.model$train(200)
# Extract results
doc_topics <- mallet.doc.topics(topic.model, smoothed = TRUE, normalized = TRUE)
topic_words <- mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)
top_words <- mallet.top.words(topic.model, word.weights = topic_words[2, ],
                              num.top.words = 5)
## End(Not run)
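The smoothed and normalized arguments used when extracting results can be pictured with a toy matrix in base R. This sketch assumes that smoothing adds the prior weight beta to every count and that normalizing rescales each topic's row to sum to one. The counts and word labels are made up, and nothing here calls the mallet package.

```r
# Toy topic-by-word count matrix (2 topics, 3 word types); values are invented
counts <- matrix(c(4, 0, 1,
                   0, 3, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("topic1", "topic2"),
                                 c("tax", "war", "peace")))

beta <- 0.1  # same beta as passed to MalletLDA in the example

# Assumed effect of smoothed = TRUE: add the prior weight to every cell
smoothed <- counts + beta

# Assumed effect of normalized = TRUE: each row becomes a distribution
topic.words <- smoothed / rowSums(smoothed)

# Highest-weighted words for the second topic (two, since the toy
# vocabulary is too small for five)
top.two <- names(sort(topic.words["topic2", ], decreasing = TRUE))[1:2]
print(top.two)
```

Note that after smoothing no word has probability zero in any topic, even if it was never assigned to that topic during training.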