FitLdaModel {textmineR} | R Documentation |
Fit a Latent Dirichlet Allocation topic model
Description
Fit a Latent Dirichlet Allocation topic model using collapsed Gibbs sampling.
Usage
FitLdaModel(
  dtm,
  k,
  iterations = NULL,
  burnin = -1,
  alpha = 0.1,
  beta = 0.05,
  optimize_alpha = FALSE,
  calc_likelihood = FALSE,
  calc_coherence = TRUE,
  calc_r2 = FALSE,
  ...
)
Arguments
dtm |
A document term matrix or term co-occurrence matrix of class dgCMatrix |
k |
Integer number of topics |
iterations |
Integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria. |
burnin |
Integer number of burnin iterations. If |
alpha |
Vector of length |
beta |
Vector of length |
optimize_alpha |
Logical. Should alpha be optimized every 10 Gibbs iterations?
Defaults to FALSE. |
calc_likelihood |
Logical. Should the likelihood be calculated every 10 Gibbs iterations?
Useful for assessing convergence. Defaults to FALSE. |
calc_coherence |
Logical. Should the probabilistic coherence of topics be calculated
after the model is trained? Defaults to TRUE. |
calc_r2 |
Logical. Should R-squared be calculated after the model is trained?
Defaults to FALSE. |
... |
Other arguments to be passed to |
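The dtm argument must be a sparse matrix of class dgCMatrix. As a minimal sketch, the code below builds such a matrix from a small dense count matrix using the Matrix package. The document names, term names, and counts are invented for illustration and do not come from any real corpus.

```r
library(Matrix)

# Toy counts: 3 documents (rows) by 4 terms (columns); all values are made up
counts <- matrix(
  c(2, 0, 1, 0,
    0, 3, 0, 1,
    1, 1, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("gene", "cell", "model", "data"))
)

# Convert the dense matrix to a column-compressed sparse matrix
toy_dtm <- Matrix(counts, sparse = TRUE)

# Confirm the class and dimensions before passing it as dtm
print(class(toy_dtm))
print(dim(toy_dtm))
```

A matrix this small is only useful for checking the input type; a real model needs far more documents and terms.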
Details
EXPLAIN IMPLEMENTATION DETAILS
Value
Returns an S3 object of class c("LDA", "TopicModel"). DESCRIBE MORE
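One way to confirm the class of the returned object is sketched below. It reuses the call from the Examples section and is guarded so that it only runs when textmineR is installed. The component names printed by names() are not documented on this page, so the sketch lists them rather than assuming any particular one.

```r
if (requireNamespace("textmineR", quietly = TRUE)) {
  library(textmineR)
  data(nih_sample_dtm)

  # Same small model as in the Examples section
  set.seed(12345)
  m <- FitLdaModel(dtm = nih_sample_dtm[1:20, ], k = 5,
                   iterations = 200, burnin = 175)

  # The class vector should contain both "LDA" and "TopicModel"
  print(class(m))
  print(inherits(m, "LDA"))

  # List whatever components the object carries, without assuming their names
  print(names(m))
}
```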
Examples
# load some data
data(nih_sample_dtm)
# fit a model
set.seed(12345)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20, ], k = 5,
                 iterations = 200, burnin = 175)
str(m)
# predict on held-out documents using gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100, ], method = "gibbs",
              iterations = 200, burnin = 175)
# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100,], method = "dot")
# compare the methods
barplot(rbind(p1[1,],p2[1,]), beside = TRUE, col = c("red", "blue"))
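The bar plot compares the two methods for a single document. A numeric comparison across all documents can be sketched in base R as shown below. The matrices p_gibbs and p_dot are hypothetical stand-ins for p1 and p2, with invented values, so the block runs without fitting a model. The Hellinger distance is one reasonable choice of measure here, not something this page prescribes.

```r
# Hypothetical stand-ins for p1 and p2: 2 documents by 3 topics, rows sum to 1
p_gibbs <- matrix(c(0.70, 0.20, 0.10,
                    0.10, 0.30, 0.60), nrow = 2, byrow = TRUE)
p_dot   <- matrix(c(0.60, 0.25, 0.15,
                    0.20, 0.30, 0.50), nrow = 2, byrow = TRUE)

# Hellinger distance per document: 0 for identical rows, 1 for disjoint support
hellinger <- sqrt(rowSums((sqrt(p_gibbs) - sqrt(p_dot))^2) / 2)

# Check whether both methods assign the same most probable topic per document
top_gibbs <- apply(p_gibbs, 1, which.max)
top_dot   <- apply(p_dot, 1, which.max)
same_top_topic <- top_gibbs == top_dot

print(hellinger)
print(same_top_topic)
```

With real predictions, replacing p_gibbs and p_dot by p1 and p2 gives one distance per held-out document, which can then be summarized or plotted.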