R: Fit a Latent Dirichlet Allocation topic model

FitLdaModel {textmineR}

R Documentation

Fit a Latent Dirichlet Allocation topic model

Description

Fit a Latent Dirichlet Allocation topic model using collapsed Gibbs sampling.

Usage

FitLdaModel(
  dtm,
  k,
  iterations = NULL,
  burnin = -1,
  alpha = 0.1,
  beta = 0.05,
  optimize_alpha = FALSE,
  calc_likelihood = FALSE,
  calc_coherence = TRUE,
  calc_r2 = FALSE,
  ...
)

Arguments

`dtm`	A document term matrix or term co-occurrence matrix of class `dgCMatrix`.
`k`	Integer number of topics.
`iterations`	Integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.
`burnin`	Integer number of burnin iterations. If `burnin` is greater than -1, the resulting "phi" and "theta" matrices are an average over all iterations greater than `burnin`.
`alpha`	Prior for topics over documents. Either a single number for a symmetric prior or a vector of length `k` for an asymmetric prior.
`beta`	Prior for words over topics. Either a single number for a symmetric prior or a vector of length `ncol(dtm)` for an asymmetric prior.
`optimize_alpha`	Logical. Whether to optimize `alpha` every 10 Gibbs iterations. Defaults to `FALSE`.
`calc_likelihood`	Logical. Whether to calculate the likelihood every 10 Gibbs iterations. Useful for assessing convergence. Defaults to `FALSE`.
`calc_coherence`	Logical. Whether to calculate the probabilistic coherence of topics after the model is trained. Defaults to `TRUE`.
`calc_r2`	Logical. Whether to calculate R-squared after the model is trained. Defaults to `FALSE`.
`...`	Other arguments to be passed to `TmParallelApply`.
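
The length rules for `alpha` and `beta` can be checked before fitting. The sketch below is illustrative only: `expand_prior` is a hypothetical helper, not part of textmineR, and `n_terms` merely stands in for `ncol(dtm)`. It assumes that priors must be strictly positive, as Dirichlet parameters are. The final call runs only if textmineR is installed.

```r
# Hypothetical helper (not part of textmineR): checks that a prior is either
# one positive number (symmetric) or a positive vector of the expected length
# (asymmetric), and returns it expanded to full length.
expand_prior <- function(prior, n, name = "prior") {
  if (!is.numeric(prior) || any(is.na(prior)) || any(prior <= 0)) {
    stop(name, " must be numeric and strictly positive")
  }
  if (length(prior) == 1) return(rep(prior, n))
  if (length(prior) != n) {
    stop(name, " must have length 1 or ", n, ", not ", length(prior))
  }
  prior
}

k <- 5
n_terms <- 8  # stands in for ncol(dtm); an assumption for illustration

alpha_sym  <- expand_prior(0.1, k, "alpha")                         # symmetric
alpha_asym <- expand_prior(c(0.5, 0.2, 0.1, 0.1, 0.1), k, "alpha")  # asymmetric
beta_sym   <- expand_prior(0.05, n_terms, "beta")                   # symmetric

# With textmineR installed, the asymmetric alpha can be passed directly.
if (requireNamespace("textmineR", quietly = TRUE)) {
  data(nih_sample_dtm, package = "textmineR")
  set.seed(12345)
  m <- textmineR::FitLdaModel(dtm = nih_sample_dtm[1:20, ], k = k,
                              iterations = 200, burnin = 175,
                              alpha = alpha_asym, beta = 0.05)
}
```

Validating the lengths up front gives a clearer error message than a failure inside the sampler would.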

Details

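The Gibbs sampler runs for `iterations` iterations. While it runs, `alpha` is optimized every 10 iterations if `optimize_alpha` is `TRUE`, and the likelihood is calculated every 10 iterations if `calc_likelihood` is `TRUE`. Probabilistic coherence and R-squared, when requested, are calculated once after the model is trained.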
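
For readers unfamiliar with collapsed Gibbs sampling, the following base R sketch shows the general technique on a toy corpus. It is not textmineR's implementation and makes simplifying assumptions: symmetric scalar priors only, no burn-in averaging, and documents given as vectors of vocabulary indices rather than a `dgCMatrix`. The name `toy_lda_gibbs` and all of its arguments are hypothetical.

```r
# Toy collapsed Gibbs sampler for LDA (illustrative only).
# docs: list of integer vectors, each holding the vocabulary indices of the
# tokens in one document.
toy_lda_gibbs <- function(docs, n_vocab, k, iterations = 50,
                          alpha = 0.1, beta = 0.05) {
  n_docs <- length(docs)
  doc_topic  <- matrix(0, n_docs, k)   # tokens in document d assigned to topic j
  topic_word <- matrix(0, k, n_vocab)  # tokens of word v assigned to topic j
  topic_sum  <- numeric(k)             # total tokens assigned to topic j

  # Random initial topic assignment for every token, then tally the counts.
  z <- lapply(docs, function(w) sample.int(k, length(w), replace = TRUE))
  for (d in seq_len(n_docs)) {
    for (i in seq_along(docs[[d]])) {
      topic <- z[[d]][i]
      v <- docs[[d]][i]
      doc_topic[d, topic]  <- doc_topic[d, topic] + 1
      topic_word[topic, v] <- topic_word[topic, v] + 1
      topic_sum[topic]     <- topic_sum[topic] + 1
    }
  }

  for (iter in seq_len(iterations)) {
    for (d in seq_len(n_docs)) {
      for (i in seq_along(docs[[d]])) {
        topic <- z[[d]][i]
        v <- docs[[d]][i]
        # Remove the current token from the counts.
        doc_topic[d, topic]  <- doc_topic[d, topic] - 1
        topic_word[topic, v] <- topic_word[topic, v] - 1
        topic_sum[topic]     <- topic_sum[topic] - 1
        # Full conditional over topics, up to a normalizing constant.
        p <- (doc_topic[d, ] + alpha) *
          (topic_word[, v] + beta) / (topic_sum + n_vocab * beta)
        # Resample the topic and add the token back.
        topic <- sample.int(k, 1, prob = p)
        z[[d]][i] <- topic
        doc_topic[d, topic]  <- doc_topic[d, topic] + 1
        topic_word[topic, v] <- topic_word[topic, v] + 1
        topic_sum[topic]     <- topic_sum[topic] + 1
      }
    }
  }

  # Point estimates from the final state of the sampler.
  theta <- (doc_topic + alpha) / rowSums(doc_topic + alpha)  # topics over documents
  phi   <- (topic_word + beta) / rowSums(topic_word + beta)  # words over topics
  list(theta = theta, phi = phi)
}

set.seed(1)
docs <- list(c(1, 1, 2, 2, 1), c(3, 4, 4, 3, 3), c(1, 2, 3, 4))
fit <- toy_lda_gibbs(docs, n_vocab = 4, k = 2)
```

Each row of `fit$theta` and `fit$phi` is a probability distribution, so the rows sum to 1. Averaging these estimates over the iterations after a burn-in period, as the `burnin` argument describes, would be a natural extension of this sketch.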

Value

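Returns an S3 object of class `c("LDA", "TopicModel")`. The object includes the "phi" and "theta" matrices described under `burnin`. Calling `str()` on the result, as in the examples, shows its full structure.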

Examples

# load some data
data(nih_sample_dtm)

# fit a model 
set.seed(12345)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20,], k = 5,
                 iterations = 200, burnin = 175)

str(m)

# predict on held-out documents using gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100,], method = "gibbs",
              iterations = 200, burnin = 175)

# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100,], method = "dot")

# compare the methods
barplot(rbind(p1[1,],p2[1,]), beside = TRUE, col = c("red", "blue"))

[Package textmineR version 3.0.5 Index]