R: Simple Interface for Fitting a Multinomial Topic Model

fit_topic_model {fastTopics}

R Documentation

Simple Interface for Fitting a Multinomial Topic Model

Description

Fits a multinomial topic model to the count data, hiding most of the complexities of model fitting. The default optimization settings used here are intended to work well in a wide range of data sets, although some fine-tuning may be needed for more difficult cases. For full control, use fit_poisson_nmf.

Usage

fit_topic_model(
  X,
  k,
  numiter.main = 100,
  numiter.refine = 100,
  method.main = "em",
  method.refine = "scd",
  init.method = c("topicscore", "random"),
  control.init = list(),
  control.main = list(numiter = 4),
  control.refine = list(numiter = 4, extrapolate = TRUE),
  verbose = c("progressbar", "detailed", "none")
)

Arguments

`X`	The n x m matrix of counts; all entries of X should be non-negative. It can be a sparse matrix (class `"dgCMatrix"`) or dense matrix (class `"matrix"`).
`k`	The number of topics. Must be 2 or greater.
`numiter.main`	Number of updates of the factors and loadings to perform in the main model fitting step. Should be 1 or more.
`numiter.refine`	Number of updates of the factors and loadings to perform in the model refinement step.
`method.main`	The method to use for updating the factors and loadings in the main model fitting step. Passed as argument "method" to `fit_poisson_nmf`.
`method.refine`	The method to use for updating the factors in evthe model refinement step. Passed as argument "method" to `fit_poisson_nmf`.
`init.method`	The method used to initialize the factors and loadings. See `init_poisson_nmf` for details.
`control.init`	A list of parameters controlling the behaviour of the optimization and Topic SCORE method in the call to `init_poisson_nmf`. This is passed as argument "control" to `init_poisson_nmf`.
`control.main`	A list of parameters controlling the behaviour of the optimization in the main model fitting step. This is passed as argument "control" to `fit_poisson_nmf`.
`control.refine`	A list of parameters controlling the behaviour of the of the optimization algorithm in the model refinement step. This is passed as argument "control" to `fit_poisson_nmf`.
`verbose`	When `verbose = "progressbar"` or `verbose = "detailed"`, information about the progress of the model fitting is printed to the console. See `fit_poisson_nmf` for more information.

Details

With the default settings, the model fitting is accomplished in four steps: (1) initialize the Poisson NMF model fit (init_poisson_nmf); (2) perform the main model fitting step by running 100 EM updates using fit_poisson_nmf; (3) refine the fit by running 100 extrapolated SCD updates, again using fit_poisson_nmf; and (4) recover the multinomial topic model by calling poisson2multinom.

This two-stage fitting approach is based on our findings that the EM algorithm initially makes rapid progress toward a solution, but its convergence slows considerably as the iterates approach a solution. Close to a solution, we have found that other algorithms make much more rapid progress than EM; in particularly, we found that the extrapolated SCD updates usually performed best). For larger data sets, more updates in the main model fitting and refinement steps may be needed to obtain a good fit.

For larger data sets, more than 200 updates may be needed to obtain a good fit.

Value

A multinomial topic model fit; see poisson2multinom and fit_poisson_nmf for details. Note that outputted likelihoods and deviances in progress are evaluated with respect to the equivalent Poisson NMF model.

References

Dey, K. K., Hsiao, C. J. and Stephens, M. (2017). Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genetics 13, e1006599.

Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd International ACM SIGIR Conference, 50-57. doi:10.1145/312624.312649

Examples

library(Matrix)
set.seed(1)
X <- simulate_count_data(80,100,k = 3,sparse = TRUE)$X
fit <- fit_topic_model(X,k = 3)
print(summary(fit))

[Package fastTopics version 0.6-192 Index]