fit_topic_model {fastTopics} | R Documentation |
Simple Interface for Fitting a Multinomial Topic Model
Description
Fits a multinomial topic model to the count data,
hiding most of the complexities of model fitting. The default
optimization settings used here are intended to work well in a wide
range of data sets, although some fine-tuning may be needed for
more difficult cases. For full control, use fit_poisson_nmf
.
Usage
fit_topic_model(
X,
k,
numiter.main = 100,
numiter.refine = 100,
method.main = "em",
method.refine = "scd",
init.method = c("topicscore", "random"),
control.init = list(),
control.main = list(numiter = 4),
control.refine = list(numiter = 4, extrapolate = TRUE),
verbose = c("progressbar", "detailed", "none")
)
Arguments
X |
The n x m matrix of counts; all entries of X should be
non-negative. It can be a sparse matrix (class |
k |
The number of topics. Must be 2 or greater. |
numiter.main |
Number of updates of the factors and loadings to perform in the main model fitting step. Should be 1 or more. |
numiter.refine |
Number of updates of the factors and loadings to perform in the model refinement step. |
method.main |
The method to use for updating the factors and
loadings in the main model fitting step. Passed as argument
"method" to |
method.refine |
The method to use for updating the factors in
evthe model refinement step. Passed as argument "method"
to |
init.method |
The method used to initialize the factors and
loadings. See |
control.init |
A list of parameters controlling the behaviour
of the optimization and Topic SCORE method in the call to
|
control.main |
A list of parameters controlling the behaviour
of the optimization in the main model fitting step. This is passed
as argument "control" to |
control.refine |
A list of parameters controlling the
behaviour of the of the optimization algorithm in the model
refinement step. This is passed as argument "control" to
|
verbose |
When |
Details
With the default settings, the model fitting is
accomplished in four steps: (1) initialize the Poisson NMF model
fit (init_poisson_nmf
); (2) perform the main model fitting
step by running 100 EM updates using fit_poisson_nmf
; (3)
refine the fit by running 100 extrapolated SCD updates, again using
fit_poisson_nmf
; and (4) recover the multinomial topic model
by calling poisson2multinom
.
This two-stage fitting approach is based on our findings that the EM algorithm initially makes rapid progress toward a solution, but its convergence slows considerably as the iterates approach a solution. Close to a solution, we have found that other algorithms make much more rapid progress than EM; in particularly, we found that the extrapolated SCD updates usually performed best). For larger data sets, more updates in the main model fitting and refinement steps may be needed to obtain a good fit.
For larger data sets, more than 200 updates may be needed to obtain a good fit.
Value
A multinomial topic model fit; see
poisson2multinom
and fit_poisson_nmf
for details. Note that outputted likelihoods and deviances in
progress
are evaluated with respect to the equivalent
Poisson NMF model.
References
Dey, K. K., Hsiao, C. J. and Stephens, M. (2017). Visualizing the structure of RNA-seq expression data using grade of membership models. PLoS Genetics 13, e1006599.
Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd International ACM SIGIR Conference, 50-57. doi:10.1145/312624.312649
See Also
init_poisson_nmf
,
fit_poisson_nmf
,
poisson2multinom
,
fit_multinom_model
Examples
library(Matrix)
set.seed(1)
X <- simulate_count_data(80,100,k = 3,sparse = TRUE)$X
fit <- fit_topic_model(X,k = 3)
print(summary(fit))