divergence {seededlda}    R Documentation
Optimize the number of topics for LDA
Description
divergence()
computes regularized topic divergence scores to help users find the optimal number of topics for LDA.
Usage
divergence(
  x,
  min_size = 0.01,
  select = NULL,
  regularize = TRUE,
  newdata = NULL,
  ...
)
Arguments
x: an LDA model fitted by textmodel_lda().

min_size: the minimum size of topics for the regularized topic divergence; ignored when regularize = FALSE.

select: names of topics for which the divergence is computed.

regularize: if TRUE, compute the regularized topic divergence.

newdata: if provided, the model is fitted to the new data before the divergence is computed.

...: additional arguments passed to textmodel_lda().
Details
divergence()
computes the average Jensen-Shannon divergence between all pairs of topic vectors in x$phi. The divergence score is maximized when the chosen number of topics k is optimal (Deveaud et al., 2014). The regularized divergence penalizes topics smaller than min_size to avoid fragmentation (Watanabe & Baturo, 2023).
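As a conceptual sketch only, not the package's internal implementation, the average pairwise Jensen-Shannon divergence over the rows of a topic-word matrix phi (topics in rows, words in columns) could be computed as below; the helper names js_div() and avg_divergence() are illustrative.

js_div <- function(p, q) {
  # Jensen-Shannon divergence between two probability vectors
  m <- (p + q) / 2
  kl <- function(a, b) sum(a * log(a / b), na.rm = TRUE)
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

avg_divergence <- function(phi) {
  # mean divergence over all pairs of topic vectors (rows of phi)
  pairs <- combn(nrow(phi), 2)
  mean(apply(pairs, 2, function(ij) js_div(phi[ij[1], ], phi[ij[2], ])))
}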
Value
Returns a single numeric value.
References
Deveaud, Romain et al. (2014). "Accurate and Effective Latent Concept Modeling for Ad Hoc Information Retrieval". Document Numérique. doi:10.3166/DN.17.1.61-84.
Watanabe, Kohei & Baturo, Alexander (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". Social Science Computer Review. doi:10.1177/08944393231178605.
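Examples
A minimal sketch of selecting the number of topics with divergence(); the corpus object corp and the candidate range 2:10 are assumptions, not defaults of the package.

library(seededlda)
library(quanteda)

toks <- tokens(corp, remove_punct = TRUE)
dfmt <- dfm(toks) |> dfm_trim(min_termfreq = 5)

# fit LDA models with different numbers of topics and compare divergence
div <- sapply(2:10, function(k) {
  lda <- textmodel_lda(dfmt, k = k)
  divergence(lda)
})

# the candidate k with the highest divergence score
which.max(div) + 1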