textmodel_seqlda {seededlda}    R Documentation
Sequential Latent Dirichlet allocation
Description
Implements Sequential Latent Dirichlet Allocation (Sequential LDA).
textmodel_seqlda() allows users to classify the sentences of texts. It
takes the topics of the previous document into account when inferring the
topics of the current document. textmodel_seqlda() is a shortcut equivalent
to textmodel_lda(gamma = 0.5). Seeded Sequential LDA is
textmodel_seededlda(gamma = 0.5).
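The equivalence can be written out directly (a minimal sketch; dfmt stands
for any dfm, such as the one prepared in the Examples section):

lda_a <- textmodel_seqlda(dfmt, k = 6)            # sequential LDA
lda_b <- textmodel_lda(dfmt, k = 6, gamma = 0.5)  # equivalent call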
Usage
textmodel_seqlda(
x,
k = 10,
max_iter = 2000,
auto_iter = FALSE,
alpha = 0.5,
beta = 0.1,
batch_size = 1,
model = NULL,
verbose = quanteda_options("verbose")
)
Arguments
x: the dfm on which the model will be fit.
k: the number of topics.
max_iter: the maximum number of iterations in Gibbs sampling.
auto_iter: if TRUE, stops Gibbs sampling on convergence before reaching max_iter.
alpha: the values to smooth the topic-document distribution.
beta: the values to smooth the topic-word distribution.
batch_size: split the corpus into smaller batches (specified as a proportion) for distributed computing; batching is disabled when a single batch includes all the documents.
model: a fitted LDA model; if provided, its parameters are inherited so that fitting continues from the existing model.
verbose: logical; if TRUE, print diagnostic messages during fitting.
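For example, convergence-based stopping and batched sampling can be combined
in a single call (a sketch reusing dfmt from the Examples section; the exact
settings are illustrative, not recommendations):

lda <- textmodel_seqlda(dfmt, k = 6, auto_iter = TRUE,
                        batch_size = 0.2, verbose = TRUE)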
Value
The same as textmodel_lda().
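As with textmodel_lda(), the fitted object carries the estimated
distributions; a short sketch of inspecting them (assuming a fitted model
lda as above):

lda$theta[1:3, ] # document-topic distribution
lda$phi[, 1:5]   # topic-word distribution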
References
Du, Lan et al. (2012). "Sequential Latent Dirichlet Allocation". Knowledge and Information Systems. doi:10.1007/s10115-011-0425-1.
Watanabe, Kohei & Baturo, Alexander. (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". Social Science Computer Review. doi:10.1177/08944393231178605.
Examples
require(seededlda)
require(quanteda)
corp <- head(data_corpus_moviereviews, 500) %>%
  corpus_reshape() # split the documents into sentences
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE)
dfmt <- dfm(toks) %>%
  dfm_remove(stopwords("en"), min_nchar = 2) %>%
  dfm_trim(max_docfreq = 0.01, docfreq_type = "prop")
lda_seq <- textmodel_seqlda(dfmt, k = 6, max_iter = 500) # 6 topics
terms(lda_seq)
topics(lda_seq)
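# The model argument can continue fitting from an existing model. A sketch,
# assuming dfmt_new is a hypothetical dfm of new sentences built with the
# same pipeline as dfmt above:
lda_updated <- textmodel_seqlda(dfmt_new, model = lda_seq, max_iter = 200)
terms(lda_updated)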