| textmodel_seqlda {seededlda} | R Documentation | 
Sequential Latent Dirichlet allocation
Description
Implements Sequential Latent Dirichlet allocation (Sequential LDA).
textmodel_seqlda() allows users to classify sentences of texts. It
considers the topics of the previous document when inferring the topics of the
current document. textmodel_seqlda() is a shortcut equivalent to
textmodel_lda(gamma = 0.5). Seeded Sequential LDA is
textmodel_seededlda(gamma = 0.5).
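The equivalence stated above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the 50-review subset, the reduced max_iter and the seed dictionary are assumptions chosen only to keep the run short, and the dictionary entries are hypothetical. Because Gibbs sampling is stochastic, the two unseeded fits are not expected to match exactly.

```r
library(seededlda)
library(quanteda)

# Small sentence-level dfm; the 50-review subset is only to keep the run short
corp <- corpus_reshape(head(data_corpus_moviereviews, 50), to = "sentences")
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE)
dfmt <- dfm_remove(dfm(toks), stopwords("en"), min_nchar = 2)

# The shortcut and the explicit call it stands for
lda_a <- textmodel_seqlda(dfmt, k = 6, max_iter = 100)
lda_b <- textmodel_lda(dfmt, k = 6, max_iter = 100, gamma = 0.5)

# Seeded Sequential LDA; this dictionary is hypothetical
dict <- dictionary(list(acting = c("actor*", "perform*"),
                        plot = c("story", "plot*")))
lda_c <- textmodel_seededlda(dfmt, dict, max_iter = 100, gamma = 0.5)
```

All three objects can be inspected with terms() and topics() in the same way.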
Usage
textmodel_seqlda(
  x,
  k = 10,
  max_iter = 2000,
  auto_iter = FALSE,
  alpha = 0.5,
  beta = 0.1,
  batch_size = 1,
  model = NULL,
  verbose = quanteda_options("verbose")
)
Arguments
| x | the dfm on which the model will be fit. | 
| k | the number of topics. | 
| max_iter | the maximum number of iterations in Gibbs sampling. | 
| auto_iter | if  | 
| alpha | the values to smooth topic-document distribution. | 
| beta | the values to smooth topic-word distribution. | 
| batch_size | split the corpus into smaller batches (specified as a
proportion) for distributed computing; it is disabled when a batch includes
all the documents  | 
| model | a fitted LDA model; if provided,  | 
| verbose | logical; if  | 
Value
The same as textmodel_lda()
References
Du, Lan et al. (2012). "Sequential Latent Dirichlet Allocation". doi:10.1007/s10115-011-0425-1. Knowledge and Information Systems.
Watanabe, Kohei & Baturo, Alexander. (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". doi:10.1177/08944393231178605. Social Science Computer Review.
Examples
require(seededlda)
require(quanteda)
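# split the first 500 reviews into sentences, the unit that Sequential LDA classifies
corp <- head(data_corpus_moviereviews, 500) %>%
    corpus_reshape()
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE)
# drop stopwords and one-character tokens, then features found in more than 1% of sentences
dfmt <- dfm(toks) %>%
    dfm_remove(stopwords("en"), min_nchar = 2) %>%
    dfm_trim(max_docfreq = 0.01, docfreq_type = "prop")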
lda_seq <- textmodel_seqlda(dfmt, k = 6, max_iter = 500) # 6 topics
terms(lda_seq)
topics(lda_seq)
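Because the corpus is reshaped to sentences, topics() gives one label per sentence. One possible way to summarise the labels by original review is sketched below. It is self-contained, uses a smaller subset and fewer iterations than the example above purely as assumptions to shorten the run, and relies on docid() to recover the review each sentence came from.

```r
library(seededlda)
library(quanteda)

# Sentence-level dfm from a small subset of reviews
corp <- corpus_reshape(head(data_corpus_moviereviews, 50), to = "sentences")
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE,
               remove_numbers = TRUE)
dfmt <- dfm_remove(dfm(toks), stopwords("en"), min_nchar = 2)
lda_seq <- textmodel_seqlda(dfmt, k = 6, max_iter = 100)

# Count sentences per topic within each original review;
# sentences without a topic (NA) are dropped by table()
tab <- table(docid(dfmt), topics(lda_seq))
head(tab)

# Convert counts to within-review shares
head(prop.table(tab, margin = 1))
```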