textmodel_seqlda {seededlda}                                    R Documentation

Sequential Latent Dirichlet allocation

Description

Implements Sequential Latent Dirichlet Allocation (Sequential LDA). textmodel_seqlda() allows users to classify the sentences of texts. It considers the topics of the previous document when inferring the topics of the current document. textmodel_seqlda() is a shortcut equivalent to textmodel_lda(gamma = 0.5). Seeded Sequential LDA is textmodel_seededlda(gamma = 0.5).
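
As a hedged illustration (not from the package documentation), the two calls below specify the same model; dfmt is assumed to be a sentence-level dfm such as the one built in the Examples section.

lda_a <- textmodel_seqlda(dfmt, k = 6)            # Sequential LDA
lda_b <- textmodel_lda(dfmt, k = 6, gamma = 0.5)  # equivalent call, spelled out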

Usage

textmodel_seqlda(
  x,
  k = 10,
  max_iter = 2000,
  auto_iter = FALSE,
  alpha = 0.5,
  beta = 0.1,
  batch_size = 1,
  model = NULL,
  verbose = quanteda_options("verbose")
)

Arguments

x

the dfm on which the model will be fit.

k

the number of topics.

max_iter

the maximum number of iterations in Gibbs sampling.

auto_iter

if TRUE, stops Gibbs sampling on convergence before reaching max_iter. See details.

alpha

the values to smooth the topic-document distribution.

beta

the values to smooth the topic-word distribution.

batch_size

splits the corpus into smaller batches (specified as a proportion of documents) for distributed computing; batching is disabled when a single batch includes all the documents (batch_size = 1.0). See details and the sketch below this argument list.

model

a fitted LDA model; if provided, textmodel_seqlda() inherits parameters from the existing model. See details and the sketch below this argument list.

verbose

logical; if TRUE, prints diagnostic information during fitting.
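
The following sketch, referenced in the batch_size and model entries above, is illustrative only; it assumes the dfmt object built in the Examples section and a hypothetical new dfm dfmt_new with a compatible vocabulary.

# stop sampling early on convergence and fit on batches of about 20% of the
# documents each; batch_size = 1.0 keeps all documents in a single batch
lda_old <- textmodel_seqlda(dfmt, k = 6, auto_iter = TRUE, batch_size = 0.2)

# inherit k and the other parameters from the previously fitted model
lda_new <- textmodel_seqlda(dfmt_new, model = lda_old)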

Value

The same as textmodel_lda().
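
For orientation, a hedged sketch assuming the element names documented for textmodel_lda(): the fitted object from the Examples section stores the estimated distributions.

dim(lda_seq$theta)  # document-topic proportions (assumed element name)
dim(lda_seq$phi)    # topic-word distributions (assumed element name)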

References

Du, Lan et al. (2012). "Sequential Latent Dirichlet Allocation". Knowledge and Information Systems. doi:10.1007/s10115-011-0425-1.

Watanabe, Kohei & Baturo, Alexander. (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". Social Science Computer Review. doi:10.1177/08944393231178605.

Examples


require(seededlda)
require(quanteda)

corp <- head(data_corpus_moviereviews, 500) %>%
    corpus_reshape()
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE)
dfmt <- dfm(toks) %>%
    dfm_remove(stopwords("en"), min_nchar = 2) %>%
    dfm_trim(max_docfreq = 0.01, docfreq_type = "prop")

lda_seq <- textmodel_seqlda(dfmt, k = 6, max_iter = 500) # 6 topics
terms(lda_seq)
topics(lda_seq)
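
# Not part of the original example: a hedged sketch of the Seeded Sequential
# LDA variant mentioned in the Description, using a small made-up dictionary
# purely for illustration.
dict <- dictionary(list(romance = c("love*", "romanc*"),
                        action = c("fight*", "chase*", "gun*")))
lda_seed_seq <- textmodel_seededlda(dfmt, dict, residual = TRUE, gamma = 0.5,
                                    max_iter = 500)
terms(lda_seed_seq)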


[Package seededlda version 1.3.2]