textmodel_seqlda {seededlda}                                    R Documentation

Sequential Latent Dirichlet allocation

Description

Implements Sequential Latent Dirichlet Allocation (Sequential LDA). textmodel_seqlda() allows users to classify the sentences of texts. It considers the topics of the previous document when inferring the topics of the current document. textmodel_seqlda() is a shortcut equivalent to textmodel_lda(gamma = 0.5). Seeded Sequential LDA is textmodel_seededlda(gamma = 0.5).
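
As a hedged illustration (not from the package documentation), the two calls below specify the same model; dfmt is assumed to be a sentence-level dfm such as the one built in the Examples section.

lda_a <- textmodel_seqlda(dfmt, k = 6)            # Sequential LDA
lda_b <- textmodel_lda(dfmt, k = 6, gamma = 0.5)  # equivalent call, spelled out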

Usage

textmodel_seqlda(
  x,
  k = 10,
  max_iter = 2000,
  auto_iter = FALSE,
  alpha = 0.5,
  beta = 0.1,
  batch_size = 1,
  model = NULL,
  verbose = quanteda_options("verbose")
)

Arguments

x

the dfm on which the model will be fit.

k

the number of topics.

max_iter

the maximum number of iterations in Gibbs sampling.

auto_iter

if TRUE, stops Gibbs sampling on convergence before reaching max_iter. See details.

alpha

the values to smooth the topic-document distribution.

beta

the values to smooth the topic-word distribution.

batch_size

splits the corpus into smaller batches (specified as a proportion of documents) for distributed computing; batching is disabled when a single batch includes all the documents (batch_size = 1.0). See details and the sketch below this argument list.

model

a fitted LDA model; if provided, textmodel_seqlda() inherits parameters from the existing model. See details and the sketch below this argument list.

verbose

logical; if TRUE, prints diagnostic information during fitting.
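
The following sketch, referenced in the batch_size and model entries above, is illustrative only; it assumes the dfmt object built in the Examples section and a hypothetical new dfm dfmt_new with a compatible vocabulary.

# stop sampling early on convergence and fit on batches of about 20% of the
# documents each; batch_size = 1.0 keeps all documents in a single batch
lda_old <- textmodel_seqlda(dfmt, k = 6, auto_iter = TRUE, batch_size = 0.2)

# inherit k and the other parameters from the previously fitted model
lda_new <- textmodel_seqlda(dfmt_new, model = lda_old)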

Value

The same as textmodel_lda().
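
For orientation, a hedged sketch assuming the element names documented for textmodel_lda(): the fitted object from the Examples section stores the estimated distributions.

dim(lda_seq$theta)  # document-topic proportions (assumed element name)
dim(lda_seq$phi)    # topic-word distributions (assumed element name)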

References

Du, Lan et al. (2012). "Sequential Latent Dirichlet Allocation". Knowledge and Information Systems. doi:10.1007/s10115-011-0425-1.

Watanabe, Kohei & Baturo, Alexander. (2023). "Seeded Sequential LDA: A Semi-supervised Algorithm for Topic-specific Analysis of Sentences". Social Science Computer Review. doi:10.1177/08944393231178605.

Examples


require(seededlda)
require(quanteda)

corp <- head(data_corpus_moviereviews, 500) %>%
    corpus_reshape()
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE)
dfmt <- dfm(toks) %>%
    dfm_remove(stopwords("en"), min_nchar = 2) %>%
    dfm_trim(max_docfreq = 0.01, docfreq_type = "prop")

lda_seq <- textmodel_seqlda(dfmt, k = 6, max_iter = 500) # 6 topics
terms(lda_seq)
topics(lda_seq)
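
# Not part of the original example: a hedged sketch of the Seeded Sequential
# LDA variant mentioned in the Description, using a small made-up dictionary
# purely for illustration.
dict <- dictionary(list(romance = c("love*", "romanc*"),
                        action = c("fight*", "chase*", "gun*")))
lda_seed_seq <- textmodel_seededlda(dfmt, dict, residual = TRUE, gamma = 0.5,
                                    max_iter = 500)
terms(lda_seed_seq)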


[Package seededlda version 1.3.2]