R: Augment method for 'tidylda' objects

augment.tidylda {tidylda}

R Documentation

Augment method for `tidylda` objects

Description

augment appends observation-level model outputs to the supplied data.

Usage

## S3 method for class 'tidylda'
augment(
  x,
  data,
  type = c("class", "prob"),
  document_col = "document",
  term_col = "term",
  ...
)

Arguments

`x`	an object of class `tidylda`
`data`	a tidy tibble containing one row per original document-token pair, such as is returned by tdm_tidiers, with at a minimum the columns c("document", "term").
`type`	either "class" or "prob"
`document_col`	character specifying the name of the column that corresponds to document IDs. Defaults to `"document"`.
`term_col`	character specifying the name of the column that corresponds to term/token IDs. Defaults to `"term"`.
`...`	other arguments passed to methods; currently not used

Details

The key statistic for augment is P(topic | document, token) = P(topic | token) * P(token | document). The values P(topic | token) are the entries of the 'lambda' matrix in the tidylda object passed as x. P(token | document) is the frequency of each token, normalized within each document.
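
As an illustration only, the statistic can be sketched in base R on toy inputs. The lambda values, the count column n, and the helper names p_token_doc and prob are all assumptions made for this example, as is the topics-by-tokens orientation of lambda. This is not the package's internal code.

```r
# Toy lambda matrix (made-up values): rows are topics, columns are tokens.
# Each entry is P(topic | token), so every column sums to 1.
lambda <- matrix(
  c(0.9, 0.1,   # apple
    0.2, 0.8,   # bank
    0.5, 0.5),  # river
  nrow = 2,
  dimnames = list(c("t1", "t2"), c("apple", "bank", "river"))
)

# Toy tidy data: one row per document-token pair, with a hypothetical
# count column n.
data <- data.frame(
  document = c("d1", "d1", "d2", "d2"),
  term = c("apple", "bank", "bank", "river"),
  n = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# P(token | document): token counts normalized within each document.
data$p_token_doc <- data$n / ave(data$n, data$document, FUN = sum)

# Look up P(topic | token) for each row's term, then scale each row by
# that row's P(token | document). The result has one column per topic.
p_topic_token <- t(lambda[, data$term, drop = FALSE])
prob <- p_topic_token * data$p_token_doc

prob
```

Each row of prob lines up with the corresponding row of data, so the topic columns could be bound onto the data frame with cbind().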

Value

augment returns a tidy tibble containing one row per document-token pair, with one or more columns appended, depending on the value of type.

If type = 'prob', then one column per topic is appended. Each column holds P(topic | document, token) for that topic.

If type = 'class', then the most probable topic for each document-token pair is returned. If multiple topics are equally probable, then the topic with the smallest index is returned by default.
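
The tie-breaking rule can be mimicked in base R. The following is a minimal sketch on a made-up probability matrix, not the package's own implementation. It uses max.col() with ties.method = "first", because the default ties.method for max.col() is "random".

```r
# Made-up per-topic values for three document-token pairs.
# The third row is an exact tie between the two topics.
prob <- matrix(
  c(0.675, 0.075,
    0.050, 0.200,
    0.250, 0.250),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("t1", "t2"))
)

# Index of the largest value in each row; ties go to the smallest index.
top_topic <- max.col(prob, ties.method = "first")

top_topic
```

With these values the tied third row resolves to topic 1, the smallest index.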