augment.tidylda {tidylda} | R Documentation |
Augment method for tidylda objects
Description
augment appends observation-level model outputs.
Usage
## S3 method for class 'tidylda'
augment(
  x,
  data,
  type = c("class", "prob"),
  document_col = "document",
  term_col = "term",
  ...
)
Arguments
x | an object of class tidylda |
data | a tidy tibble containing one row per original document-token pair, such as is returned by tdm_tidiers, with column names c("document", "term") at a minimum. |
type | either "class" or "prob" |
document_col | character specifying the name of the column that corresponds to document IDs. Defaults to "document". |
term_col | character specifying the name of the column that corresponds to term/token IDs. Defaults to "term". |
... | other arguments passed to methods; currently not used |
Details
The key statistic for augment is P(topic | document, token) = P(topic | token) * P(token | document). P(topic | token) is given by the entries of the 'lambda' matrix in the tidylda object passed as x. P(token | document) is taken to be the frequency of each token normalized within each document.
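The product described above can be illustrated with a minimal base R sketch. It does not call tidylda; the toy data frame `tidy_docs`, the hand-written `lambda` matrix (assumed here to have topics as rows and tokens as columns), and the variable names are all hypothetical and exist only for illustration.

```r
# Toy tidy data: one row per document-token occurrence (hypothetical values).
tidy_docs <- data.frame(
  document = c("d1", "d1", "d1", "d2", "d2"),
  term     = c("cat", "cat", "dog", "dog", "fish"),
  stringsAsFactors = FALSE
)

# P(token | document): token counts normalized within each document.
counts <- unclass(table(tidy_docs$document, tidy_docs$term))
p_token_given_doc <- counts / rowSums(counts)

# Hypothetical stand-in for the 'lambda' matrix: entries are P(topic | token).
# Assumption: topics are rows and tokens are columns.
lambda <- matrix(
  c(0.9, 0.5, 0.2,
    0.1, 0.5, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("t1", "t2"), c("cat", "dog", "fish"))
)

# Look up P(token | document) for every row by its (document, term) pair.
p_td <- p_token_given_doc[cbind(tidy_docs$document, tidy_docs$term)]

# One column per topic: P(topic | token) * P(token | document) for every row.
topic_probs <- t(lambda[, tidy_docs$term]) * p_td
rownames(topic_probs) <- NULL

# Append the per-topic columns to the original rows.
augmented <- cbind(tidy_docs, topic_probs)
print(augmented)
```

In this sketch the appended columns `t1` and `t2` play the role of the per-topic values that the formula defines; the actual column names and any further processing in tidylda may differ.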
Value
augment returns a tidy tibble containing one row per document-token pair, with one or more columns appended, depending on the value of type.
If type = 'prob', then one column per topic is appended. Each value is P(topic | document, token).
If type = 'class', then the most probable topic for each document-token pair is returned. If multiple topics are equally probable, then the topic with the smallest index is returned by default.
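The tie-breaking rule for type = 'class' can be sketched in base R with which.max, which returns the index of the first maximum. This is only an illustration of the rule as described, not necessarily how tidylda implements it; the matrix `topic_probs` and its values are hypothetical.

```r
# Hypothetical per-row topic probabilities:
# rows are document-token pairs, columns are topics.
topic_probs <- rbind(
  c(t1 = 0.60, t2 = 0.07),
  c(t1 = 0.25, t2 = 0.25),  # tie between topic 1 and topic 2
  c(t1 = 0.10, t2 = 0.40)
)

# which.max returns the first index that attains the maximum,
# so a tie resolves to the topic with the smallest index.
top_topic <- unname(apply(topic_probs, 1, which.max))
print(top_topic)
```

The second row is an exact tie, so it is assigned topic 1 under this rule.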