terms.BTM {BTM}    R Documentation
Get highest token probabilities for each topic or get biterms used in the model
```r
## S3 method for class 'BTM'
terms(x, type = c("tokens", "biterms"), threshold = 0, top_n = 5, ...)
```
| Argument | Description |
|---|---|
| x | an object of class BTM as returned by the BTM function |
| type | a character string, either 'tokens' or 'biterms'. Defaults to 'tokens'. |
| threshold | a probability threshold in the range 0-1. For each topic, only the tokens that are more likely than the threshold are returned. Only used if type = 'tokens'. |
| top_n | integer; for each topic, return only the top_n most probable tokens. Use top_n = +Inf to return every token that passes the threshold. Only used if type = 'tokens'. |
| ... | not used |
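How threshold and top_n combine for type = 'tokens' can be sketched in base R. This is a minimal sketch of the documented behaviour, not the package's own code: the matrix phi (tokens in rows, topics in columns, each column a distribution P(w|z)) and the helper top_terms() are hypothetical, and the strict comparison (>) is an assumption that follows the wording "more likely than the threshold".

```r
# Hypothetical P(w|z) matrix: rows are tokens, columns are topics,
# and each column sums to 1 (made-up values)
phi <- matrix(c(0.50, 0.30, 0.15, 0.05,
                0.10, 0.20, 0.30, 0.40),
              ncol = 2,
              dimnames = list(c("hotel", "kamer", "ontbijt", "stad"), NULL))

# Hypothetical helper mirroring the documented meaning of threshold and top_n;
# not the BTM package's own implementation
top_terms <- function(phi, threshold = 0, top_n = 5) {
  lapply(seq_len(ncol(phi)), function(k) {
    d <- data.frame(token = rownames(phi), probability = unname(phi[, k]),
                    stringsAsFactors = FALSE)
    # keep only tokens more likely than the threshold
    d <- d[d$probability > threshold, , drop = FALSE]
    # order from high to low probability
    d <- d[order(d$probability, decreasing = TRUE), , drop = FALSE]
    rownames(d) <- NULL
    # return at most top_n rows (top_n = Inf keeps them all)
    head(d, top_n)
  })
}

res <- top_terms(phi, threshold = 0.1, top_n = 2)
res[[1]]  # hotel (0.50) and kamer (0.30)
res[[2]]  # stad (0.40) and ontbijt (0.30); hotel (0.10) is not above 0.1
```

The threshold removes unlikely tokens first, and top_n then caps how many of the remaining tokens are kept per topic.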
Depending on whether type is set to 'tokens' or 'biterms', the following is returned:
If type='tokens'
: the probability of each token given the topic, P(w|z). The result is a list of data.frames, one for each topic, so the list has the same length as the number of topics. Each data.frame contains the columns token and probability, ordered from high to low probability.
If type='biterms'
: a list containing 2 elements:

n
: the number of biterms used to train the model

biterms
: a data.frame with columns term1, term2 and topic, indicating, for every biterm found in the data, the topic to which that biterm is assigned
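Because the type='tokens' result is a plain list of data.frames indexed by topic, it can be stacked into a single long data.frame with a topic column using base R. In this sketch tok is a made-up stand-in with the documented columns token and probability; with a fitted model one would start from tok <- terms(model) instead.

```r
# Made-up stand-in for terms(model): one data.frame per topic
tok <- list(
  data.frame(token = c("hotel", "kamer"), probability = c(0.5, 0.3),
             stringsAsFactors = FALSE),
  data.frame(token = c("stad", "ontbijt"), probability = c(0.4, 0.3),
             stringsAsFactors = FALSE)
)

# Prepend the topic index (the position in the list) to each data.frame,
# then stack all of them row-wise
long <- do.call(rbind, Map(function(d, k) cbind(topic = k, d),
                           tok, seq_along(tok)))
long
```

A long table like this is convenient for filtering or plotting the top tokens of all topics at once.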
Note that a biterm is unordered: in the output of type='biterms', term1 is always smaller than or equal to term2.
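Treating a biterm as unordered means that the pairs (w1, w2) and (w2, w1) denote the same biterm, so a canonical form puts the smaller term first. The sketch below illustrates that idea with pmin()/pmax() on character vectors, which compare strings in the current locale's collation order. The helper canonical_biterms() is hypothetical, and how BTM itself decides which term is "smaller" (for example by string or by an internal token id) is not specified here.

```r
# Hypothetical helper: order each pair so that term1 <= term2
canonical_biterms <- function(a, b) {
  data.frame(term1 = pmin(a, b), term2 = pmax(a, b),
             stringsAsFactors = FALSE)
}

pairs <- canonical_biterms(c("stad", "hotel", "kamer"),
                           c("hotel", "stad", "kamer"))
pairs
# After canonicalisation the first two rows describe the same biterm
duplicated(pairs)  # FALSE TRUE FALSE
```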
```r
library(BTM)
library(udpipe)
## Dutch reviews, keeping only the nouns and proper nouns
data("brussels_reviews_anno", package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
x <- x[, c("doc_id", "lemma")]
model <- BTM(x, k = 5, iter = 5, trace = TRUE)
## Most probable tokens per topic
terms(model)
terms(model, top_n = 10)
terms(model, threshold = 0.01, top_n = +Inf)
## Biterms and the topic each one is assigned to
bi <- terms(model, type = "biterms")
str(bi)
```
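The biterms element of bi lends itself to simple summaries, such as how many biterms are assigned to each topic. To keep the sketch self-contained, bi_demo is a made-up stand-in that mimics the documented structure of terms(model, type = "biterms") (a list with n and a data.frame with term1, term2 and topic); with the example above one would use bi instead.

```r
# Made-up stand-in for terms(model, type = "biterms")
bi_demo <- list(
  n = 5L,
  biterms = data.frame(term1 = c("hotel", "hotel", "kamer", "ontbijt", "stad"),
                       term2 = c("kamer", "stad", "ontbijt", "stad", "stad"),
                       topic = c(1L, 2L, 1L, 2L, 2L),
                       stringsAsFactors = FALSE)
)

# Number of biterms assigned to each topic
counts <- table(bi_demo$biterms$topic)
counts

# Share of the biterms assigned to each topic
shares <- prop.table(counts)
shares
```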