terms.data.frame {BTM}R Documentation

Get the set of Biterms from a tokenised data frame

Description

This extracts words occurring in the neighbourhood of one another, within a certain window range. The default setting provides the biterms used when fitting BTM with the default window parameter.

Usage

## S3 method for class 'data.frame'
terms(x, type = c("tokens", "biterms"), window = 15, ...)

Arguments

x

a tokenised data frame containing one row per token with 2 columns

  • the first column is a context identifier (e.g. a tweet id, a document id, a sentence id, an identifier of a survey answer, an identifier of a part of a text)

  • the second column is a column called of type character containing the sequence of words occurring within the context identifier

type

a character string, either 'tokens' or 'biterms'. Defaults to 'tokens'.

window

integer with the window size for biterm extraction. Defaults to 15.

...

not used

Value

Depending if type is set to 'tokens' or 'biterms' the following is returned:

Note

If x is a data.frame which has an attribute called 'terms', it just returns that 'terms' attribute

See Also

BTM, predict.BTM, logLik.BTM

Examples


library(udpipe)
data("brussels_reviews_anno", package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x <- subset(x, xpos %in% c("NN", "NNP", "NNS"))
x <- x[, c("doc_id", "lemma")]
biterms <- terms(x, window = 15, type = "biterms")
str(biterms)
tokens <- terms(x, type = "tokens")
str(tokens)


[Package BTM version 0.3.7 Index]