as.tokens.dfm {sentopics}    R Documentation

Convert a dfm back to a tokens object

Description

Converts a quanteda::dfm back to a quanteda::tokens object.

Usage

## S3 method for class 'dfm'
as.tokens(
  x,
  concatenator = NULL,
  tokens = NULL,
  ignore_list = NULL,
  case_insensitive = FALSE,
  padding = TRUE,
  ...
)

Arguments

x

quanteda::dfm to be coerced

concatenator

only used for consistency with the generic

tokens

optionally, the tokens from which the dfm was created. Providing the initial tokens ensures that word order is preserved in the coerced object.

ignore_list

a character vector of words that should not be removed from the initial tokens object. Useful to avoid removing lexicon words after using quanteda::dfm_trim().

case_insensitive

only used when the tokens argument is provided. Defaults to FALSE. The function removes words from the initial tokens when they are no longer among the features of the dfm object. This matching is case-sensitive by default and can be relaxed by setting this argument to TRUE.

padding

if TRUE, leaves an empty string where the removed tokens previously existed. The use of padding is encouraged to improve the behavior of the coherence metrics (see coherence()) that rely on word positions.

...

unused

Value

a quanteda::tokens object.

See Also

quanteda::as.tokens(), quanteda::dfm()

Examples

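library("quanteda")
library("sentopics")  # provides ECB_press_conferences_tokens and this method
# Build a dfm, keeping the original case of the tokens
dfm <- dfm(ECB_press_conferences_tokens, tolower = FALSE)
# Remove features that occur fewer than 200 times in total
dfm <- dfm_trim(dfm, min_termfreq = 200)
# Coerce from the dfm alone (a dfm stores counts, not word order)
as.tokens(dfm)
# Supply the initial tokens so that word order is respected
as.tokens(dfm, tokens = ECB_press_conferences_tokens)
# Same, but drop removed tokens instead of leaving empty strings
as.tokens(dfm, tokens = ECB_press_conferences_tokens, padding = FALSE)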
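
The ignore_list and case_insensitive arguments are not exercised above. The following is a minimal sketch of how they might be used, not an example from the package. It assumes a small toy corpus: the texts, the object names toks, x, res_ignore and res_nocase, and the threshold of 3 are all hypothetical. Output is not shown and should be checked on your own installation.

```r
library("quanteda")
library("sentopics")

# Toy tokens object (hypothetical data, for illustration only)
toks <- tokens(c(
  d1 = "Inflation is rising and growth is slowing",
  d2 = "growth is strong and inflation is low"
))

# Keep the original case, then trim rare features:
# only "is" occurs at least 3 times in this toy corpus
x <- dfm(toks, tolower = FALSE)
x <- dfm_trim(x, min_termfreq = 3)

# Ask that "growth" not be removed from the initial tokens,
# even though dfm_trim() dropped it from the dfm
res_ignore <- as.tokens(x, tokens = toks, ignore_list = "growth")

# Relax the case-sensitive check when matching words
res_nocase <- as.tokens(x, tokens = toks, ignore_list = "inflation",
                        case_insensitive = TRUE)

print(res_ignore)
print(res_nocase)
```

A lexicon word passed through ignore_list is typically one that a later analysis step depends on but that is too rare to survive trimming.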

[Package sentopics version 0.7.3 Index]