R: Read texts

keyATM_read {keyATM}

R Documentation

Read texts

Description

Read texts and create a keyATM_docs object, which is a list of texts.

Usage

keyATM_read(
  texts,
  encoding = "UTF-8",
  check = TRUE,
  keep_docnames = FALSE,
  split = 0
)

Arguments

`texts`	input. A quanteda dfm (dgCMatrix), a data.frame, a tibble (tbl_df), or a character vector of file paths. For a data.frame or tibble, the texts should be stored in a column named `text`.
`encoding`	character. Only used when `texts` is a vector of file paths. Default is `UTF-8`.
`check`	logical. If `TRUE`, check whether there is anything wrong with the structure of texts. Default is `TRUE`.
`keep_docnames`	logical. If `TRUE`, it keeps the document names in a quanteda dfm. Default is `FALSE`.
`split`	numeric. This option works only with a quanteda dfm. It creates two subsets of the dfm by randomly splitting each document, so both subsets contain the same number of documents. This option specifies the split proportion. Default is `0`.
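
As a rough illustration of what "randomly splitting each document" means, the sketch below uses base R only and does not call keyATM. The helper `split_tokens()` and the toy documents are hypothetical, and keyATM's internal procedure may differ. The point is only that every document contributes tokens to both subsets, so the number of documents is unchanged.

```r
# Hypothetical helper, not part of keyATM: send roughly `prop` of a
# document's tokens to the first part and the rest to the second part.
split_tokens <- function(tokens, prop) {
  n_first <- round(length(tokens) * prop)
  idx <- sample.int(length(tokens), size = n_first)
  # Logical indexing stays correct even when idx is empty
  in_first <- seq_along(tokens) %in% idx
  list(first = tokens[in_first], second = tokens[!in_first])
}

set.seed(1)
# Toy documents, each stored as a character vector of tokens
docs <- list(
  c("tax", "budget", "vote", "senate", "bill"),
  c("court", "judge", "ruling", "appeal", "law", "case")
)

halves <- lapply(docs, split_tokens, prop = 0.4)
first_subset  <- lapply(halves, `[[`, "first")
second_subset <- lapply(halves, `[[`, "second")

# Both subsets keep one entry per original document
length(first_subset) == length(second_subset)
```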

Value

A keyATM_docs object. The first element is a list whose elements are split texts. The length of the list equals the number of documents.
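
The shape described above can be pictured with a base R sketch. It assumes that "split texts" means each document is stored as a character vector of tokens. The sketch does not call keyATM, and the variable names and the whitespace tokenization are illustrative only.

```r
# Two toy documents as raw strings (illustrative data)
raw_texts <- c("economy tax budget", "court judge ruling appeal")

# One list element per document, each a character vector of tokens
tokenized <- strsplit(raw_texts, " ", fixed = TRUE)

# The list has as many elements as there are documents
length(tokenized) == length(raw_texts)
```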

Examples

## Not run: 
 # Use quanteda dfm
 keyATM_docs <- keyATM_read(texts = quanteda_dfm)

 # Use data.frame or tibble (texts should be stored in a column named `text`)
 keyATM_docs <- keyATM_read(texts = data_frame_object)
 keyATM_docs <- keyATM_read(texts = tibble_object)

 # Use a vector that stores full paths to the text files
 # `pattern` is a regular expression, not a glob, so match the extension explicitly
 files <- list.files(doc_folder, pattern = "\\.txt$", full.names = TRUE)
 keyATM_docs <- keyATM_read(texts = files)


## End(Not run)

[Package keyATM version 0.5.2 Index]