keyATM_read {keyATM} | R Documentation |
Read texts
Description
Read texts and create a keyATM_docs object, which is a list of texts.
Usage
keyATM_read(
  texts,
  encoding = "UTF-8",
  check = TRUE,
  keep_docnames = FALSE,
  split = 0
)
Arguments
texts |
input. keyATM takes a quanteda dfm (dgCMatrix), a data.frame, a tibble (tbl_df), or a vector of file paths. |
encoding |
character. Only used when |
check |
logical. If |
keep_docnames |
logical. If |
split |
numeric. This option works only with a quanteda dfm. It creates two subsets of the dfm by randomly splitting each document (i.e., both subsets contain the same number of documents). This option specifies the split proportion. Default is 0. |
Value
a keyATM_docs object. The first element is a list whose elements are split texts. The length of the list equals the number of documents.
Examples
## Not run:
# Use quanteda dfm
keyATM_docs <- keyATM_read(texts = quanteda_dfm)
# Use data.frame or tibble (texts should be stored in a column named `text`)
keyATM_docs <- keyATM_read(texts = data_frame_object)
keyATM_docs <- keyATM_read(texts = tibble_object)
# Use a vector that stores full paths to the text files
files <- list.files(doc_folder, pattern = "\\.txt$", full.names = TRUE)
keyATM_docs <- keyATM_read(texts = files)
## End(Not run)
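The per-document random split described for the split argument can be pictured with a small base-R sketch. This is only an illustration under stated assumptions, not keyATM's own implementation: the toy count matrix, the helper name split_counts, and the choice of an independent binomial draw for each token are all hypothetical.

```r
# Toy document-term counts (hypothetical data): 3 documents, 4 terms.
set.seed(1)
counts <- matrix(
  c(2, 0, 1, 3,
    0, 4, 1, 0,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("tax", "vote", "law", "war"))
)

# Hypothetical helper (an assumption, not part of keyATM): each token of
# each document goes to subset `a` with probability `prop`, and to subset
# `b` otherwise. Both subsets keep one row per document.
split_counts <- function(counts, prop) {
  a <- counts
  a[] <- rbinom(length(counts), size = as.vector(counts), prob = prop)
  list(a = a, b = counts - a)
}

parts <- split_counts(counts, prop = 0.3)

# Both subsets have the same documents, and together they recover the input.
nrow(parts$a) == nrow(parts$b)
all(parts$a + parts$b == counts)
```

Because the split happens within each document rather than across documents, neither subset drops a document; only the token counts per document are divided between the two.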