| corpus_files {tm.plugin.koRpus} | R Documentation | 
Get a comprehensive data frame describing the files of your corpus
Description
Translates the given hierarchy definition into a data frame with one row for each file, including the generated document ID.
Usage
corpus_files(
  dir,
  hierarchy = list(),
  fsep = .Platform$file.sep,
  full_list = FALSE
)
Arguments
| dir | File path to the root directory of the text corpus, or a TIF-compliant data frame [1]. | 
| hierarchy | A named list of named character vectors describing the directory hierarchy level by level. If | 
| fsep | Character string defining the path separator to use. | 
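| full_list | Logical; if TRUE, a list with additional data frames describing the hierarchy is returned instead of a single data frame (see Value). | 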
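A minimal base-R sketch of how a hierarchy list could map onto subdirectories. It assumes (this page does not say so explicitly) that the names of each character vector are the directory names on disk, that the values are descriptive labels, and that the list elements nest in order, top level first. `hierarchy_paths()` is a hypothetical helper for illustration only, not part of tm.plugin.koRpus.

```r
# Hypothetical helper (NOT part of tm.plugin.koRpus): expand a hierarchy list
# into every relative directory path it implies. Assumes vector *names* are
# directory names and that levels nest in list order (first = top level).
hierarchy_paths <- function(hierarchy, fsep = .Platform$file.sep) {
  # directory names for each hierarchy level
  dirs <- lapply(hierarchy, names)
  # expand.grid varies its first column fastest, so reverse the levels first
  # and restore the original column order afterwards; this makes the top
  # level vary slowest (depth-first order)
  grid <- expand.grid(rev(dirs), stringsAsFactors = FALSE)
  grid <- grid[, rev(seq_along(grid)), drop = FALSE]
  # glue the level columns together with the path separator
  do.call(paste, c(unname(as.list(grid)), sep = fsep))
}

hierarchy <- list(
  Topic = c(Winner = "Reality Winner", Edwards = "Natalie Edwards"),
  Source = c(Wikipedia_prev = "Wikipedia (old)", Wikipedia_new = "Wikipedia (new)")
)
hierarchy_paths(hierarchy, fsep = "/")
```

For the two-level hierarchy above, the result is the four relative paths Winner/Wikipedia_prev, Winner/Wikipedia_new, Edwards/Wikipedia_prev and Edwards/Wikipedia_new.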
Value
Either a data frame with columns doc_id, file, path and one further factor column for each hierarchy level, or (if full_list=TRUE) a list containing that data frame (all_files) and also data frames describing the hierarchy by given names (hier_names), directories (hier_dirs) and relative paths (hier_paths).
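To make the shape of the returned data frame concrete, here is a hypothetical base-R sketch that produces the same columns from a directory tree. It is not the package's implementation: `sketch_corpus_files()` is an illustrative name, and it assumes that vector names are directory names, that the vector values become the factor levels, that `path` holds the directory containing each file, and that `doc_id` is a plain running number (the real function's ID scheme is not documented on this page).

```r
# Hypothetical sketch (NOT the tm.plugin.koRpus implementation).
# Assumptions: vector names = directory names, vector values = factor labels,
# `path` = directory containing the file, doc_id = running number.
sketch_corpus_files <- function(dir, hierarchy, fsep = .Platform$file.sep) {
  dirs <- lapply(hierarchy, names)
  # every combination of level directories, top level varying slowest
  combos <- expand.grid(rev(dirs), stringsAsFactors = FALSE)
  combos <- combos[, rev(seq_along(combos)), drop = FALSE]
  rows <- lapply(seq_len(nrow(combos)), function(i) {
    rel <- do.call(paste, c(unname(as.list(combos[i, , drop = FALSE])), sep = fsep))
    leaf <- paste(dir, rel, sep = fsep)
    files <- sort(list.files(leaf))
    if (length(files) == 0) return(NULL)  # skip empty or missing leaf dirs
    out <- data.frame(file = files, path = leaf, stringsAsFactors = FALSE)
    # one column per level, holding the label of this file's directory
    for (lvl in names(hierarchy)) {
      out[[lvl]] <- unname(hierarchy[[lvl]][combos[i, lvl]])
    }
    out
  })
  res <- do.call(rbind, rows)
  if (is.null(res)) stop("no files found below ", dir)
  # turn the level columns into factors, keeping the hierarchy's label order
  for (lvl in names(hierarchy)) {
    res[[lvl]] <- factor(res[[lvl]], levels = unname(hierarchy[[lvl]]))
  }
  res$doc_id <- as.character(seq_len(nrow(res)))
  rownames(res) <- NULL
  res[, c("doc_id", "file", "path", names(hierarchy))]
}

# usage: a throwaway two-level corpus in a temporary directory
root <- file.path(tempdir(), "sketch_corpus")
hier <- list(Topic = c(A = "Topic A", B = "Topic B"),
             Source = c(old = "Old", new = "New"))
for (tp in names(hier$Topic)) for (src in names(hier$Source)) {
  leaf <- file.path(root, tp, src)
  dir.create(leaf, recursive = TRUE, showWarnings = FALSE)
  writeLines("Some text.", file.path(leaf, "doc1.txt"))
}
sketch_corpus_files(root, hier, fsep = "/")  # four rows, one per file
```

Storing the labels as factors with the hierarchy's own ordering keeps the level order stable for later grouping, which matches the "factor column for each hierarchy level" described above.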
References
[1] Text Interchange Formats (https://github.com/ropensci/tif)
Examples
library(tm.plugin.koRpus)

# file table for the example corpus shipped with the package
myCorpusFiles <- corpus_files(
  dir=file.path(
    path.package("tm.plugin.koRpus"), "examples", "corpus"
  ),
  hierarchy=list(
    Topic=c(
      Winner="Reality Winner",
      Edwards="Natalie Edwards"
    ),
    Source=c(
      Wikipedia_prev="Wikipedia (old)",
      Wikipedia_new="Wikipedia (new)"
    )
  )
)