R: Read and Merge Files from Directory

read_dir {neatStats}

R Documentation

Read and Merge Files from Directory

Description

Reads data files from any given directory as data frames and merges them into a single data frame (using data.table::rbindlist).

Usage

read_dir(
  pattern = "*[.]",
  path = ".",
  reader_function = data.table::fread,
  ...,
  subdirs = FALSE,
  filt = NULL,
  hush = FALSE
)

Arguments

`pattern`	Regular expression ("regex"; as string or `NULL`) for selecting files (passed to the `list.files` function). The default is `"*[.]"` (see Usage); if `NULL` is given, all files at the specified path will be read in. To select, for example, a specific extension like ".txt", the pattern can be given as `"\\.txt$"` (for CSV files, `"\\.csv$"`, etc.). Files ending with e.g. "group2.txt" can be specified as `"group2\\.txt$"`. Files starting with "exp3" can be specified as `"^exp3"`. Files starting with "exp3" AND ending with the ".txt" extension can be specified as `"^exp3.*\\.txt$"`. To read in a single file, specify the full filename (e.g. `"exp3_subject46_group2.txt"`). (See `?regex` for more details.)
`path`	Path to the directory from which the files should be selected and read. The default `"."` means the current working directory (as returned by `getwd()`). Either set the correct working directory in advance (see `setwd`, `path_neat`), or enter a relative or full path (e.g. `"C:/research"` or `"/home/projects"`, etc.).
`reader_function`	A function to be used for reading the files, `data.table::fread` by default.
`...`	Any arguments to be passed on to the chosen `reader_function`.
`subdirs`	Logical (`FALSE` by default). If `TRUE`, searches files in subdirectories as well (relative to the given `path`).
`filt`	An expression to filter, by column values, each data file after it is read and before it is merged with the other data. (The expression should use column names alone; see Examples.)
`hush`	Logical. If `FALSE` (default), prints all data file names as they are being read (along with related warnings).
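
Since `pattern` is passed to `list.files`, the regex examples given for that argument can be tried out with `list.files` directly before reading anything. The following is a minimal sketch using base R only; the directory and file names are made up for illustration.

```r
# Create a throwaway directory with a few empty files (hypothetical names).
demo_dir <- tempfile("read_dir_pattern_demo")
dir.create(demo_dir)
demo_files <- c(
    "exp3_subject46_group2.txt",
    "exp3_subject47_group1.txt",
    "exp2_subject01_group2.txt",
    "exp3_notes.csv"
)
invisible(file.create(file.path(demo_dir, demo_files)))

# files with a ".txt" extension
txt_files <- list.files(demo_dir, pattern = "\\.txt$")

# files ending with "group2.txt"
group2_files <- list.files(demo_dir, pattern = "group2\\.txt$")

# files starting with "exp3"
exp3_files <- list.files(demo_dir, pattern = "^exp3")

# files starting with "exp3" AND ending with ".txt"
exp3_txt_files <- list.files(demo_dir, pattern = "^exp3.*\\.txt$")

# pattern = NULL lists every file in the directory
all_files <- list.files(demo_dir, pattern = NULL)
```

Note the doubled backslash: inside an R string literal, `"\\."` is the regex `\.`, that is, a literal dot.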

Note

This function is very similar to the readbulk::read_bulk function. One important difference, however, is the use of data.table, which greatly speeds up the process. Another important difference is that files can be selected with any regex pattern. Furthermore, this function allows pre-filtering of each file (see filt). Data files may include a significant amount of unnecessary data, and filtering prevents it from being merged.
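
The general idea described in this note (select files by regex, read each one, filter it, then merge everything with `data.table::rbindlist`) can be sketched as follows. This is only an illustration under stated assumptions, not the neatStats implementation: `read_dir_sketch` and its `keep` argument are hypothetical, and the filter is given as a plain function instead of an unevaluated expression.

```r
library(data.table)

# Hypothetical helper for illustration; NOT the neatStats implementation.
read_dir_sketch <- function(pattern = NULL, path = ".", keep = NULL) {
    # select files by regex, as list.files does
    f_paths <- list.files(path, pattern = pattern, full.names = TRUE)
    per_file <- lapply(f_paths, function(f_path) {
        dat <- fread(f_path)
        # optional per-file filter, applied before merging;
        # `keep` is a function that returns a logical vector
        if (!is.null(keep)) dat <- dat[keep(dat), ]
        dat
    })
    # fill = TRUE pads columns that are missing from some files with NA
    as.data.frame(rbindlist(per_file, fill = TRUE))
}

# Usage with two small files written to a temporary directory
demo_dir <- tempfile("read_dir_sketch_demo")
dir.create(demo_dir)
fwrite(data.table(subject = 1, rt = c(120, 480, 510)),
       file.path(demo_dir, "s1.csv"))
fwrite(data.table(subject = 2, rt = c(90, 450), note = "x"),
       file.path(demo_dir, "s2.csv"))

merged <- read_dir_sketch(
    "\\.csv$",
    path = demo_dir,
    keep = function(d) d$rt > 150
)
```

Filtering each file before merging keeps the intermediate tables small, which is the benefit the note attributes to pre-filtering.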

Examples



# first, set current working directory
# e.g. to script's path with setwd(path_neat())

# read all text files in current working directory
merged_df = read_dir("\\.txt$")
# merged_df now has all data

# to use utils::read.table for reading (slower than fread)
# (with some advisable options passed to it)
merged_df = read_dir(
    '\\.txt$',
    reader_function = read.table,
    header = TRUE,
    fill = TRUE,
    quote = "\"",
    stringsAsFactors = FALSE
)
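
The `filt` argument is described above as an expression that uses column names alone. A hedged illustration of how such a call might look is given below; the temporary files, the column name `rt`, and the threshold are all made up for the demonstration, and the block only runs if neatStats and data.table are installed.

```r
# Illustration only: file names, the "rt" column, and the cutoff are hypothetical.
if (requireNamespace("neatStats", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
    demo_dir <- tempfile("read_dir_filt_demo")
    dir.create(demo_dir)
    write.csv(
        data.frame(subject = 1, rt = c(120, 480, 510)),
        file.path(demo_dir, "exp3_s1.csv"),
        row.names = FALSE
    )
    write.csv(
        data.frame(subject = 2, rt = c(90, 450)),
        file.path(demo_dir, "exp3_s2.csv"),
        row.names = FALSE
    )

    # keep only rows with rt above 150 in each file, before merging
    filtered_df <- neatStats::read_dir(
        "\\.csv$",
        path = demo_dir,
        filt = rt > 150,
        hush = TRUE
    )
}
```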