R: Vectorize a single-input read function to read multiple files

vectorize_reader {filecacher}

R Documentation

Vectorize a single-input read function to read multiple files

Description

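Wraps a read function that takes a single file path so that it can read a vector of file paths and return one combined data frame. The resulting vectorized read function still accepts all the arguments of the original function.

The per-file data frames are bound with purrr::list_rbind(), so the result contains the union of the columns from all the files, with NA filled in wherever a file did not have a given column.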
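
The column-filling behaviour can be seen with purrr::list_rbind() on its own. This is a small sketch with made-up data frames (`a` and `b` are illustrative and not part of the package), not output from real files:

```r
library(purrr)

# Two illustrative data frames that share only the column `x`.
a <- data.frame(x = 1:2, y = c("a", "b"))
b <- data.frame(x = 3L, z = TRUE)

# The result has columns x, y and z; cells with no source data are NA.
combined <- list_rbind(list(a, b))
print(combined)
```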

Usage

vectorize_reader(read_fn, file_path_to = NULL)

Arguments

`read_fn`	The read function to vectorize. Its first argument must be the path of the file to read.
`file_path_to`	An optional string. If provided, the result gets a column with this name that holds the file path each row was read from. See `names_to` in `purrr::list_rbind()`.
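
As a rough illustration of what `names_to` does in `purrr::list_rbind()`, the sketch below binds a named list of data frames. The names "a.csv" and "b.csv" are hypothetical labels only; no files are read:

```r
library(purrr)

# A named list of data frames; the names stand in for file paths.
frames <- list(
  "a.csv" = data.frame(x = 1:2),
  "b.csv" = data.frame(x = 3L)
)

# `names_to` copies each list name into a new column, one value per row.
with_paths <- list_rbind(frames, names_to = "path")
print(with_paths)
```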

Value

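A version of read_fn whose first argument can be a vector of file paths, and which returns a single data frame combining the contents of all the files.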
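
To make the idea concrete, here is a minimal sketch of how such a wrapper could be written with purrr. It is an assumption-based illustration, not the package's actual implementation; `sketch_vectorize` and the temporary files are hypothetical names introduced here:

```r
library(purrr)

# Hypothetical helper: returns a function that maps `read_fn` over paths.
sketch_vectorize <- function(read_fn, file_path_to = NULL) {
  function(files, ...) {
    # Extra arguments in `...` are forwarded to every call of `read_fn`.
    frames <- map(files, function(f) read_fn(f, ...))
    if (is.null(file_path_to)) {
      list_rbind(frames)
    } else {
      # Name each data frame by its path so `names_to` can record it.
      list_rbind(set_names(frames, files), names_to = file_path_to)
    }
  }
}

# Two temporary CSV files used only for this demonstration.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, value = c(0.5, 1.5)), f1, row.names = FALSE)
write.csv(data.frame(id = 3L, value = 2.5), f2, row.names = FALSE)

read_many <- sketch_vectorize(read.csv, file_path_to = "path")
result <- read_many(c(f1, f2), stringsAsFactors = FALSE)
print(result)
```

The returned closure keeps the original reader's interface, which is why arguments such as `stringsAsFactors` can still be supplied after the file paths.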

Examples

# Convert iris$Species to character to simplify comparison.
iris_chr <- iris
iris_chr$Species <- as.character(iris$Species)


# `iris` data frame separated into multiple subset files.
iris_files <- system.file("extdata", package = "filecacher") |>
  list.files(pattern = "_only[.]csv$", full.names = TRUE)

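# read.csv() accepts a single file, so passing several paths fails.
try(read.csv(iris_files))
# The vectorized version reads each file and row-binds the results.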
vectorize_reader(read.csv)(
  iris_files,
  stringsAsFactors = TRUE
) |>
  all.equal(iris)


# With arrow, Species is read as character, so compare against iris_chr.
if (rlang::is_installed("arrow")) {
  try(arrow::read_csv_arrow(iris_files))
  vectorize_reader(arrow::read_csv_arrow)(
    iris_files
  ) |>
    as.data.frame() |>
    all.equal(iris_chr)
}


# fread() returns a data.table, so convert to a data frame before comparing.
if (rlang::is_installed("data.table")) {
  try(data.table::fread(iris_files))
  vectorize_reader(data.table::fread)(
    iris_files,
    stringsAsFactors = TRUE
  ) |>
    as.data.frame() |>
    all.equal(iris)
}