cached_read {filecacher}    R Documentation

Read files via cache of file list and contents

Description

Reads data and saves it to a local file for easier management and re-reading.

By default, the file info is also saved and used to determine whether the cache is still valid, or whether the contents need to be updated because the files have been modified. To skip this check, set skip_file_info = TRUE; to force reading from scratch, set force = TRUE.

If updating is called for, all the files are re-read.
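
The validity check described above can be sketched in base R. This is only an illustration of the idea, not filecacher's actual implementation: the helper name sketch_cached_read(), its cache_dir argument, the cache file names, and the choice of size and modification time as the "file info" are all assumptions made for this example.

```r
# Hypothetical helper, NOT part of filecacher: a base-R sketch of
# "re-read only when the file info has changed".
sketch_cached_read <- function(files, label, read_fn, cache_dir,
                               force = FALSE) {
  # Assumed cache layout: one file for the data, one for the file info.
  data_path <- file.path(cache_dir, paste0(label, ".rds"))
  info_path <- file.path(cache_dir, paste0(label, "_file_info.rds"))

  # Size and modification time can be queried without opening the files.
  current_info <- file.info(files)[, c("size", "mtime")]

  cache_is_valid <- !force &&
    file.exists(data_path) &&
    file.exists(info_path) &&
    identical(readRDS(info_path), current_info)

  if (cache_is_valid) {
    return(readRDS(data_path))
  }

  # Any change invalidates the whole cache, so all the files are re-read.
  result <- read_fn(files)
  saveRDS(result, data_path)
  saveRDS(current_info, info_path)
  result
}
```

In this sketch, skipping the file info would amount to dropping the two info_path conditions, so an existing cache would always be trusted unless force = TRUE.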

cached_read_csv() is a convenience function that selects a csv read function based on read_type.

Usage

cached_read(
  files,
  label,
  read_fn,
  cache = NULL,
  type = NULL,
  force = FALSE,
  skip_file_info = FALSE
)

cached_read_csv(
  files,
  label,
  read_type = NULL,
  cache = NULL,
  type = NULL,
  skip_file_info = FALSE,
  force = FALSE
)

Arguments

files

A file or files to read with read_fn.

label

A string to use as the name of the cache file.

read_fn

A function which takes file(s) as its first parameter and reads them. To use a single-input read function such as read.csv() with multiple files, use vectorize_reader(), e.g. read_fn = vectorize_reader(read.csv).
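
To illustrate what turning a single-input reader into a multiple-input reader means, here is a minimal base-R sketch. The name vectorize_sketch() is hypothetical and this is not how vectorize_reader() is implemented; it simply assumes that reading each file in turn and stacking the results row-wise is the desired behaviour.

```r
# Hypothetical stand-in for vectorize_reader(), for illustration only.
vectorize_sketch <- function(read_fn) {
  function(files, ...) {
    # Read each file separately, then stack the results row-wise.
    pieces <- lapply(files, read_fn, ...)
    do.call(rbind, pieces)
  }
}

# Two small csv files with the same columns.
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(id = 1:2, value = c("a", "b")), paths[1],
          row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")), paths[2],
          row.names = FALSE)

# read.csv() accepts one file at a time; the wrapper accepts both paths.
combined <- vectorize_sketch(read.csv)(paths)
```

A function built this way takes the file vector as its first parameter, which is the shape read_fn is expected to have.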

cache

One of the following:

  • The path to an existing directory to use for caching. If NULL (default), uses the current path as given by here::here().

  • An existing cache object as generated by file_cache().

type

A string describing the type of cache. Must be NULL or one of 'rds', 'parquet', or 'csv'. If NULL (default), uses 'rds'.

force

If TRUE, forces evaluation even if the cache exists.

skip_file_info

Whether to skip saving and/or checking the file info. Use this when just querying the file system (without opening files) is slow.

read_type

Type of csv read function to use. One of:

  • "readr": readr::read_csv()

  • "arrow": vectorize_reader(arrow::read_csv_arrow)()

  • "data.table": vectorize_reader(data.table::fread)()

  • "base": vectorize_reader(utils::read.csv)()

  • NULL (default): uses the first installed.
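
One way to picture the NULL default is a lookup that walks the options in the order listed above and stops at the first whose package is installed. The helper below is a hypothetical sketch, not filecacher code, and it assumes that the listed order is the order in which the options are tried.

```r
# Hypothetical helper, NOT part of filecacher: return the first read_type
# whose backing package is installed, in the order listed above (assumed).
first_installed_read_type <- function() {
  # Names are read_type values; values are the packages that provide them.
  candidates <- c(
    readr = "readr",
    arrow = "arrow",
    data.table = "data.table",
    base = "utils"
  )
  for (read_type in names(candidates)) {
    if (requireNamespace(candidates[[read_type]], quietly = TRUE)) {
      return(read_type)
    }
  }
  stop("No csv read function is available.")
}

chosen <- first_installed_read_type()
```

Because utils ships with R, the "base" option acts as the fallback in this sketch.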

Value

The result of read_fn(files).

See Also

vectorize_reader() to convert a single-input read function into a multiple-input function.

Examples

# Create a temporary directory for the cache.
tf <- tempfile()
dir.create(tf)

# A function that logs when it's called.
read_csv_log <- function(files) {
  message("Reading from file ...")
  return(vectorize_reader(read.csv)(files, stringsAsFactors = TRUE))
}

# `iris` data frame separated into multiple subset files.
iris_files <- system.file("extdata", package = "filecacher") |>
  list.files(pattern = "_only[.]csv$", full.names = TRUE)

# 1) First time, the message is shown because the files are read.
iris_files |>
  cached_read("iris", read_csv_log, cache = tf) |>
  all.equal(iris)

# 2) Second time, no message is shown since the data is pulled from cache.
iris_files |>
  cached_read("iris", read_csv_log, cache = tf) |>
  all.equal(iris)

# 3) If desired, reloading can be forced using `force = TRUE`.
iris_files |>
  cached_read("iris", read_csv_log, cache = tf, force = TRUE) |>
  all.equal(iris)


# Remove the temporary cache directory.
unlink(tf, recursive = TRUE)

[Package filecacher version 0.2.9 Index]