redcap_read {REDCapR}    R Documentation

Read records from a REDCap project in subsets, and stack them together before returning a dataset

Description

From an external perspective, this function is similar to redcap_read_oneshot(). The internals differ in that redcap_read() retrieves subsets of the data, and then combines them before returning (among other objects) a single base::data.frame(). This function can be more appropriate than redcap_read_oneshot() when returning large datasets that could tie up the server.

Usage

redcap_read(
  batch_size = 100L,
  interbatch_delay = 0.5,
  continue_on_error = FALSE,
  redcap_uri,
  token,
  records = NULL,
  records_collapsed = "",
  fields = NULL,
  fields_collapsed = "",
  forms = NULL,
  forms_collapsed = "",
  events = NULL,
  events_collapsed = "",
  raw_or_label = "raw",
  raw_or_label_headers = "raw",
  export_checkbox_label = FALSE,
  export_survey_fields = FALSE,
  export_data_access_groups = FALSE,
  filter_logic = "",
  datetime_range_begin = as.POSIXct(NA),
  datetime_range_end = as.POSIXct(NA),
  col_types = NULL,
  guess_type = TRUE,
  guess_max = NULL,
  http_response_encoding = "UTF-8",
  locale = readr::default_locale(),
  verbose = TRUE,
  config_options = NULL,
  id_position = 1L
)

Arguments

batch_size

The maximum number of subject records a single batch should contain. The default is 100.

interbatch_delay

The number of seconds the function will wait before requesting a new subset from REDCap. The default is 0.5 seconds.

continue_on_error

If an error occurs while reading, should records in subsequent batches still be attempted? The default is FALSE, which prevents subsequent batches from running. Optional.

redcap_uri

The URI (uniform resource identifier) of the REDCap project. Required.

token

The user-specific string that serves as the password for a project. Required.

records

An array, where each element corresponds to the ID of a desired record. Optional.

records_collapsed

A single string, where the desired ID values are separated by commas. Optional.

fields

An array, where each element corresponds to a desired project field. Optional.

fields_collapsed

A single string, where the desired field names are separated by commas. Optional.

forms

An array, where each element corresponds to a desired project form. Optional.

forms_collapsed

A single string, where the desired form names are separated by commas. Optional.

events

An array, where each element corresponds to a desired project event. Optional.

events_collapsed

A single string, where the desired event names are separated by commas. Optional.

raw_or_label

A string (either 'raw' or 'label') that specifies whether to export the raw coded values or the labels for the options of multiple choice fields. Default is 'raw'.

raw_or_label_headers

A string (either 'raw' or 'label') that specifies whether the CSV headers are exported as the variable/field names (raw) or the field labels (label). Default is 'raw'.

export_checkbox_label

A boolean that specifies the format of checkbox field values when exporting the data as labels. If raw_or_label is 'label' and export_checkbox_label is TRUE, the values will be the text displayed to the users. Otherwise, the values will be 0/1.

export_survey_fields

A boolean that specifies whether to export the survey identifier field (e.g., 'redcap_survey_identifier') or survey timestamp fields (e.g., instrument+'_timestamp'). The timestamp outputs reflect the survey's completion time (according to the time and timezone of the REDCap server).

export_data_access_groups

A boolean value that specifies whether or not to export the redcap_data_access_group field when data access groups are utilized in the project. Default is FALSE. See the details below.

filter_logic

String of logic text (e.g., [gender] = 'male') for filtering the data to be returned by this API method, in which the API will only return the records (or record-events, if a longitudinal project) where the logic evaluates as TRUE. A blank/empty string returns all records.

datetime_range_begin

To return only records that have been created or modified after a given datetime, provide a POSIXct value. If not specified, REDCap will assume no begin time.

datetime_range_end

To return only records that have been created or modified before a given datetime, provide a POSIXct value. If not specified, REDCap will assume no end time.

col_types

A readr::cols() object passed internally to readr::read_csv(). Optional.

guess_type

A boolean value indicating whether column types should be guessed. If TRUE, readr::read_csv() guesses the intended data type for each column. If FALSE, all columns are returned as character.

guess_max

Deprecated.

http_response_encoding

The encoding value passed to httr::content(). Defaults to 'UTF-8'.

locale

A readr::locale() object to specify preferences like number, date, and time formats. This object is passed to readr::read_csv(). Defaults to readr::default_locale().

verbose

A boolean value indicating if messages should be printed to the R console during the operation. The verbose output might contain sensitive information (e.g. PHI), so turn this off if the output might be visible somewhere public. Optional.

config_options

A list of options to pass to the POST method in the httr package. See the details in redcap_read_oneshot(). Optional.

id_position

The column position of the variable that uniquely identifies the subject (typically record_id). This defaults to the first variable in the dataset.
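
As a small offline illustration, some of these argument values might be built as shown below. The sketch assumes each *_collapsed argument is simply the comma-separated form of its vector counterpart, as the descriptions above suggest. The record IDs, field names, and dates are hypothetical.

```r
# Hypothetical selections; substitute values from your own project.
records <- c(1, 3, 5)
fields  <- c("record_id", "dob", "weight")

# Comma-separated equivalents, in the form described for the *_collapsed arguments.
records_collapsed <- paste(records, collapse = ",")
fields_collapsed  <- paste(fields, collapse = ",")

# POSIXct bounds, in the form described for datetime_range_begin and datetime_range_end.
datetime_range_begin <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
datetime_range_end   <- as.POSIXct("2024-06-30 23:59:59", tz = "UTC")
```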

Details

redcap_read() internally uses multiple calls to redcap_read_oneshot() to select and return data. Initially, only the primary key is queried through the REDCap API. The long list is then subsetted into batches, whose sizes are determined by the batch_size parameter. REDCap is then queried for all variables of the subset's subjects. This is repeated for each subset, before returning a unified base::data.frame().

The function allows a delay between calls, which allows the server to attend to other users' requests (such as the users entering data in a browser). In other words, a delay between batches does not bog down the webserver when exporting/importing a large dataset.
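
The batching and delay described above can be sketched offline as follows. This is a minimal illustration, not REDCapR's actual internals: fetch_batch() is a hypothetical stand-in for a per-batch call to redcap_read_oneshot(), and the ten record IDs are made up.

```r
# Hypothetical primary keys, standing in for the initial ID-only query.
ids              <- 1:10
batch_size       <- 4L
interbatch_delay <- 0.1  # seconds; shortened for this illustration

# Assign each ID to a batch holding at most `batch_size` records.
batch_index <- ceiling(seq_along(ids) / batch_size)
batches     <- split(ids, batch_index)

# Hypothetical stand-in for a per-batch call to redcap_read_oneshot().
fetch_batch <- function(batch_ids) {
  data.frame(record_id = batch_ids, value = batch_ids * 10)
}

results <- vector("list", length(batches))
for (i in seq_along(batches)) {
  # In this sketch, pause between requests but not before the first one.
  if (i > 1L) Sys.sleep(interbatch_delay)
  results[[i]] <- fetch_batch(batches[[i]])
}

# Stack the per-batch data frames into one dataset.
ds <- do.call(rbind, results)
```

With ten IDs and a batch size of four, the sketch produces three batches (of four, four, and two records), so the server only ever handles a fraction of the dataset per request.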

A second benefit is that less RAM is required on the webserver. Because each batch is smaller than the entire dataset, the webserver tackles more manageably sized objects in memory. Consider batching if you encounter the error:

ERROR: REDCap ran out of server memory. The request cannot be processed.
Please try importing/exporting a smaller amount of data.

For redcap_read() to function properly, the user must have Export permissions for the 'Full Data Set'. Users with only 'De-Identified' export privileges can still use redcap_read_oneshot(). To grant the appropriate permissions:

Value

Currently, a list is returned with the following elements:

Author(s)

Will Beasley

References

The official documentation can be found on the 'API Help Page' and 'API Examples' pages on the REDCap wiki (i.e., https://community.projectredcap.org/articles/456/api-documentation.html and https://community.projectredcap.org/articles/462/api-examples.html). If you do not have an account for the wiki, please ask your campus REDCap administrator to send you the static material.

Examples

## Not run: 
uri     <- "https://bbmc.ouhsc.edu/redcap/api/"
token   <- "9A81268476645C4E5F03428B8AC3AA7B"
REDCapR::redcap_read(batch_size=2, redcap_uri=uri, token=token)$data

# Specify the column types.
col_types <- readr::cols(
  record_id  = readr::col_integer(),
  race___1   = readr::col_logical(),
  race___2   = readr::col_logical(),
  race___3   = readr::col_logical(),
  race___4   = readr::col_logical(),
  race___5   = readr::col_logical(),
  race___6   = readr::col_logical()
)
REDCapR::redcap_read(
  redcap_uri = uri,
  token      = token,
  col_types  = col_types,
  batch_size = 2
)$data


## End(Not run)

[Package REDCapR version 1.1.0 Index]