dqa {DQAstats}R Documentation

Perform Data Quality Assessment of Electronic Health Records.

Description

This function performs a data quality assessment (DQA) of electronic health records (EHR).#'

Usage

dqa(
  source_system_name,
  target_system_name,
  utils_path,
  mdr_filename = "mdr.csv",
  output_dir = paste0(tempdir(), "/output/"),
  logfile_dir = tempdir(),
  parallel = FALSE,
  ncores = 2,
  restricting_date_start = NULL,
  restricting_date_end = NULL,
  restricting_date_format = NULL
)

Arguments

source_system_name

A character string. The name of the source-system, e.g. "P21" or "i2b2". This name must be identical and unique to one entry in the settings-yml file.

target_system_name

Optional. A character string or null. The name of the target-system, e.g. "P21" or "i2b2". This name must be identical and unique to one entry in the config-yml file or null. If the argument is empty, the source will be processed as standalone on its own.

utils_path

A character string. The path to the utils-folder, containing the required app utilities like the MDR and the settings folder.

mdr_filename

A character string. The filename of the MDR e.g. "mdr_example_data.csv".

output_dir

The path to the output folder where all the results will be stored (default: paste0(tempdir(), "/output/")).

logfile_dir

The absolute path to folder where the logfile will be stored default(tempdir()).

parallel

A boolean. If TRUE, initializing a future::plan() for running the code (default: FALSE).

ncores

A integer. The number of cores to use. Caution: you would probably like to choose a low number when operating on large datasets. Default: 2.

restricting_date_start

The date as the lower limit against which the data to be analyzed will be filtered. Your input must be able to be recognized as a date by parsedate::parse_date("2021-02-25"). Keep in mind: If you supply a date without a time here, the time will automatically be set to 00:00.

restricting_date_end

The date as the lower limit against which the data to be analyzed will be filtered. Your input must be able to be recognized as a date by parsedate::parse_date("2021-02-25") Keep in mind: If you supply a date without a time here, the time will automatically be set to 00:00. This means, the end DAY you provide here won't be included: '2021-12-31' will become '2021-12-31 00:00:00'. If you want to include this day, you need to supply also a time '2021-12-31 23:59:59' or just use the next day without a time: '2022-01-01'.

restricting_date_format

The format in which the input data is stored. See ?strptime for possible parameters. Currently not implemented! So there is no effect if you pass a format here.

Value

This function is a wrapper around all helper functions in DQAstats to perform the data quality assessment. The results are summarized in a PDF report which is saved to outdir. The return value of this function is a nested list that contains all results as R objects.

Examples

# runtime > 5 sec.
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file(
  "demo_data",
  package = "DQAstats")
)
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file(
  "demo_data",
  package = "DQAstats")
)

# Set path to utilities folder where to find the mdr and template files:
utils_path <- system.file(
  "demo_data/utilities",
  package = "DQAstats"
)

# Execute the DQA and generate a PDF report:
results <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = utils_path,
  mdr_filename = "mdr_example_data.csv",
  output_dir = paste0(tempdir(), "/output/"),
  parallel = FALSE
)


[Package DQAstats version 0.3.5 Index]