dqa {DQAstats} R Documentation

Perform Data Quality Assessment of Electronic Health Records.


This function performs a data quality assessment (DQA) of electronic health records (EHR).


dqa(
  source_system_name,
  target_system_name = source_system_name,
  utils_path,
  mdr_filename = "mdr.csv",
  output_dir = paste0(tempdir(), "/output/"),
  logfile_dir = tempdir(),
  parallel = FALSE,
  ncores = 2,
  restricting_date_start = NULL,
  restricting_date_end = NULL,
  restricting_date_format = NULL
)



source_system_name
A character string. The name of the source system, e.g. "P21" or "i2b2". This name must exactly match one unique entry in the settings-yml file.


target_system_name
Optional. A character string or NULL. The name of the target system, e.g. "P21" or "i2b2". This name must exactly match one unique entry in the config-yml file, or be NULL. If the argument is left empty, the source system is processed on its own.


utils_path
A character string. The path to the utils-folder, which contains the required app utilities such as the MDR and the settings folder.


mdr_filename
A character string. The filename of the MDR, e.g. "mdr_example_data.csv".


output_dir
The path to the output folder where all results will be stored (default: paste0(tempdir(), "/output/")).


logfile_dir
The absolute path to the folder where the logfile will be stored (default: tempdir()).


parallel
A boolean. If TRUE, a future::plan() is initialized for running the code (default: FALSE).


ncores
An integer. The number of cores to use. Caution: you will probably want to choose a low number when operating on large datasets. Default: 2.


restricting_date_start
The date used as the lower limit for filtering the data to be analyzed. Your input must be recognizable as a date by parsedate::parse_date(), e.g. "2021-02-25". Keep in mind: if you supply a date without a time here, the time will automatically be set to 00:00.


restricting_date_end
The date used as the upper limit for filtering the data to be analyzed. Your input must be recognizable as a date by parsedate::parse_date(), e.g. "2021-02-25". Keep in mind: if you supply a date without a time here, the time will automatically be set to 00:00. This means that the end day you provide here won't be included: '2021-12-31' will become '2021-12-31 00:00:00'. If you want to include this day, you need to also supply a time ('2021-12-31 23:59:59') or just use the next day without a time ('2022-01-01').


restricting_date_format
The format in which the input data is stored. See ?strptime for possible parameters. Currently not implemented, so passing a format here has no effect.
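The midnight cut-off described for restricting_date_end can be illustrated with a minimal base R sketch. It uses as.POSIXct() as a stand-in for parsedate::parse_date() to avoid an extra dependency. The record time stamp is hypothetical, and the `<=` comparison is only an assumption about how the upper limit is applied, not the package's actual filter code.

```r
# Base R stand-in for the parsing step; the documentation above names
# parsedate::parse_date(), which is not used here to avoid a dependency.
end_day_only <- as.POSIXct("2021-12-31", tz = "UTC")
end_next_day <- as.POSIXct("2022-01-01", tz = "UTC")

# A hypothetical record time stamped during the last day of 2021.
record_time <- as.POSIXct("2021-12-31 14:30:00", tz = "UTC")

# A date without a time is interpreted as midnight at the start of that day.
midnight <- format(end_day_only, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Assumption: the filter keeps records at or before the upper limit.
kept_with_day_only <- record_time <= end_day_only
kept_with_next_day <- record_time <= end_next_day

print(midnight)
print(c(day_only = kept_with_day_only, next_day = kept_with_next_day))
```

With the day-only limit the afternoon record falls outside the range, while the next-day limit includes it, which is why supplying '2022-01-01' is the simpler way to cover all of 2021-12-31.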


This function is a wrapper around all helper functions in DQAstats to perform the data quality assessment. The results are summarized in a PDF report, which is saved to output_dir. The return value of this function is a nested list that contains all results as R objects.
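As a rough sketch of working with these two outputs, the base R snippet below prepares an output folder, lists any PDF reports found in it, and takes a first look at a nested list. The `results` object is a hypothetical stand-in with made-up element names, since the real structure returned by dqa() is not described here. Whether dqa() creates the output folder itself is also not stated, so creating it beforehand is only a precaution.

```r
# Directory that would be passed to dqa() as `output_dir`.
output_dir <- file.path(tempdir(), "output")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

# After a dqa() run, PDF reports in that folder can be listed like this
# (the vector is empty if nothing has been written yet).
reports <- list.files(output_dir, pattern = "\\.pdf$", full.names = TRUE)

# Hypothetical stand-in for the nested list returned by dqa(); the element
# names below are made up for illustration only.
results <- list(
  part_a = list(x = 1:3),
  part_b = list(y = "text")
)

# Inspect only the top level first; deeper levels can be explored stepwise.
top_level <- names(results)
str(results, max.level = 1)
```

Limiting str() to max.level = 1 keeps the overview readable when the nested list is large.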


# runtime > 5 sec.
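# Assumption: the bundled example CSV files live in the "demo_data" folder.
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file(
  "demo_data",
  package = "DQAstats"))
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file(
  "demo_data",
  package = "DQAstats"))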

# Set path to utilities folder where to find the mdr and template files:
utils_path <- system.file(
  "demo_data/utilities",  # assumed location of the bundled MDR and templates
  package = "DQAstats"
)

# Execute the DQA and generate a PDF report:
results <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = utils_path,
  mdr_filename = "mdr_example_data.csv",
  output_dir = paste0(tempdir(), "/output/"),
  parallel = FALSE
)
