dqa {DQAstats} | R Documentation |
Perform Data Quality Assessment of Electronic Health Records.
Description
This function performs a data quality assessment (DQA) of electronic health records (EHR).
Usage
dqa(
  source_system_name,
  target_system_name,
  utils_path,
  mdr_filename = "mdr.csv",
  output_dir = paste0(tempdir(), "/output/"),
  logfile_dir = tempdir(),
  parallel = FALSE,
  ncores = 2,
  restricting_date_start = NULL,
  restricting_date_end = NULL,
  restricting_date_format = NULL
)
Arguments
source_system_name |
A character string. The name of the source-system, e.g. "P21" or "i2b2". This name must be identical and unique to one entry in the settings-yml file. |
target_system_name |
Optional. A character string or NULL. The name of the target-system, e.g. "P21" or "i2b2". This name must be identical and unique to one entry in the config-yml file, or NULL. If the argument is empty (NULL), the source system is processed on its own. |
utils_path |
A character string. The path to the utils-folder, containing the required app utilities like the MDR and the settings folder. |
mdr_filename |
A character string. The filename of the MDR e.g. "mdr_example_data.csv". |
output_dir |
The path to the output folder where all the results will
be stored (default: paste0(tempdir(), "/output/")). |
logfile_dir |
The absolute path to the folder where the logfile
will be stored (default: tempdir()). |
parallel |
A boolean. If TRUE, initializing a |
ncores |
An integer. The number of cores to use. Caution: you will probably want to choose a low number when operating on large datasets. Default: 2. |
restricting_date_start |
The date as the lower limit against which
the data to be analyzed will be filtered. Your input must be able to be
recognized as a date by |
restricting_date_end |
The date as the upper limit against which
the data to be analyzed will be filtered. Your input must be able to be
recognized as a date by |
restricting_date_format |
The format in which the input data is stored.
See |
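The date and core arguments can be sanity-checked before calling dqa(). The following is a minimal sketch, not part of DQAstats: the helper check_date_range() is hypothetical, and it uses base R's as.Date() with ISO-formatted dates as a stand-in, which may be stricter than the date parser the package itself relies on.

```r
# Hypothetical helper (not part of DQAstats): validate a date range before
# passing it on as restricting_date_start / restricting_date_end.
check_date_range <- function(start, end) {
  # as.Date() with an explicit format returns NA for input it cannot parse
  start_date <- as.Date(start, format = "%Y-%m-%d")
  end_date <- as.Date(end, format = "%Y-%m-%d")
  if (is.na(start_date) || is.na(end_date)) {
    stop("Both limits must be ISO dates such as \"2010-01-01\".")
  }
  if (start_date > end_date) {
    stop("The lower limit must not be later than the upper limit.")
  }
  list(start = start_date, end = end_date)
}

# Example range (assumed values, chosen only for illustration)
date_range <- check_date_range("2010-01-01", "2015-12-31")

# Keep ncores low, as the argument description advises, and never exceed
# the number of cores the machine reports.
available <- parallel::detectCores()
ncores <- if (is.na(available)) 1L else min(2L, available)
```

The checked values could then be passed on as restricting_date_start, restricting_date_end and ncores.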
Value
This function is a wrapper around all helper functions in DQAstats
that perform the data quality assessment. The results are summarized in a
PDF report, which is saved to output_dir. The return value of this function is
a nested list that contains all results as R objects.
Examples
# runtime > 5 sec.
Sys.setenv(
  "EXAMPLECSV_SOURCE_PATH" = system.file(
    "demo_data",
    package = "DQAstats"
  )
)
Sys.setenv(
  "EXAMPLECSV_TARGET_PATH" = system.file(
    "demo_data",
    package = "DQAstats"
  )
)

# Set path to utilities folder where to find the mdr and template files:
utils_path <- system.file(
  "demo_data/utilities",
  package = "DQAstats"
)

# Execute the DQA and generate a PDF report:
results <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = utils_path,
  mdr_filename = "mdr_example_data.csv",
  output_dir = paste0(tempdir(), "/output/"),
  parallel = FALSE
)
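The argument descriptions also allow a run without a target system and with a restricted date range. The following is a hedged sketch of such a call, reusing the demo data from the example above. The date limits are assumed values chosen for illustration; whether the demo data contains records in that range is not stated here. The call is guarded so that it only runs in an interactive session with DQAstats installed.

```r
# Collect the arguments first so they can be inspected before the run.
dqa_args <- list(
  source_system_name = "exampleCSV_source",
  target_system_name = NULL, # no target: the source is processed on its own
  utils_path = system.file("demo_data/utilities", package = "DQAstats"),
  mdr_filename = "mdr_example_data.csv",
  output_dir = paste0(tempdir(), "/output/"),
  restricting_date_start = "2010-01-01", # assumed lower limit
  restricting_date_end = "2015-12-31"    # assumed upper limit
)

# Only run the assessment interactively and if the package is available.
if (interactive() && requireNamespace("DQAstats", quietly = TRUE)) {
  # Point the source system at the demo data shipped with the package
  Sys.setenv(
    "EXAMPLECSV_SOURCE_PATH" = system.file("demo_data", package = "DQAstats")
  )
  results_standalone <- do.call(DQAstats::dqa, dqa_args)
}
```

Building the argument list separately keeps the NULL target explicit, since list() retains named NULL entries and do.call() passes them through.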