daiquiri_report {daiquiri} | R Documentation |
Create a data quality report from a data frame
Description
Accepts record-level data from a data frame, validates it against the expected type of content of each column, generates a collection of time series plots for visual inspection, and saves a report to disk.
Usage
daiquiri_report(
  df,
  field_types,
  override_column_names = FALSE,
  na = c("", "NA", "NULL"),
  dataset_description = NULL,
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = NULL,
  show_progress = TRUE,
  log_directory = NULL
)
Arguments
df |
A data frame. Rectangular data can be read from file using read_data(). |
field_types |
|
override_column_names |
If |
na |
Vector containing strings that should be interpreted as missing values. Default = c("", "NA", "NULL"). |
dataset_description |
Short description of the dataset being checked. This will appear on the report. If blank, the name of the data frame object will be used. |
aggregation_timeunit |
Unit of time to aggregate over. Specify one of
|
report_title |
Title to appear on the report |
save_directory |
String specifying directory in which to save the report. Default is current directory. |
save_filename |
String specifying filename for the report, excluding any
file extension. If no filename is supplied, one will be automatically
generated with the format |
show_progress |
Print progress to console. Default = TRUE. |
log_directory |
String specifying directory in which to save log file. If no directory is supplied, progress is not logged. |
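The save_directory, save_filename and log_directory arguments can be prepared with base R before the call. The following is a minimal sketch, assuming the target directories need to exist beforehand (this page does not say whether they are created automatically). The directory names and the starting file name are hypothetical; because save_filename excludes any file extension, the sketch strips one with tools::file_path_sans_ext().

```r
# Hypothetical locations under the session's temporary directory
report_dir <- file.path(tempdir(), "daiquiri_reports")
log_dir <- file.path(tempdir(), "daiquiri_logs")

# Create both directories; an already existing directory is left as it is
for (d in c(report_dir, log_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# save_filename must not include a file extension, so remove any that is present
report_name <- tools::file_path_sans_ext("example_data_report.html")

# The prepared values could then be passed on, for example:
# daiquiri_report(
#   df,
#   field_types,
#   save_directory = report_dir,
#   save_filename = report_name,
#   log_directory = log_dir
# )
```

Keeping reports and logs in separate directories is only one possible layout; a single directory can be used for both.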
Value
A list containing information relating to the supplied parameters as well as the resulting daiquiri_source_data and daiquiri_aggregated_data objects.
Details
In order for the package to detect any non-conformant values in numeric or datetime fields, these should be present in the data frame in their raw character format. Rectangular data from a text file will automatically be read in as character type if you use the read_data() function. Data frame columns that are not of class character will still be processed according to the field_types specified.
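The reason for keeping values in their raw character format can be illustrated with base R alone. The sketch below is not how daiquiri performs its validation; it only shows, with hypothetical values, that a non-numeric entry remains identifiable while the column is character, and becomes indistinguishable from a missing value once the column has been converted to numeric.

```r
# Hypothetical dose values as they might appear in a raw text file
dose_raw <- c("10", "2.5", "abc", NA)

# While the column is character, the original text of every value is available,
# so entries that are present but fail to parse as numbers can be flagged
parsed <- suppressWarnings(as.numeric(dose_raw))
nonconformant <- !is.na(dose_raw) & is.na(parsed)
dose_raw[nonconformant]

# After conversion, "abc" has become NA and cannot be told apart
# from the value that was missing to begin with
dose_num <- parsed
sum(is.na(dose_num))
```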
See Also
read_data(), field_types(), field_types_available()
Examples
# load example data into a data.frame
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)

# create a report in the current directory
daiq_obj <- daiquiri_report(
  raw_data,
  field_types = field_types(
    PrescriptionID = ft_uniqueidentifier(),
    PrescriptionDate = ft_timepoint(),
    AdmissionDate = ft_datetime(includes_time = FALSE, na = "1800-01-01"),
    Drug = ft_freetext(),
    Dose = ft_numeric(),
    DoseUnit = ft_categorical(),
    PatientID = ft_ignore(),
    Location = ft_categorical(aggregate_by_each_category = TRUE)
  ),
  override_column_names = FALSE,
  na = c("", "NULL"),
  dataset_description = "Example data provided with package",
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = "example_data_report",
  show_progress = TRUE,
  log_directory = NULL
)