daiquiri_report {daiquiri}    R Documentation

Create a data quality report from a data frame

Description

Accepts record-level data from a data frame, validates it against the expected type of content of each column, generates a collection of time series plots for visual inspection, and saves a report to disk.

Usage

daiquiri_report(
  df,
  field_types,
  override_column_names = FALSE,
  na = c("", "NA", "NULL"),
  dataset_description = NULL,
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = NULL,
  show_progress = TRUE,
  log_directory = NULL
)

Arguments

df

A data frame. Rectangular data can be read from file using read_data(). See Details.

field_types

A field_types() object specifying the names and types of the fields (columns) in the supplied df. See also field_types_available().

override_column_names

If FALSE, column names in the supplied df must match the names specified in field_types exactly. If TRUE, column names in the supplied df will be replaced, by position, with the names specified in field_types, so the specification must list the columns in the same order as they appear in df. Default = FALSE
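To make the two modes concrete, here is a minimal base R sketch. The helper apply_column_names() is hypothetical and is not part of daiquiri; it assumes that "match exactly" means the same set of names with identical spelling, and that overriding replaces names purely by position.

```r
# Hypothetical helper (not a daiquiri function) illustrating
# the behaviour described for override_column_names.
apply_column_names <- function(df, expected_names, override_column_names = FALSE) {
  if (override_column_names) {
    # Names are replaced by position, so the order of expected_names matters
    if (length(expected_names) != ncol(df)) {
      stop("Number of specified fields does not match number of columns")
    }
    names(df) <- expected_names
  } else if (!setequal(names(df), expected_names)) {
    # Assumption: without override, every name must be spelled identically
    stop("Column names do not match the specification")
  }
  df
}

df <- data.frame(a = "5", b = "Aspirin")
renamed <- apply_column_names(df, c("Dose", "Drug"), override_column_names = TRUE)
names(renamed)  # returns c("Dose", "Drug")
```

Because renaming is positional, swapping the two names in the specification would silently label the dose column "Drug", which is why the order must be correct.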

na

Vector containing strings that should be interpreted as missing values. Default = c("", "NA", "NULL").
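A minimal sketch of what "interpreted as missing" means for a character column, assuming an exact string comparison. standardise_na() is a hypothetical helper for illustration, not the package's implementation.

```r
# Hypothetical helper: any value that exactly matches an entry in `na`
# becomes NA; everything else is left untouched.
standardise_na <- function(x, na = c("", "NA", "NULL")) {
  x[x %in% na] <- NA_character_
  x
}

standardise_na(c("5", "", "NULL", "NA", "none"))
# returns c("5", NA, NA, NA, "none")
```

Note that "none" is kept as a literal value because it is not listed in na; with na = c("", "NULL"), the literal string "NA" would likewise be kept.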

dataset_description

Short description of the dataset being checked. This will appear on the report. If not supplied, the name of the data frame object passed as df will be used.
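Falling back to the name of the data frame object is typically done in R with the substitute()/deparse() idiom. The sketch below assumes that approach; describe_dataset() is hypothetical and daiquiri may obtain the name differently.

```r
# Hypothetical sketch: use the expression passed as `df` as a fallback label.
describe_dataset <- function(df, dataset_description = NULL) {
  if (is.null(dataset_description)) {
    # substitute() recovers the caller's expression, e.g. the symbol raw_data
    dataset_description <- deparse(substitute(df))
  }
  dataset_description
}

raw_data <- data.frame(x = 1)
describe_dataset(raw_data)                  # returns "raw_data"
describe_dataset(raw_data, "Example data")  # returns "Example data"
```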

aggregation_timeunit

Unit of time to aggregate over. Specify one of "day", "week", "month", "quarter", "year". The "week" option is Monday-based. Default = "day"
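"Monday-based" means each date falls into the week that starts on the preceding (or same) Monday. The base R sketch below shows that bucketing; week_start_monday() is a hypothetical helper, not daiquiri's internal code.

```r
# Hypothetical helper: map each date to the Monday that starts its week.
week_start_monday <- function(dates) {
  # format "%u" gives the ISO weekday: 1 = Monday, ..., 7 = Sunday
  dates - (as.integer(format(dates, "%u")) - 1L)
}

# 2022-01-02 is a Sunday, 2022-01-03 a Monday, 2022-01-09 a Sunday
d <- as.Date(c("2022-01-02", "2022-01-03", "2022-01-09"))
week_start_monday(d)
# returns as.Date(c("2021-12-27", "2022-01-03", "2022-01-03"))
```

A Sunday therefore belongs to the week of the Monday six days earlier, not the following one.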

report_title

Title to appear on the report. Default = "daiquiri data quality report"

save_directory

String specifying directory in which to save the report. Default is current directory.

save_filename

String specifying filename for the report, excluding any file extension. If no filename is supplied, one will be automatically generated with the format daiquiri_report_YYMMDD_HHMMSS.
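The documented default pattern daiquiri_report_YYMMDD_HHMMSS uses a two-digit year. A sketch of how such a name can be built in base R, assuming a timestamp taken when the report is generated; default_report_filename() is hypothetical.

```r
# Hypothetical helper reproducing the documented filename pattern.
# The file extension is not part of the name; the function adds it.
default_report_filename <- function(time = Sys.time()) {
  paste0("daiquiri_report_", format(time, "%y%m%d_%H%M%S"))
}

default_report_filename(as.POSIXct("2022-03-15 14:05:09", tz = "UTC"))
# returns "daiquiri_report_220315_140509"
```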

show_progress

Print progress to console. Default = TRUE

log_directory

String specifying directory in which to save log file. If no directory is supplied, progress is not logged.

Value

A list containing information relating to the supplied parameters as well as the resulting daiquiri_source_data and daiquiri_aggregated_data objects.

Details

In order for the package to detect any non-conformant values in numeric or datetime fields, these should be present in the data frame in their raw character format. Rectangular data from a text file will automatically be read in as character type if you use the read_data() function. Data frame columns that are not of class character will still be processed according to the field_types specified, but any values that failed an earlier type conversion will typically already be NA and can no longer be reported as non-conformant.
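The reason raw character data matters can be shown in a few lines of base R. This sketch flags values that are present but cannot be parsed as numbers; nonconformant_numeric() is hypothetical and is not how daiquiri itself performs validation.

```r
# Hypothetical helper: return values that are present in the raw data
# but fail to parse as numbers.
nonconformant_numeric <- function(x, na = c("", "NA", "NULL")) {
  x[x %in% na] <- NA_character_
  parsed <- suppressWarnings(as.numeric(x))
  x[!is.na(x) & is.na(parsed)]
}

dose <- c("5", "10.5", "", "ten", "NULL", "2,5")
nonconformant_numeric(dose)
# returns c("ten", "2,5")
```

If dose had already been converted with as.numeric(), "ten" and "2,5" would be indistinguishable from genuinely missing values.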

See Also

read_data(), field_types(), field_types_available()

Examples


# load example data into a data.frame
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)

# create a report in the current directory
daiq_obj <- daiquiri_report(
  raw_data,
  field_types = field_types(
    PrescriptionID = ft_uniqueidentifier(),
    PrescriptionDate = ft_timepoint(),
    AdmissionDate = ft_datetime(includes_time = FALSE, na = "1800-01-01"),
    Drug = ft_freetext(),
    Dose = ft_numeric(),
    DoseUnit = ft_categorical(),
    PatientID = ft_ignore(),
    Location = ft_categorical(aggregate_by_each_category = TRUE)
  ),
  override_column_names = FALSE,
  na = c("", "NULL"),
  dataset_description = "Example data provided with package",
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = "example_data_report",
  show_progress = TRUE,
  log_directory = NULL
)




[Package daiquiri version 1.1.1 Index]