R: Create a data quality report from a data frame

daiquiri_report {daiquiri}

R Documentation

Create a data quality report from a data frame

Description

Accepts record-level data from a data frame, validates it against the expected type of content of each column, generates a collection of time series plots for visual inspection, and saves a report to disk.

Usage

daiquiri_report(
  df,
  field_types,
  override_column_names = FALSE,
  na = c("", "NA", "NULL"),
  dataset_description = NULL,
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = NULL,
  show_progress = TRUE,
  log_directory = NULL
)

Arguments

`df`	A data frame. Rectangular data can be read from file using `read_data()`. See Details.
`field_types`	`field_types()` object specifying names and types of fields (columns) in the supplied `df`. See also field_types_available.
`override_column_names`	If `FALSE`, column names in the supplied `df` must match the names specified in `field_types` exactly. If `TRUE`, column names in the supplied `df` will be replaced with the names specified in `field_types`, so the specification must list the columns in the same order as they appear in `df`. Default = `FALSE`.
`na`	Vector of strings that should be interpreted as missing values. Default = `c("", "NA", "NULL")`.
`dataset_description`	Short description of the dataset being checked. This will appear on the report. If `NULL`, the name of the data frame object will be used.
`aggregation_timeunit`	Unit of time to aggregate over. Specify one of `"day"`, `"week"`, `"month"`, `"quarter"`, `"year"`. The `"week"` option is Monday-based. Default = `"day"`.
`report_title`	Title to appear on the report.
`save_directory`	String specifying directory in which to save the report. Default is current directory.
`save_filename`	String specifying filename for the report, excluding any file extension. If no filename is supplied, one will be automatically generated with the format `daiquiri_report_YYMMDD_HHMMSS`.
`show_progress`	Whether to print progress to the console. Default = `TRUE`.
`log_directory`	String specifying directory in which to save log file. If no directory is supplied, progress is not logged.
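
The effect of `override_column_names = TRUE` can be sketched as follows. This is an illustration only: the replacement names (`rx_id`, `rx_date`, and so on) are hypothetical, and the report is written to `tempdir()` purely to avoid cluttering the working directory. It assumes the packaged example file has the eight columns used in the Examples section, in that order.

```r
library(daiquiri)

# Read the packaged example data; columns arrive as character
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)

# Hypothetical replacement names, listed in the same order as the columns
# of raw_data, because they are applied by position rather than matched by name
fts <- field_types(
  rx_id = ft_uniqueidentifier(),
  rx_date = ft_timepoint(),
  admission_date = ft_datetime(includes_time = FALSE, na = "1800-01-01"),
  drug = ft_freetext(),
  dose = ft_numeric(),
  dose_unit = ft_categorical(),
  patient_id = ft_ignore(),
  location = ft_categorical(aggregate_by_each_category = TRUE)
)

result <- daiquiri_report(
  raw_data,
  field_types = fts,
  override_column_names = TRUE,   # replace the original names with those in fts
  aggregation_timeunit = "week",  # Monday-based weeks
  save_directory = tempdir(),
  save_filename = "override_names_sketch",
  show_progress = FALSE
)
```

With `override_column_names = FALSE`, the same call would be expected to fail, since the names in `fts` do not match those in `raw_data`.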

Value

A list containing information relating to the supplied parameters as well as the resulting daiquiri_source_data and daiquiri_aggregated_data objects.

Details

In order for the package to detect any non-conformant values in numeric or datetime fields, these should be present in the data frame in their raw character format. Rectangular data from a text file will automatically be read in as character type if you use the `read_data()` function. Data frame columns that are not of class character will still be processed according to the `field_types` specified.
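
A small base R sketch of why the raw character format matters. The values below are hypothetical and the snippet does not call daiquiri; it only shows that converting a column before validation makes a malformed value indistinguishable from a missing one.

```r
# Hypothetical raw values as they might appear in a text file
dose_raw <- c("10", "12mg", "", "7.5")

# Early conversion: "12mg" and "" both become NA, so a malformed value
# can no longer be told apart from a missing one
dose_converted <- suppressWarnings(as.numeric(dose_raw))
is.na(dose_converted)
# FALSE  TRUE  TRUE FALSE

# Kept as character, the original strings remain available for validation
df <- data.frame(Dose = dose_raw, stringsAsFactors = FALSE)
class(df$Dose)
# "character"
```

Passing a column like `df$Dose` in its character form leaves it to the package to decide which values are missing (via `na`) and which are non-conformant.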

Examples


# load example data into a data.frame
raw_data <- read_data(
  system.file("extdata", "example_prescriptions.csv", package = "daiquiri"),
  delim = ",",
  col_names = TRUE
)

# create a report in the current directory
daiq_obj <- daiquiri_report(
  raw_data,
  field_types = field_types(
    PrescriptionID = ft_uniqueidentifier(),
    PrescriptionDate = ft_timepoint(),
    AdmissionDate = ft_datetime(includes_time = FALSE, na = "1800-01-01"),
    Drug = ft_freetext(),
    Dose = ft_numeric(),
    DoseUnit = ft_categorical(),
    PatientID = ft_ignore(),
    Location = ft_categorical(aggregate_by_each_category = TRUE)
  ),
  override_column_names = FALSE,
  na = c("", "NULL"),
  dataset_description = "Example data provided with package",
  aggregation_timeunit = "day",
  report_title = "daiquiri data quality report",
  save_directory = ".",
  save_filename = "example_data_report",
  show_progress = TRUE,
  log_directory = NULL
)

[Package daiquiri version 1.1.1 Index]