count_compare {eHDPrep}    R Documentation

Compare unique values before and after data modification

Description

Compares variables before and after a modification has been applied, so that changes made during the dataset preparation process can be manually inspected and reviewed.

Usage

count_compare(
  cols2compare,
  before_tbl = NULL,
  after_tbl = NULL,
  only_diff = FALSE,
  kableout = TRUE,
  caption = NULL,
  latex_wrap = FALSE
)

Arguments

cols2compare

Variables to compare between tables.

before_tbl

Data frame from before modification was made.

after_tbl

Data frame from after modification was made.

only_diff

Keep only rows which differ between the tables? Useful for variables with many unique values, such as numeric variables. (Default: FALSE)

kableout

Should output be a kable from knitr? If not, returns a tibble. (Default: TRUE)

caption

Caption for kable's caption parameter.

latex_wrap

Should tables be aligned vertically rather than horizontally? Useful for wide tables which would otherwise run off a page in LaTeX format. (Default: FALSE)

Details

The purpose of this function is to summarise individual alterations in a dataset; it works best with categorical variables. The output contains two tables derived from the parameters before_tbl and after_tbl. Each table shows the unique combinations of values in the variables specified in the parameter cols2compare, for those variables that are present in that table. The tables are presented as two sub-tables and therefore share a single table caption. When the parameter caption is not specified, a caption describing the content of the two sub-tables is generated automatically. The default output is a kable containing two sub-kables; however, if the parameter kableout is FALSE, a list containing the two tibbles is returned. This may be preferable for further analysis of the tables' contents.
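To illustrate the kind of tally each sub-table holds, here is a minimal sketch using dplyr. It is not the package's implementation: the toy tables, the helper tally_cols, and the column name not_a_column are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical toy tables; these are not the package's example_data
before_tbl <- tibble(smoker = c("yes", "Yes", "no", "no"))
after_tbl  <- mutate(before_tbl,
                     smoker = if_else(smoker %in% "Yes", "yes", smoker))

# Hypothetical helper: tally the unique value combinations of the requested
# columns, skipping any requested column that is absent from the table
tally_cols <- function(tbl, cols) {
  count(tbl, across(any_of(cols)))
}

# "not_a_column" is deliberately absent from both tables
cols2compare <- c("smoker", "not_a_column")

before_counts <- tally_cols(before_tbl, cols2compare)
after_counts  <- tally_cols(after_tbl, cols2compare)

print(before_counts)
print(after_counts)
```

Comparing the two printed tallies side by side shows that the "Yes" category has been folded into "yes", which is the sort of change this function is intended to make easy to review.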

Value

Returns a list of two tibbles or a kable (see the kableout argument), each tallying unique values in the specified columns of each input table.

Examples

# merge data as the example modification
example_data_merged <- merge_cols(example_data, diabetes_type, diabetes,
                                  "diabetes_merged", rm_in_vars = TRUE)

# review the differences between the input and output of the variable merging step above:
count_compare(before_tbl = example_data,
              after_tbl = example_data_merged,
              cols2compare = c("diabetes", "diabetes_type", "diabetes_merged"),
              kableout = FALSE)

[Package eHDPrep version 1.3.3 Index]