inspect_imb {inspectdf}    R Documentation

Summary and comparison of the most common levels in categorical columns

Description

For a single dataframe, summarise the most common level in each categorical column. If two dataframes are supplied, compare the most common levels of categorical features appearing in both dataframes. For grouped dataframes, summarise the levels of categorical columns in the dataframe split by group.

Usage

inspect_imb(df1, df2 = NULL, include_na = FALSE)

Arguments

df1

A dataframe.

df2

An optional second dataframe whose column-wise imbalance is compared with that of df1. Defaults to NULL.

include_na

Logical flag indicating whether to treat missing values (NA) as a distinct level. Defaults to FALSE, in which case NA values are ignored.
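The effect of include_na can be illustrated with base R alone. The sketch below is not the package's implementation: most_common() is a hypothetical helper, written here only to show how counting NA as its own level can change which level comes out as the most common.

```r
# Hypothetical helper (not part of inspectdf): most common level of a vector.
most_common <- function(x, include_na = FALSE) {
  # useNA = "ifany" counts NA as a level; "no" drops missing values
  counts <- table(x, useNA = if (include_na) "ifany" else "no")
  top <- which.max(counts)  # the first maximum wins if there is a tie
  list(
    value = names(counts)[top],
    cnt   = as.integer(counts[top]),
    pcnt  = 100 * as.integer(counts[top]) / sum(counts)
  )
}

x <- c("a", "b", NA, NA, NA, "a")

# NA ignored: "a" is the most common level among the 3 non-missing values
most_common(x)

# NA counted as a level: NA is now the most common level among all 6 values
most_common(x, include_na = TRUE)
```

In this toy vector the answer flips from "a" to NA once missing values are counted, which is the kind of difference the flag is meant to expose.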

Details

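For a single dataframe, the tibble returned contains the columns:

col_name, a character vector containing the column names of df1.

value, the most frequently occurring categorical level in each column of df1.

pcnt, the relative frequency of each column's most frequent categorical level, expressed as a percentage.

cnt, the number of occurrences of the most frequently occurring categorical level in each column of df1.

For a pair of dataframes, the tibble returned contains the columns:

col_name, a character vector containing the names of the unique columns in df1 and df2.

value, the most frequently occurring categorical level in each column of df1.

pcnt_1, pcnt_2, the percentage occurrence of value in column col_name for df1 and df2, respectively.

cnt_1, cnt_2, the number of occurrences of value in column col_name for df1 and df2, respectively.

p_value, the p-value associated with the null hypothesis that the true rate of occurrence is the same for both dataframes. Small values indicate stronger evidence of a difference in the rate of occurrence.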

For a grouped dataframe, the tibble returned is as for a single dataframe, except that the first k columns are the grouping columns. There will be as many rows in the result as there are unique combinations of the grouping variables.
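To make the single-dataframe summary concrete, here is a minimal base-R sketch of the same idea. It is an approximation for illustration only: the helper imbalance_sketch() is hypothetical, its output columns are named for this example, and inspect_imb() itself may differ in details such as tie-breaking, which column types count as categorical, and the type of the returned object.

```r
# Base-R sketch only; not the inspectdf implementation.
imbalance_sketch <- function(df) {
  # Assumption: character and factor columns are the categorical ones
  is_cat <- vapply(
    df,
    function(col) is.character(col) || is.factor(col),
    logical(1)
  )
  rows <- lapply(names(df)[is_cat], function(nm) {
    counts <- table(df[[nm]])   # table() drops NA values by default
    top <- which.max(counts)    # the first maximum wins if there is a tie
    data.frame(
      col_name = nm,
      value    = names(counts)[top],
      pcnt     = 100 * as.integer(counts[top]) / sum(counts),
      cnt      = as.integer(counts[top]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data: three categorical columns and one numeric column
toy <- data.frame(
  grp    = c("x", "x", "x", "y", "y"),
  colour = c("red", "red", "blue", "blue", "blue"),
  size   = c("S", "M", "M", "M", "L"),
  n      = 1:5,
  stringsAsFactors = FALSE
)

# One row per categorical column; the numeric column n is skipped
imbalance_sketch(toy)
```

The sketch does not handle columns that consist entirely of missing values, and it makes no attempt to reproduce the paired or grouped behaviour.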

Value

A tibble summarising and comparing the imbalance for each categorical column in one or a pair of dataframes.

Author(s)

Alastair Rushworth

See Also

inspect_cat, show_plot

Examples

# Load inspectdf, plus dplyr for the starwars data & pipe
library(inspectdf)
library(dplyr)

# Single dataframe summary
inspect_imb(starwars)

# Paired dataframe comparison
inspect_imb(starwars, starwars[1:20, ])

# Grouped dataframe summary
starwars %>% group_by(gender) %>% inspect_imb()

[Package inspectdf version 0.0.12 Index]