inspect_na {inspectdf} | R Documentation
Summary and comparison of the rate of missingness across dataframe columns
Description
For a single dataframe, summarise the rate of missingness in each column. If two dataframes are supplied, compare missingness for columns appearing in both dataframes. For grouped dataframes, summarise the rate of missingness separately for each group.
Usage
inspect_na(df1, df2 = NULL)
Arguments
df1 | A data frame.
df2 | An optional second data frame for making columnwise comparison of missingness. Defaults to NULL.
Details
For a single dataframe, the tibble returned contains the columns:
- col_name, a character vector containing column names of df1.
- cnt, an integer vector containing the number of missing values by column.
- pcnt, the percentage of records in each column that is missing.
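The three columns described above can be reproduced approximately in base R. The following is only a sketch: `na_summary` and the `toy` data frame are hypothetical names introduced for illustration, the result is a plain data frame rather than a tibble, and the code is not the package's own implementation.

```r
# Hypothetical helper (not part of inspectdf): per-column missingness summary
# with columns named after those described above.
na_summary <- function(df) {
  # Number of NA entries in each column, as an integer vector
  cnt <- vapply(df, function(x) sum(is.na(x)), integer(1))
  data.frame(
    col_name = names(df),
    cnt = unname(cnt),
    # Percentage of records in each column that are missing
    pcnt = 100 * unname(cnt) / nrow(df),
    stringsAsFactors = FALSE
  )
}

# Toy data: column a has 2 of 4 values missing, column b has 1 of 4
toy <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))
na_summary(toy)
```

For `toy`, `cnt` is 2 and 1, and `pcnt` is 50 and 25.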
For a pair of dataframes, the tibble returned contains the columns:
- col_name, the names of the columns occurring in either df1 or df2.
- cnt_1, cnt_2, a pair of integer vectors containing counts of missing entries for each column in df1 and df2.
- pcnt_1, pcnt_2, a pair of columns containing the percentage of missing entries for each column in df1 and df2.
- p_value, the p-value associated with a test of equivalence of rates of missingness. Small values indicate evidence that the rate of missingness differs for a column occurring in both df1 and df2.
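A rough base-R sketch of the paired comparison follows. This page does not say which test produces `p_value`. The sketch assumes a two-sample test of equal proportions via `stats::prop.test`, which may differ from what the package does. `na_compare`, `d1` and `d2` are hypothetical names, and columns present in only one data frame are given `NA` counts and p-values here as an illustrative choice.

```r
# Hypothetical helper (not part of inspectdf): compare missingness by column
# across two data frames. Assumes prop.test() as the test of equal rates.
na_compare <- function(df1, df2) {
  cols <- union(names(df1), names(df2))
  # NA count for one column, or NA if the column is absent from df
  count_na <- function(df, col) {
    if (col %in% names(df)) sum(is.na(df[[col]])) else NA_integer_
  }
  cnt_1 <- unname(vapply(cols, function(col) count_na(df1, col), integer(1)))
  cnt_2 <- unname(vapply(cols, function(col) count_na(df2, col), integer(1)))
  n <- c(nrow(df1), nrow(df2))
  p_value <- vapply(seq_along(cols), function(i) {
    x <- c(cnt_1[i], cnt_2[i])
    # Column missing from one data frame: nothing to compare
    if (anyNA(x)) return(NA_real_)
    # Test is undefined when nothing, or everything, is missing overall
    if (sum(x) == 0 || sum(x) == sum(n)) return(NA_real_)
    # Small samples trigger an approximation warning, suppressed here
    as.numeric(suppressWarnings(prop.test(x, n)$p.value))
  }, numeric(1))
  data.frame(
    col_name = cols,
    cnt_1 = cnt_1, cnt_2 = cnt_2,
    pcnt_1 = 100 * cnt_1 / n[1], pcnt_2 = 100 * cnt_2 / n[2],
    p_value = p_value,
    stringsAsFactors = FALSE
  )
}

# Toy data: column a is shared, b occurs only in d1, c only in d2
d1 <- data.frame(a = c(1, NA, 3, NA), b = c("x", NA, "z", "w"))
d2 <- data.frame(a = c(1, 2, 3, 4), c = c(NA, 1, 2, 3))
na_compare(d1, d2)
```

Only column `a` gets a p-value in this toy case, since it is the only column occurring in both data frames.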
For a grouped dataframe, the tibble returned is the same as for a single dataframe, except that the first k columns are the grouping columns. There will be as many rows in the result as there are unique combinations of the grouping variables.
Value
A tibble summarising the count and percentage of columnwise missingness for one or a pair of data frames.
Author(s)
Alastair Rushworth
See Also
Examples
# Load dplyr for starwars data & pipe
library(dplyr)
# Single dataframe summary
inspect_na(starwars)
# Paired dataframe comparison
inspect_na(starwars, starwars[1:20, ])
# Grouped dataframe summary
starwars %>% group_by(gender) %>% inspect_na()