duplicate_count_colpair {scrutiny} | R Documentation |
Count duplicate values by column
Description
duplicate_count_colpair() takes a data frame and checks each combination of
columns for duplicates. Results are presented in a tibble, ordered by the
number of duplicates.
Usage
duplicate_count_colpair(
  data,
  ignore = NULL,
  show_rates = TRUE,
  na.rm = deprecated()
)
Arguments
data | Data frame.
ignore | Optionally, a vector of values that should not be checked for duplicates.
show_rates | Logical. If TRUE (the default), the total_* and rate_* columns described under Value are added.
na.rm | [Deprecated] Missing values are never counted in any case.
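The optional arguments can be combined in a single call. The following is a minimal sketch, assuming the scrutiny package is installed; the choice of 0 as the value to ignore is arbitrary, and the exact set of columns dropped by show_rates = FALSE is inferred from the Value section rather than confirmed here.

```r
library(scrutiny)

# Illustrative call: exclude 0 from the duplicate check (arbitrary choice)
# and switch off the columns that are only added by default.
res <- duplicate_count_colpair(
  mtcars,
  ignore = 0,
  show_rates = FALSE
)

res
```

Leaving out the rate columns mainly makes sense when only the raw counts are of interest.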
Value
A tibble (data frame) with these columns:
- x and y: Each line contains a unique combination of data's columns, stored in the x and y output columns.
- count: Number of "duplicates", i.e., values that are present in both x and y.
- total_x, total_y, rate_x, and rate_y (added by default): total_x is the number of non-missing values in the column named under x. Also, rate_x is the proportion of x values that are duplicated in y, i.e., count / total_x. Likewise with total_y and rate_y. The two rate_* columns will be equal unless NA values are present.
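To make the arithmetic behind the total_* and rate_* columns concrete, here is a minimal base R sketch for one hypothetical column pair. The vectors and the count of 2 are made up for illustration; the count is taken as given and does not show how the function itself computes it.

```r
# Hypothetical columns; `y` contains one missing value.
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, NA)

# Assumed duplicate count for this pair (taken as given for the sketch)
count <- 2

# Totals only consider non-missing values
total_x <- sum(!is.na(x))  # 4
total_y <- sum(!is.na(y))  # 3

# Rates are the count divided by the respective total
rate_x <- count / total_x  # 2 / 4
rate_y <- count / total_y  # 2 / 3

c(rate_x = rate_x, rate_y = rate_y)
```

Because y has a missing value, total_y is smaller than total_x, so the two rates differ. Without any NA values, both totals would be the same and the rates would be equal.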
Summaries with audit()
There is an S3 method for audit(), so you can call audit() following
duplicate_count_colpair(). It returns a tibble with summary statistics.
See Also
- duplicate_count() for a frequency table.
- duplicate_tally() to show instances of a value next to each instance.
- janitor::get_dupes() to search for duplicate rows.
- corrr::colpair_map(), a versatile tool for pairwise column analysis which the present function wraps.
Examples
# Basic usage:
mtcars %>%
  duplicate_count_colpair()

# Summaries with `audit()`:
mtcars %>%
  duplicate_count_colpair() %>%
  audit()