| duplicate_count_colpair {scrutiny} | R Documentation |
Count duplicate values by column
Description
duplicate_count_colpair() takes a data frame and checks each combination of
columns for duplicates. Results are presented in a tibble, ordered by the
number of duplicates.
Usage
duplicate_count_colpair(
  data,
  ignore = NULL,
  show_rates = TRUE,
  na.rm = deprecated()
)
Arguments
| data | Data frame. |
| ignore | Optionally, a vector of values that should not be checked for duplicates. |
| show_rates | Logical. If |
| na.rm | [Deprecated] Missing values are never counted in any case. |
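The following is a minimal sketch of how the optional arguments might be used. It assumes the scrutiny package is installed and uses the built-in mtcars data. The values passed to ignore (0 and 1) are chosen purely for illustration, and the object names are hypothetical.

```r
library(scrutiny)

# Default call: all columns described in the Value section are returned
with_rates <- duplicate_count_colpair(mtcars)

# Same call with show_rates switched off
without_rates <- duplicate_count_colpair(mtcars, show_rates = FALSE)

# Inspect which output columns differ between the two calls
setdiff(names(with_rates), names(without_rates))

# Exclude some values from the duplicate check.
# 0 and 1 are illustrative: they occur in the binary columns vs and am.
ignoring_binary <- duplicate_count_colpair(mtcars, ignore = c(0, 1))
ignoring_binary
```

Comparing the column names of the two results is a quick way to see what show_rates controls without relying on a description of it.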
Value
A tibble (data frame) with these columns:
- x and y: Each line contains a unique combination of data's columns, stored in the x and y output columns.
- count: Number of "duplicates", i.e., values that are present in both x and y.
- total_x, total_y, rate_x, and rate_y (added by default): total_x is the number of non-missing values in the column named under x. Also, rate_x is the proportion of x values that are duplicated in y, i.e., count / total_x. Likewise with total_y and rate_y. The two rate_* columns will be equal unless NA values are present.
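The relationship between the count, total, and rate columns can be checked directly on the output. This is a small sketch, assuming the scrutiny package is installed. The name out is a hypothetical variable, and mtcars is used because it contains no missing values.

```r
library(scrutiny)

# Pairwise duplicate counts for the built-in mtcars data
out <- duplicate_count_colpair(mtcars)

# Per the column description, rate_x should equal count / total_x
all.equal(out$rate_x, out$count / out$total_x)

# Likewise for the y side
all.equal(out$rate_y, out$count / out$total_y)

# Without NA values, the two rate columns are expected to match
all.equal(out$rate_x, out$rate_y)
```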
Summaries with audit()
There is an S3 method for audit(),
so you can call audit() following duplicate_count_colpair(). It
returns a tibble with summary statistics.
See Also
- duplicate_count() for a frequency table.
- duplicate_tally() to show instances of a value next to each instance.
- janitor::get_dupes() to search for duplicate rows.
- corrr::colpair_map(), a versatile tool for pairwise column analysis which the present function wraps.
Examples
# Basic usage:
mtcars %>%
  duplicate_count_colpair()

# Summaries with `audit()`:
mtcars %>%
  duplicate_count_colpair() %>%
  audit()