duplicate_tally {scrutiny} | R Documentation
Count duplicates at each observation
Description
For every value in a vector or data frame, duplicate_tally() counts how often it appears in total. Tallies are presented next to each value.
For summary statistics, call audit() on the results.
Usage
duplicate_tally(x, ignore = NULL, colname_end = "n")
Arguments
x | Vector or data frame.
ignore | Optionally, a vector of values that should not be checked. In the test result columns, they will be marked NA.
colname_end | String. Name ending of the tally columns. Default is "n".
Details
This function is not very informative with many input values that only have a few characters each. Many of them may have duplicates just by chance. For example, in R's built-in iris data set, 99% of values have duplicates.
In general, the fewer values there are and the more characters each value has, the more meaningful the results.
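To get a feel for how often short values repeat by chance, the share of duplicated values in iris can be approximated in base R. This is only a rough sketch: it assumes that all values are compared as strings across the whole data frame, which may not match how duplicate_tally() works internally, so the result need not reproduce the figure above exactly.

```r
# Convert every column to character so numbers and factor labels are
# compared as strings (an assumption, not scrutiny's documented behavior).
values <- unlist(lapply(iris, as.character), use.names = FALSE)

# Count how often each distinct value occurs across the whole data set
counts <- table(values)

# Look up the total count for each individual value
tally <- as.integer(counts[values])

# Share of values that occur more than once
share_duplicated <- mean(tally > 1)
share_duplicated
```

A value near 1 would suggest that duplicates in such data carry little information on their own.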
Value
A tibble (data frame). It has all the columns from x, each followed on its right by the corresponding tally column.
The tibble has the scr_dup_detect class, which is recognized by the audit() generic.
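The column layout described above can be imitated in base R. The sketch below is not the package's implementation: the toy data frame df is made up, and both the "_n" suffix (colname_end joined with an underscore) and the counting across all columns are assumptions based on a reading of this page.

```r
# Hypothetical toy data, not from the package
df <- data.frame(
  a = c("1", "2", "1"),
  b = c("2", "3", "4"),
  stringsAsFactors = FALSE
)

# Count each value across the whole data frame (assumed behavior)
all_values <- unlist(lapply(df, as.character), use.names = FALSE)
counts <- table(all_values)

# Place a tally column directly to the right of each original column
out <- list()
for (col in names(df)) {
  out[[col]] <- df[[col]]
  # "_n" suffix is an assumption about how colname_end is applied
  out[[paste0(col, "_n")]] <- as.integer(counts[as.character(df[[col]])])
}
out <- as.data.frame(out, stringsAsFactors = FALSE)
out
```

Here, "1" and "2" each occur twice in total, while "3" and "4" occur once, so the tally columns reflect those totals rather than per-column counts.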
Summaries with audit()
There is an S3 method for the audit() generic, so you can call audit() following duplicate_tally(). It returns a tibble with summary statistics.
See Also
- duplicate_count() for a frequency table.
- duplicate_count_colpair() to check each combination of columns for duplicates.
- janitor::get_dupes() to search for duplicate rows.
Examples
# Tally duplicate values in a data frame...
duplicate_tally(x = pigs4)
# ...or in a single vector:
duplicate_tally(x = pigs4$snout)
# Summary statistics with `audit()`:
pigs4 %>%
  duplicate_tally() %>%
  audit()
# Any values can be ignored:
pigs4 %>%
  duplicate_tally(ignore = c(8.131, 7.574))