duplicate_rows {timeplyr} | R Documentation |
Find duplicate rows
Description
Find duplicate rows
Usage
duplicate_rows(
data,
...,
.keep_all = FALSE,
.both_ways = FALSE,
.add_count = FALSE,
.drop_empty = FALSE,
sort = FALSE,
.by = NULL,
.cols = NULL
)
Arguments
data |
A data frame. |
... |
Variables used to find duplicate rows. |
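.keep_all |
If TRUE, all columns of data are kept. If FALSE (the default), only the variables used to find duplicate rows are returned. |
.both_ways |
If TRUE, the first instance of each duplicated row is returned along with its duplicates. If FALSE (the default), only the second and later occurrences are returned. |
.add_count |
If TRUE, a count column is added giving the number of times each row occurs, including its first instance. |
.drop_empty |
If TRUE, empty rows (rows in which all values are NA) are removed. The default is FALSE. |
sort |
Should the result be sorted?
If FALSE (the default), rows are returned in the order in which they appear in data. If TRUE, the duplicate rows are sorted. |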
.by |
(Optional). A selection of columns to group by for this operation. Columns are specified using tidy-select. |
.cols |
(Optional) alternative to ... for selecting the variables used to find duplicate rows. |
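As a rough illustration of the kind of count column that .add_count = TRUE and the fadd_count() example in the Examples section refer to, the sketch below computes, in base R, the size of the group of identical rows that each row belongs to. It assumes the count includes the first instance of each row; it uses ave() rather than timeplyr's own implementation, and the toy data frame df is hypothetical.

```r
# Hypothetical toy data: rows 1, 3 and 5 are identical, as are rows 2 and 4
df <- data.frame(
  x = c(1, 2, 1, 2, 1, 3),
  y = c("a", "b", "a", "b", "a", "c")
)

# For each row, count how many rows share the same (x, y) combination,
# counting the first instance as well
df$n <- ave(seq_len(nrow(df)), df$x, df$y, FUN = length)

# Rows that occur more than once, each with its group size in n
dups_counted <- df[df$n > 1, ]
dups_counted
```

Filtering on n > 1 keeps every member of each duplicated group, which mirrors the fadd_count() %>% filter(n > 1) pattern used in the examples.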
Details
This function works like dplyr::distinct() in its handling of arguments and data-masking, but returns duplicate rows.
In certain situations it can be much faster than data %>% group_by() %>% filter(n() > 1) when there are many groups.
fduplicates2() returns the same output but uses a different method, which utilises joins and is written almost entirely using dplyr.
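The base R sketch below contrasts two notions of "duplicate rows": only the second and later occurrences of a row, and every row whose values occur more than once (which is what data %>% group_by() %>% filter(n() > 1) returns). It assumes the former corresponds to the default behaviour and the latter to .both_ways = TRUE, as the comment in the examples suggests; it uses duplicated() rather than timeplyr's implementation, and the toy data frame df is hypothetical.

```r
# Hypothetical toy data: rows 1, 3 and 5 are identical, as are rows 2 and 4
df <- data.frame(
  x = c(1, 2, 1, 2, 1, 3),
  y = c("a", "b", "a", "b", "a", "c")
)

# Second and later occurrences only; the first instance of each row is dropped
dups_only <- df[duplicated(df), ]

# Every row whose values occur more than once, first instances included:
# a row is flagged if it repeats an earlier row or a later row
is_dup_any <- duplicated(df) | duplicated(df, fromLast = TRUE)
dups_both <- df[is_dup_any, ]

dups_only
dups_both
```

Scanning from both ends with fromLast = TRUE avoids computing explicit group sizes, which is one reason a dedicated function can outperform a grouped filter when there are many groups.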
Value
A data.frame of duplicate rows.
See Also
fcount group_collapse fdistinct
Examples
library(dplyr)
library(timeplyr)
library(ggplot2)
# Duplicates across all columns
diamonds %>%
duplicate_rows()
# Alternatively with row ids
diamonds %>%
filter(frowid(.) > 1)
# Diamonds with the same dimensions
diamonds %>%
duplicate_rows(x, y, z)
# tidy-select helpers can be used inside across()
diamonds %>%
duplicate_rows(across(where(is.factor)), .keep_all = FALSE)
# Similar to janitor::get_dupes()
diamonds %>%
duplicate_rows(.add_count = TRUE)
# Also keep the first instance of each duplicated row
diamonds %>%
duplicate_rows(.both_ways = TRUE)
# Returns the same rows as the code below, which also adds a count column n
diamonds %>%
fadd_count(across(everything())) %>%
filter(n > 1)