find_duplicates {cleanepi} | R Documentation |
Identify and return duplicated rows in a data frame or linelist.
Description
Identify and return duplicated rows in a data frame or linelist.
Usage
find_duplicates(data, target_columns = NULL)
Arguments
data |
A data frame or linelist. |
target_columns |
A vector of column names or indices to consider when
looking for duplicates. When the input data is a |
Value
A data frame or linelist containing all duplicated rows, with the following two additional columns:
- row_id: the indices of the duplicated rows in the input data. Users can use these indices to choose which rows in each group of duplicates they consider redundant.
- group_id: a unique identifier associated with each group of duplicates.
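To make the meaning of the two additional columns concrete, here is a minimal base R sketch that builds a comparable table by hand. It is an illustration only, not the implementation used by cleanepi: the toy data frame, the object names (toy, key, is_dup, redundant, deduplicated) and the choice to keep the first row of each group are all assumptions made for this example.

```r
# Hypothetical toy data: rows 1 and 3 match, and rows 2 and 4 match,
# on the target columns; row 5 is unique.
toy <- data.frame(
  id      = 1:5,
  sex     = c("F", "M", "F", "M", "F"),
  outcome = c("recovered", "died", "recovered", "died", "died"),
  stringsAsFactors = FALSE
)
target_columns <- c("sex", "outcome")

# Build one comparison key per row from the target columns only.
key <- do.call(paste, c(toy[target_columns], sep = "\r"))

# A row belongs to a group of duplicates if its key occurs more than once.
is_dup <- key %in% key[duplicated(key)]

# Keep every duplicated row and attach the two additional columns.
dups <- toy[is_dup, , drop = FALSE]
dups$row_id   <- which(is_dup)                                # index in the input
dups$group_id <- match(key[is_dup], unique(key[is_dup]))      # one id per group
dups <- dups[order(dups$group_id, dups$row_id), ]

# Assumed policy: keep the first row of each group, treat the rest as redundant.
redundant    <- dups$row_id[duplicated(dups$group_id)]
deduplicated <- toy[!seq_len(nrow(toy)) %in% redundant, , drop = FALSE]
```

In this sketch, row_id points back to positions in the input data, so a user can inspect each group and decide which indices to drop. Keeping the first row per group is only one possible choice.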
Examples
dups <- find_duplicates(
data = readRDS(system.file("extdata", "test_linelist.RDS",
package = "cleanepi")),
target_columns = c("dt_onset", "dt_report", "sex", "outcome")
)
[Package cleanepi version 1.0.2 Index]