R: Return duplicated rows of data.table

duplicated_rows {hutils}

R Documentation

Return duplicated rows of data.table

Description

Unlike `duplicated`, this function returns both the duplicate rows and the rows they duplicate. Combined with the `by` argument, it can help determine whether two observations that match on the specified columns are also identical across the remaining columns.
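
The difference from `duplicated` can be illustrated with data.table alone. The sketch below is not the hutils implementation. It assumes, following the description above, that the rows of interest are every row whose `by` values occur more than once. The table and the column names `id` and `value` are made up for illustration.

```r
library(data.table)

DT <- data.table(id = c("a", "b", "a", "c", "b"),
                 value = c(1, 2, 1, 4, 5))

# data.table's duplicated() method flags only the second and later occurrences.
later_only <- duplicated(DT, by = "id")
DT[later_only]   # rows 3 and 5

# Scanning from both ends also flags the first occurrence of each repeated id,
# which is the behaviour the description attributes to duplicated_rows().
both <- later_only | duplicated(DT, by = "id", fromLast = TRUE)
DT[both]         # rows 1, 2, 3 and 5
```

The row with `id == "c"` is dropped in both cases because it occurs only once.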

Usage

duplicated_rows(
  DT,
  by = names(DT),
  na.rm = FALSE,
  order = TRUE,
  copyDT = TRUE,
  na.last = FALSE
)

Arguments

`DT`	A `data.table`.
`by`	Character vector of columns to evaluate duplicates over.
`na.rm`	(logical) Should `NA`s in the `by` columns be removed before returning duplicates? (Default `FALSE`.)
`order`	(logical) Should the result be ordered so that duplicate rows are adjacent? (Default `TRUE`.)
`copyDT`	(logical) Should `DT` be copied prior to detecting duplicates? If `FALSE`, the ordering of `DT` will be changed by reference. (Default `TRUE`.)
`na.last`	(logical) If `order` is `TRUE`, should `NA`s be ordered last rather than first? Passed to `data.table::setorderv`. (Default `FALSE`.)
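
Since `na.last` is passed to `data.table::setorderv`, its effect, and the reason `copyDT` exists, can be seen by calling `setorderv` directly. This is a minimal sketch on a made-up table. It shows `setorderv` itself, not the internals of `duplicated_rows`.

```r
library(data.table)

DT <- data.table(x = c(2L, NA, 1L, 2L),
                 y = c("b", "c", "a", "d"))

# Sort a copy so the original row order survives (the role of copyDT = TRUE).
sorted_first <- copy(DT)
setorderv(sorted_first, "x", na.last = FALSE)  # NA rows come first
sorted_first$x   # NA 1 2 2

sorted_last <- copy(DT)
setorderv(sorted_last, "x", na.last = TRUE)    # NA rows come last
sorted_last$x    # 1 2 2 NA

# setorderv() reorders its argument in place. DT is untouched here only
# because copies were sorted above.
DT$x             # 2 NA 1 2
```

Calling `setorderv(DT, "x")` without `copy()` would reorder `DT` itself, which corresponds to the documented side effect of `copyDT = FALSE`.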

Value

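The rows of `DT` that are duplicated with respect to the columns in `by`, including the first occurrence of each duplicated combination. Intended for interactive use.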

Examples


if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  DT <- data.table(x = rep(1:4, 3),
                   y = rep(1:2, 6),
                   z = rep(1:3, 4))

  # No duplicates: every (x, y, z) combination occurs exactly once
  duplicated_rows(DT)

  # Restricted to x and y, the rows do have duplicates
  duplicated_rows(DT, by = c("x", "y"), order = FALSE)

  # By default, the duplicate rows are presented adjacent to each other.
  duplicated_rows(DT, by = c("x", "y"))
}
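
A possible follow-up, sketched here, is to check whether rows that share the `by` columns also agree on every other column, as the description suggests. The table and the column names `id`, `name` and `score` are hypothetical. The fallback branch is a data.table-only stand-in, assumed from the description to select the same rows when hutils is not installed.

```r
library(data.table)

DT <- data.table(id    = c(1L, 1L, 2L, 2L, 3L),
                 name  = c("ann", "ann", "bob", "rob", "cy"),
                 score = c(10, 10, 7, 7, 3))

# Rows sharing an id, including the first occurrence of each.
dups <- if (requireNamespace("hutils", quietly = TRUE)) {
  hutils::duplicated_rows(DT, by = "id")
} else {
  # Stand-in (assumption): the same rows, selected with data.table alone.
  DT[duplicated(DT, by = "id") | duplicated(DT, by = "id", fromLast = TRUE)]
}

# For each repeated id, are the remaining columns identical as well?
check <- dups[, .(n_rows = .N,
                  identical_elsewhere = uniqueN(.SD) == 1L),
              by = "id"]
check
```

Here `id` 1 repeats with identical `name` and `score`, whereas `id` 2 repeats with differing `name` values, so only the first is flagged as identical elsewhere.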

[Package hutils version 1.8.1 Index]