exclude {healthdb}    R Documentation

Remove rows based on conditions or another data set

Description

This function combines dplyr::anti_join() and the negation of dplyr::filter(). When a second data set is supplied through the excl argument, an anti join is performed; otherwise, data is filtered with the expression given via the condition argument, and the rows returned by the filter are then removed from data using dplyr::setdiff().
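The two paths described above can be sketched with plain dplyr calls. This is only an illustration of the operations the description names, not the package's actual implementation; the object names (to_drop, kept_by_condition, kept_by_join) are made up for the example.

```r
library(dplyr)

# Condition path: find the rows that satisfy the condition,
# then remove them from the original data with setdiff().
to_drop <- filter(mtcars, cyl == 4)
kept_by_condition <- dplyr::setdiff(mtcars, to_drop)

# Data-set path: anti_join() keeps the rows of mtcars that have
# no match in the second data set on the chosen column(s).
kept_by_join <- anti_join(mtcars, to_drop, by = "cyl")
```

Note that dplyr::setdiff() also drops duplicated rows, so the two paths may differ on data that contains exact duplicates.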

Usage

exclude(
  data,
  excl = NULL,
  by = NULL,
  condition = NULL,
  verbose = getOption("healthdb.verbose"),
  report_on = NULL,
  ...
)

Arguments

data

A data frame or remote table (e.g., from dbplyr::tbl_sql()). A subset of its rows will be removed.

excl

A data frame or remote table (e.g., from 'dbplyr'). Rows of data that match rows/values present in excl will be removed. This is passed to dplyr::anti_join() as the second argument.

by

Column names to be matched by dplyr::anti_join(), or an expression created with dplyr::join_by(). See the by argument of dplyr::anti_join() for details. The default, NULL, is the same as setdiff(data, excl).

condition

An expression that will be passed to dplyr::filter(). The rows that satisfy condition are those to be removed from data.

verbose

A logical indicating whether to print an explanation of the operation. The default is taken from options. Use options(healthdb.verbose = FALSE) to suppress the explanation for all subsequent calls.

report_on

A quoted/unquoted column name for counting how many of its distinct values were removed from data, e.g., counting how many client IDs were removed. Default is NULL.

...

Additional arguments passed to dplyr::filter()/dplyr::anti_join() for finer control of matching, e.g., handling of NA values, by-group filtering, etc.
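As a sketch of the matching controls mentioned for by and ..., the calls below use dplyr::anti_join() directly, since that is the function these arguments are forwarded to. The small data frames (excl_cyl, x, y) are hypothetical, and join_by() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Two equivalent ways to name the matching column.
excl_cyl <- data.frame(cyl = c(4, 6))
by_name    <- anti_join(mtcars, excl_cyl, by = "cyl")
by_join_by <- anti_join(mtcars, excl_cyl, by = join_by(cyl))

# NA handling: by default NA matches NA, so the NA row is removed;
# na_matches = "never" keeps rows whose key is NA.
x <- data.frame(id = c(1, 2, NA))
y <- data.frame(id = c(2, NA))
na_default <- anti_join(x, y, by = "id")
na_never   <- anti_join(x, y, by = "id", na_matches = "never")
```

With exclude(), an argument such as na_matches would presumably be supplied through ... in the same way.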

Value

A data frame or remote table that is a subset of data.

Examples

# exclude with condition
cyl_not_4 <- exclude(mtcars, condition = cyl == 4, report_on = cyl)

# exclude with another data set
exclude(mtcars, cyl_not_4, dplyr::join_by(cyl), report_on = cyl)
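A further sketch shows the two ways of silencing the explanatory output described under verbose: once per session through options(), or per call through the argument. The call to exclude() is guarded so the snippet still runs when healthdb is not installed; the names old and kept are made up for the example.

```r
# Suppress the explanation for all subsequent calls;
# options() returns the previous value so it can be restored later.
old <- options(healthdb.verbose = FALSE)

# Or suppress it for a single call only.
if (requireNamespace("healthdb", quietly = TRUE)) {
  kept <- healthdb::exclude(mtcars, condition = cyl == 4, verbose = FALSE)
}

# Restore the previous setting.
options(old)
```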

[Package healthdb version 0.3.1 Index]