identify_row {healthdb} | R Documentation |
Identify rows with a match
Description
Filter rows which values satisfy the specified conditions. The functionality is identical to dplyr::filter()
combined with dplyr::if_any()
or dplyr::if_all()
, but it used the 'data.table' package vignette("datatable-intro", package = "data.table")
for data.frame method, and has regular regular expression support for remote database tables. The motivation is to take away some pain when working with databases which often do not support regular expression and 'LIKE' operator with multiple string patterns.
Usage
identify_row(
data,
vars,
match = c("in", "start", "regex", "like", "between", "glue_sql"),
vals,
if_all = FALSE,
verbose = getOption("healthdb.verbose"),
query_only = TRUE,
...
)
Arguments
data |
Data.frames or remote tables (e.g., from |
vars |
An expression passing to |
match |
One of "in", "start", "regex", "like", "between", and "glue_sql". It determines how values would be matched. The operations under each type:
|
vals |
Depending on
|
if_all |
A logical for whether combining the predicates (if multiple columns were selected by vars) with AND instead of OR. Default is FALSE, e.g., var1 in vals OR var2 in vals. |
verbose |
A logical for whether printing explanation and result overview for the query. Default is fetching from options. Use |
query_only |
A logical for whether keeping the output as remote table (Default TRUE) or downloading the query result as a tibble (FALSE). The argument is ignored when the input data is a data.frame/tibble. |
... |
For remote table method only. Additional arguments passing to |
Value
A data.frame or tbl_sql object depending on the input.
Examples
#applying to data.frame; both sepal length and width in range 3-5
identify_row(iris, starts_with("Sepal"), "between", c(3, 5), if_all = TRUE)
#applying to remote table; species starts with se or ends with ca
iris_db <- dbplyr::memdb_frame(iris)
identify_row(iris_db, Species, "like", c("se%", "%ca"))
#using glue_sql to write the WHERE clause
#use {`vars`} to refer to the variables selected by vars
#supply additional values required in the query through '...'
#note that if you use LIKE here, you cannot supply multiple patterns in what
identify_row(iris_db, Species, "glue_sql",
"{`vars`} LIKE {what}",
what = "se%")
#add * after a vector
identify_row(iris_db, Species, "glue_sql",
"{`vars`} IN ({what*})",
what = c("setosa", "virginica"))