pool_case {healthdb} | R Documentation |
Pool qualified clients from results of multiple definitions
Description
This function filters and pools (i.e., row-binds) qualified clients/groups from different sources, with an option to summarize by client. Unlike bind_source(), there is no need to supply variable names; the function infers which variables to include, and their names, from the definition created by build_def(). Whether a client qualifies depends on the flag variables set by define_case(). Therefore, this function is intended to be used only with the built-in define_case() as def_fn in build_def().
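As a rough illustration of what "filter and pool" means here, the following base R sketch keeps flagged rows from each source and row-binds them. It is not the package's implementation: the toy data frames and the `is_valid` column are hypothetical stand-ins for the flag variables that define_case() sets.

```r
# Hypothetical per-source results; `is_valid` stands in for the real flag variables.
src1 <- data.frame(clnt_id = c(1, 1, 2), is_valid = c(TRUE, TRUE, FALSE))
src2 <- data.frame(clnt_id = c(2, 3, 3), is_valid = c(FALSE, TRUE, TRUE))
results <- list(src1 = src1, src2 = src2)

# Keep only flagged rows in each source and tag them with the source label.
tagged <- Map(function(df, lab) {
  kept <- df[df$is_valid, , drop = FALSE]
  kept$src <- rep(lab, nrow(kept))
  kept
}, results, names(results))

# Pool (row-bind) the filtered sources into a single table.
pooled <- do.call(rbind, unname(tagged))
rownames(pooled) <- NULL
pooled
```

Client 2 has no flagged record in either toy source, so it does not appear in the pooled table.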
Usage
pool_case(
data,
def,
output_lvl = c("raw", "clnt"),
include_src = c("all", "has_valid", "n_per_clnt"),
...
)
Arguments
data |
A list of data.frames or remote tables, which should be the output from execute_def() |
def |
A tibble of case definitions generated by build_def() |
output_lvl |
Either "raw" (the default) or "clnt", which summarizes the output by client.
|
include_src |
Character. Determines which sources' records are included. This matters when clients were identified from some, but not all, of the sources. The choice does not affect the number of clients identified, but it does affect the number of records and the latest entry date. The options are one of "all", "has_valid", or "n_per_clnt".
|
... |
Additional arguments passing to |
Value
A data.frame or remote table with the clients that satisfied the predefined case definition. Columns starting with "raw_in_" are source-specific counts of raw records, and columns starting with "valid_in_" are the number of valid entries (or the number of flags) in each source.
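To make the "raw_in_" and "valid_in_" naming concrete, here is a minimal base R sketch that builds per-client, per-source counts from toy data. It only mirrors the column naming described above; the input table, the `is_valid` flag, and the helper `count_in()` are hypothetical, and the actual output of pool_case() may contain other columns.

```r
# Hypothetical pooled records: one row per record, with a source label and a validity flag.
pooled <- data.frame(
  clnt_id  = c(1, 1, 1, 2, 2),
  src      = c("src1", "src1", "src2", "src1", "src2"),
  is_valid = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Count raw records and valid entries per client within one source (hypothetical helper).
count_in <- function(lab, dat) {
  in_src <- dat$src == lab
  out <- data.frame(clnt_id = sort(unique(dat$clnt_id)))
  key <- as.character(out$clnt_id)
  out[[paste0("raw_in_", lab)]] <-
    as.integer(tapply(in_src, dat$clnt_id, sum)[key])
  out[[paste0("valid_in_", lab)]] <-
    as.integer(tapply(in_src & dat$is_valid, dat$clnt_id, sum)[key])
  out
}

# One row per client, with a pair of count columns for each source.
by_clnt <- Reduce(
  function(x, y) merge(x, y, by = "clnt_id"),
  lapply(c("src1", "src2"), count_in, dat = pooled)
)
by_clnt
```

In this toy table, client 1 has a raw record in src2 that is not valid, so the two counts for that source differ.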
Examples
library(healthdb)
library(dplyr) # provides %>% and starts_with()

# toy data
df1 <- make_test_dat()
df2 <- make_test_dat()

# use build_def to make a toy definition
sud_def <- build_def("SUD", # usually a disease name
  src_lab = c("src1", "src2"), # identify from multiple sources, e.g., hospitalization, ED visits
  # function that filters the data with some criteria
  def_fn = define_case,
  fn_args = list(
    vars = starts_with("diagx"),
    match = "start", # "start" is applied to all sources because it has length 1
    vals = list(c("304"), c("305")), # one set of values per source
    clnt_id = "clnt_id", # list()/c() can be omitted for a single element
    # c() can be used in place of list()
    # if this argument takes only one value for each source
    n_per_clnt = c(2, 3)
  )
)

# save the definition for re-use
# saveRDS(sud_def, file = some_path)

# execute the definition on each source
sud_by_src <- sud_def %>% execute_def(with_data = list(src1 = df1, src2 = df2))

# pool results from src1 and src2 together at the client level
pool_case(sud_by_src, sud_def, output_lvl = "clnt")