pool_case {healthdb}    R Documentation

Pool qualified clients from results of multiple definitions

Description

This function filters and pools (i.e., row-binds) qualified clients/groups from different sources, with an option to summarize by client. Unlike bind_source(), there is no need to supply variable names; the function infers which variables to include, and their names, from the definition supplied via build_def(). Whether a client is qualified depends on the flag variables set by define_case(). Therefore, this function is intended to be used only with the built-in define_case() as def_fn in build_def().

Usage

pool_case(
  data,
  def,
  output_lvl = c("raw", "clnt"),
  include_src = c("all", "has_valid", "n_per_clnt"),
  ...
)

Arguments

data

A list of data.frames or remote tables, which should be the output of execute_def().

def

A case definition tibble generated by build_def().

output_lvl

Either:

  • "raw" - output all records (default),

  • or "clnt" - output one record per client with summaries including date of first valid record ('first_valid_date'), date of the latest record ('last_entry_date'), and sources that contain valid records.
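To make the client-level summaries concrete, the following base R sketch computes the same kind of per-client values by hand. It is an illustration only, not the package's implementation: the toy data and the column names clnt_id, src, date, valid and valid_src are assumptions made for this example; only first_valid_date and last_entry_date are names taken from the description above.

```r
# Hypothetical pooled records; one row per record, `valid` marks flagged rows.
raw <- data.frame(
  clnt_id = c(1, 1, 1, 2, 2),
  src     = c("src1", "src1", "src2", "src1", "src2"),
  date    = as.Date(c("2020-01-05", "2020-03-01", "2020-06-10",
                      "2021-02-02", "2021-04-15")),
  valid   = c(FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Collapse to one row per client:
# earliest valid date, latest date of any record, and sources with valid records.
clnt <- do.call(rbind, lapply(split(raw, raw$clnt_id), function(d) {
  data.frame(
    clnt_id          = d$clnt_id[1],
    first_valid_date = min(d$date[d$valid]),
    last_entry_date  = max(d$date),
    valid_src        = paste(sort(unique(d$src[d$valid])), collapse = ",")
  )
}))
rownames(clnt) <- NULL
clnt
```

Note that last_entry_date is taken over all of a client's records, valid or not, which is why the choice of include_src can change it.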

include_src

Character. Determines which sources' records are included. This matters when a client was identified from some, but not all, of the sources. The choice does not affect the number of clients identified, but it does affect the number of records and the latest entry date. One of:

  • "all" - records from all sources are included;

  • "has_valid" - for each client, records from sources that contain at least one valid record are included;

  • "n_per_clnt" - for each client, if they had fewer than n_per_clnt records in a source (see restrict_n()), then records from that source are removed.
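The difference between the three options can be sketched in base R on a toy table. This is a simplified illustration under stated assumptions, not the function's actual code: the columns clnt_id, src and valid and the threshold vector min_n are hypothetical, and valid stands in for the flags set by define_case().

```r
# Hypothetical records for two clients across two sources.
raw <- data.frame(
  clnt_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  src     = c("src1", "src1", "src1", "src2", "src1", "src1", "src2", "src2"),
  valid   = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# "all": every record from every source is kept.
all_src <- raw

# "has_valid": per client, keep a source only if it holds at least one valid record.
n_valid   <- ave(as.integer(raw$valid), raw$clnt_id, raw$src, FUN = sum)
has_valid <- raw[n_valid > 0, ]

# "n_per_clnt": per client, drop a source with fewer records than its threshold.
min_n  <- c(src1 = 2, src2 = 2)  # assumed per-source thresholds
n_rec  <- ave(rep(1L, nrow(raw)), raw$clnt_id, raw$src, FUN = sum)
enough <- raw[n_rec >= min_n[raw$src], ]

# Record counts differ, but both clients remain under every option.
c(all = nrow(all_src), has_valid = nrow(has_valid), n_per_clnt = nrow(enough))
```

In this toy data the number of records changes with the option while the set of clients does not, which mirrors the behaviour described for include_src.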

...

Additional arguments passed to bind_source().

Value

A data.frame or remote table of clients that satisfy the predefined case definition. Columns starting with "raw_in_" are source-specific counts of raw records, and columns starting with "valid_in_" are the number of valid entries (or the number of flags) in each source.
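Because the count columns share fixed prefixes, they can be picked out by name. The sketch below uses a hand-made data.frame standing in for a client-level result; the values, the source labels src1 and src2, and the added columns raw_total and valid_total are assumptions for illustration. A remote table would likely need to be collected into a local data.frame first.

```r
# Hypothetical client-level result with per-source count columns.
res <- data.frame(
  clnt_id       = c(1, 2),
  raw_in_src1   = c(3, 2),
  raw_in_src2   = c(1, 2),
  valid_in_src1 = c(3, 0),
  valid_in_src2 = c(0, 2)
)

# Select the count columns by their prefixes.
raw_cols   <- names(res)[startsWith(names(res), "raw_in_")]
valid_cols <- names(res)[startsWith(names(res), "valid_in_")]

# Total raw and valid records per client across all sources.
res$raw_total   <- rowSums(res[raw_cols])
res$valid_total <- rowSums(res[valid_cols])
res
```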

Examples

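# load the package; dplyr is assumed here to supply %>% and starts_with()
library(healthdb)
library(dplyr)

# toy data
df1 <- make_test_dat()
df2 <- make_test_dat()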

# use build_def to make a toy definition
sud_def <- build_def("SUD", # usually a disease name
  src_lab = c("src1", "src2"), # identify from multiple sources, e.g., hospitalization, ED visits.
  # functions that filter the data with some criteria
  def_fn = define_case,
  fn_args = list(
    vars = starts_with("diagx"),
    match = "start", # "start" will be applied to all sources as length = 1
    vals = list(c("304"), c("305")),
    clnt_id = "clnt_id", # list()/c() could be omitted for single element
    # c() can be used in place of list
    # if this argument only takes one value for each source
    n_per_clnt = c(2, 3)
  )
)

# save the definition for re-use
# saveRDS(sud_def, file = some_path)

# execute definition
sud_by_src <- sud_def %>% execute_def(with_data = list(src1 = df1, src2 = df2))

# pool results from src1 and src2 together at client level
pool_case(sud_by_src, sud_def, output_lvl = "clnt")

[Package healthdb version 0.3.1 Index]