restrict_date {healthdb} | R Documentation |
Remove or flag groups that fail to meet date-based conditions
Description
For each client or group, determine whether they have n records that are at least a given number of days apart AND within a specified time span. When identifying events/diseases from administrative data, definitions often require, e.g., n diagnoses that are at least some days apart within some years. This function is intended for such use and is optimized to avoid looping through all n-size combinations of dates per client.
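To make the condition concrete, the sketch below spells it out for a single client by brute force, i.e., by checking every n-size combination of dates, which is exactly the work the function is said to avoid. The helper `has_qualifying_draw()` is hypothetical and not part of healthdb. It assumes same-day records count once and that both limits are inclusive. It is meant only as a reference for what "n records at least `apart` days apart within `within` days" means, not as the package's implementation.

```r
# Hypothetical helper (not part of healthdb): brute-force check of the
# condition for ONE client's dates. Assumes same-day records count once.
has_qualifying_draw <- function(dates, n, apart = NULL, within = NULL) {
  dates <- sort(unique(as.Date(dates)))
  if (length(dates) < n) return(FALSE)
  if (n == 1) return(TRUE)
  # Each column is one n-size draw; combn() keeps the sorted order
  draws <- combn(as.numeric(dates), n)
  gaps_ok <- if (is.null(apart)) {
    TRUE
  } else {
    apply(draws, 2, function(d) all(diff(d) >= apart)) # adjacent gaps
  }
  span_ok <- if (is.null(within)) {
    TRUE
  } else {
    apply(draws, 2, function(d) d[n] - d[1] <= within) # total span
  }
  any(gaps_ok & span_ok)
}

visits <- as.Date(c("2020-01-01", "2020-01-05", "2020-01-20"))
has_qualifying_draw(visits, n = 2, apart = 7, within = 30) # TRUE
has_qualifying_draw(visits, n = 2, apart = 7, within = 10) # FALSE
```

The number of draws grows combinatorially with the number of records per client, so this form is only practical for small inputs or for cross-checking results.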
Usage
restrict_date(
  data,
  clnt_id,
  date_var,
  n,
  apart = NULL,
  within = NULL,
  uid = NULL,
  mode = c("flag", "filter"),
  flag_at = c("left", "right"),
  dup.rm = TRUE,
  force_collect = FALSE,
  verbose = getOption("healthdb.verbose"),
  check_missing = FALSE,
  ...
)
Arguments
data |
Data frames or remote tables (e.g., from |
clnt_id |
Grouping variable (quoted/unquoted). |
date_var |
Variable name (quoted/unquoted) for the dates to be interpreted. |
n |
An integer for the size of a draw. |
apart |
An integer specifying the minimum gap (in days) between adjacent dates in a draw. |
within |
An integer specifying the maximum time span (in days) of a draw. |
uid |
Variable name for a unique row identifier. It is necessary for SQL to produce consistent results based on sorting. |
mode |
Either:
|
flag_at |
Character; defines whether the flag is placed at the start ("left") or end ("right") of a time period that contains n qualified records. Defaults to "left". Note that this affects the first and last qualified/diagnosed dates of a client; e.g., with "right", the first flag is placed not at the earliest record but at the date on which the client became qualified. For example, if the condition was 2 records within a year, for |
dup.rm |
Logical for whether multiple records on the same date should be counted as one in the calculation. Only applicable when |
force_collect |
A logical for whether to force downloading the remote table if |
verbose |
A logical for whether to explain the query and report how many groups were removed. Default is fetching from options. Use |
check_missing |
A logical for whether to check and remove missing entries in |
... |
Additional arguments passed to |
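The effect of flag_at can be illustrated with a small base R sketch. It follows one plausible reading of the description above and handles only a within condition (no apart, no duplicate handling). The helper `flag_within()` and its output columns are hypothetical and are not part of healthdb.

```r
# Hypothetical helper (not part of healthdb): flag the start ("left") or the
# end ("right") of every run of n consecutive records spanning <= within days.
flag_within <- function(dates, n, within, flag_at = c("left", "right")) {
  flag_at <- match.arg(flag_at)
  dates <- sort(as.Date(dates))
  m <- length(dates)
  flag <- logical(m)
  if (m < n) return(data.frame(date = dates, flag = flag))
  starts <- seq_len(m - n + 1) # first record of each candidate window
  ends <- starts + n - 1       # last record of each candidate window
  ok <- as.numeric(dates[ends] - dates[starts]) <= within
  idx <- if (flag_at == "left") starts[ok] else ends[ok]
  flag[idx] <- TRUE
  data.frame(date = dates, flag = flag)
}

# Condition: 2 records within a year
visits <- as.Date(c("2020-01-01", "2020-06-01", "2022-01-01"))
flag_within(visits, n = 2, within = 365, flag_at = "left")$flag  # TRUE FALSE FALSE
flag_within(visits, n = 2, within = 365, flag_at = "right")$flag # FALSE TRUE FALSE
```

Under this reading, "left" marks 2020-01-01 (the start of the qualifying period), while "right" marks 2020-06-01, the date on which the client first met the condition.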
Value
A subset of the input data that satisfies the date requirement, or the raw input data with a new flag column.
See Also
Examples
sample_size <- 30
df <- data.frame(
  clnt_id = sample(1:sample_size, sample_size, replace = TRUE),
  service_dt = sample(seq(as.Date("2020-01-01"), as.Date("2020-01-31"), by = 1),
    size = sample_size, replace = TRUE
  ),
  diagx = sample(letters, size = sample_size, replace = TRUE),
  diagx_1 = sample(c(NA, letters), size = sample_size, replace = TRUE),
  diagx_2 = sample(c(NA, letters), size = sample_size, replace = TRUE)
)
# Keep clients with 2 records that were at least 1 week apart within 1 month
restrict_date(df, clnt_id, service_dt, n = 2, apart = 7, within = 30)