restrict_n {healthdb} | R Documentation |
Remove or flag groups with n less than some number
Description
Remove or flag groups or clients that have fewer than some number of rows, or fewer than some number of distinct values in a variable. For example, it can be used to remove clients that had fewer than n visits to some service on different dates in some administrative records. It offers filtering with dplyr::n_distinct() functionality for database input.
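For a local data frame, the filtering described above can be approximated with plain dplyr. This is a minimal sketch of the idea rather than the package's implementation: `keep_groups()` is a hypothetical helper that is not part of healthdb, and it assumes a group qualifies when its distinct count is at least the threshold.

```r
library(dplyr)

# Hypothetical helper (not part of healthdb): keep only the groups whose
# number of distinct `count_by` values is at least `n`.
keep_groups <- function(data, group, count_by, n) {
  data %>%
    group_by({{ group }}) %>%
    filter(n_distinct({{ count_by }}) >= n) %>%
    ungroup()
}

# Keep cyl groups of mtcars that have at least 3 distinct gear values
kept <- keep_groups(mtcars, cyl, gear, 3)
sort(unique(kept$cyl))
```

The sketch runs on in-memory data only. Per the description, restrict_n() is meant to provide this kind of filtering for database input as well.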
Usage
restrict_n(
data,
clnt_id,
n_per_clnt,
count_by = NULL,
mode = c("flag", "filter"),
verbose = getOption("healthdb.verbose")
)
Arguments
data |
Data.frames or remote tables (e.g., from |
clnt_id |
Grouping variable (quoted/unquoted). |
n_per_clnt |
A single number specifying the minimum group size. |
count_by |
Another variable dictating the counting unit of |
mode |
Either "flag" - add a new column 'flag_restrict_n' indicating if the client met the condition (all rows from a qualified client would have flag = 1), or "filter" - remove clients that did not meet the condition from the data. Default is "flag". |
verbose |
A logical for whether to explain the query and report how many groups were removed. Default is fetching from options. Use |
Value
A subset of the input data that satisfies the group size requirement, or the raw input data with a new flag column.
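The flagged form of the return value can be pictured with a short dplyr sketch on a local data frame. The column name `flag_restrict_n` and the value 1 for qualifying groups come from the `mode` argument description. Using 0 for groups that do not qualify is an assumption made here for illustration.

```r
library(dplyr)

# Add a flag column instead of dropping rows: 1 for cyl groups with at
# least 8 rows, 0 otherwise (the 0 value is an assumption of this sketch).
flagged <- mtcars %>%
  group_by(cyl) %>%
  mutate(flag_restrict_n = as.integer(n() >= 8)) %>%
  ungroup()

# One row per cyl group with its flag; no input rows were removed
distinct(flagged, cyl, flag_restrict_n)
nrow(flagged) == nrow(mtcars)
```

Flagging keeps every input row, so the qualifying subset can still be obtained later by filtering on the flag column.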
See Also
dplyr::n(), dplyr::n_distinct()
Examples
# flag cyl groups with at least 8 cars (flag_restrict_n = 1 for qualifying groups)
restrict_n(mtcars, clnt_id = cyl, n_per_clnt = 8, mode = "flag") %>%
head()
# remove cyl groups with fewer than 3 types of gear boxes
restrict_n(mtcars, clnt_id = cyl, n_per_clnt = 3, count_by = gear, mode = "filter")
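To see which groups the two calls above act on, the underlying counts can be checked with base R. This is only a cross-check on the built-in mtcars data and does not call healthdb. The variable names are illustrative.

```r
# Rows per cyl group (compared against n_per_clnt = 8 in the first example)
rows_per_cyl <- table(mtcars$cyl)

# Distinct gear values per cyl group (compared against n_per_clnt = 3
# with count_by = gear in the second example)
gears_per_cyl <- tapply(mtcars$gear, mtcars$cyl, function(x) length(unique(x)))

names(rows_per_cyl)[rows_per_cyl < 8]    # "6": the only group with fewer than 8 cars
names(gears_per_cyl)[gears_per_cyl < 3]  # "8": the only group with fewer than 3 gear types
```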