categorize {coder} | R Documentation |
Categorize cases based on external data and classification scheme
Description
This is the main function of the package, which relies of a triad of objects:
(1) data
with unit id:s and possible dates of interest;
(2) codedata
for corresponding
units and with optional dates of interest and;
(3) a classification scheme (classcodes
object; cc
) with regular
expressions to identify and categorize relevant codes.
The function combines the three underlying steps performed by
codify()
, classify()
and index()
.
Relevant arguments are passed to those functions by
codify_args
and cc_args
.
Usage
categorize(x, ...)
## S3 method for class 'data.frame'
categorize(x, ...)
## S3 method for class 'tbl_df'
categorize(x, ...)
## S3 method for class 'data.table'
categorize(x, ..., codedata, id, code, codify_args = list())
## S3 method for class 'codified'
categorize(
x,
...,
cc,
index = NULL,
cc_args = list(),
check.names = TRUE,
.data_cols = NULL
)
Arguments
x |
data set with mandatory character id column
(identified by argument |
... |
arguments passed between methods |
codedata |
external code data with mandatory character id column
(identified by |
id |
name of unique character id column found in
both |
code |
name of code column in |
codify_args |
Lists of named arguments passed to |
cc |
|
index |
Argument passed to |
cc_args |
List with named arguments passed to
|
check.names |
Column names are based on |
.data_cols |
used internally |
Value
Object of the same class as x
with additional logical columns
indicating membership of groups identified by the
classcodes
object (the cc
argument).
Numeric indices are also included if requested by the index
argument.
See Also
Other verbs:
classify()
,
codify()
,
index_fun
Examples
# For this example, 1 core would suffice:
old_threads <- data.table::getDTthreads()
data.table::setDTthreads(1)
# For some patient data (ex_people) and related hospital visit code data
# with ICD 10-codes (ex_icd10), add the Elixhauser comorbidity
# conditions based on all registered ICD10-codes
categorize(
x = ex_people,
codedata = ex_icd10,
cc = "elixhauser",
id = "name",
code = "icd10"
)
# Add Charlson categories and two versions of a calculated index
# ("quan_original" and "quan_updated").
categorize(
x = ex_people,
codedata = ex_icd10,
cc = "charlson",
id = "name",
code = "icd10",
index = c("quan_original", "quan_updated")
)
# Only include recent hospital visits within 30 days before surgery,
categorize(
x = ex_people,
codedata = ex_icd10,
cc = "charlson",
id = "name",
code = "icd10",
index = c("quan_original", "quan_updated"),
codify_args = list(
date = "surgery",
days = c(-30, -1),
code_date = "admission"
)
)
# Multiple versions -------------------------------------------------------
# We can compare categorization by according to Quan et al. (2005); "icd10",
# and Armitage et al. (2010); "icd10_rcs" (see `?charlson`)
# Note the use of `tech_names = TRUE` to distinguish the column names from the
# two versions.
# We first specify some common settings ...
ind <- c("quan_original", "quan_updated")
cd <- list(date = "surgery", days = c(-30, -1), code_date = "admission")
# ... we then categorize once with "icd10" as the default regular expression ...
categorize(
x = ex_people,
codedata = ex_icd10,
cc = "charlson",
id = "name",
code = "icd10",
index = ind,
codify_args = cd,
cc_args = list(tech_names = TRUE)
) %>%
# .. and once more with `regex = "icd10_rcs"`
categorize(
codedata = ex_icd10,
cc = "charlson",
id = "name",
code = "icd10",
index = ind,
codify_args = cd,
cc_args = list(regex = "icd10_rcs", tech_names = TRUE)
)
# column names ------------------------------------------------------------
# Default column names are based on row names from corresponding classcodes
# object but are modified to be syntactically correct.
default <-
categorize(ex_people, codedata = ex_icd10, cc = "elixhauser",
id = "name", code = "icd10")
# Set `check.names = FALSE` to retain original names:
original <-
categorize(
ex_people, codedata = ex_icd10, cc = "elixhauser",
id = "name", code = "icd10",
check.names = FALSE
)
# Or use `tech_names = TRUE` for informative but long names (use case above)
tech <-
categorize(ex_people, codedata = ex_icd10, cc = "elixhauser",
id = "name", code = "icd10",
cc_args = list(tech_names = TRUE)
)
# Compare
tibble::tibble(names(default), names(original), names(tech))
# Go back to original number of threads
data.table::setDTthreads(old_threads)