load_concepts {ricu} | R Documentation |
Load concept data
Description
Concept objects are used in ricu as a way to specify how a clinical concept, such as heart rate, can be loaded from a data source. Building on this abstraction, load_concepts() powers concise loading of data, with data source specific preprocessing hidden away from the user, thereby providing a data source agnostic interface to data loading. At the default value of the argument merge_data, a tabular data structure (either a ts_tbl or an id_tbl, depending on what kind of concepts are requested), inheriting from data.table, is returned, representing the data in wide format (i.e. returning concepts as columns).
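As a rough illustration of what "wide format" means, two long-format concept tables keyed by ID and time can be joined so that each concept becomes one column. This is plain base R, not ricu, and the column names icustay_id, charttime, hr and map are assumptions made for the sketch:

```r
# Two hypothetical long-format concept tables, one row per (id, time) pair
hr <- data.frame(icustay_id = c(1L, 1L, 2L), charttime = c(0, 1, 0),
                 hr = c(80, 85, 72))
map <- data.frame(icustay_id = c(1L, 2L), charttime = c(0, 1),
                  map = c(65, 70))

# A full outer join on ID and time yields one column per concept, with NA
# wherever a concept was not observed at a given time point
wide <- merge(hr, map, by = c("icustay_id", "charttime"), all = TRUE)
print(wide)
```

The result corresponds in spirit to what load_concepts() returns when merge_data is TRUE; with merge_data set to FALSE, one table per concept is returned in a list instead.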
Usage
load_concepts(x, ...)
## S3 method for class 'character'
load_concepts(
x,
src = NULL,
concepts = NULL,
...,
dict_name = "concept-dict",
dict_dirs = NULL
)
## S3 method for class 'integer'
load_concepts(
x,
src = NULL,
concepts = NULL,
...,
dict_name = "concept-dict",
dict_dirs = NULL
)
## S3 method for class 'numeric'
load_concepts(x, ...)
## S3 method for class 'concept'
load_concepts(
x,
src = NULL,
aggregate = NULL,
merge_data = TRUE,
verbose = TRUE,
...
)
## S3 method for class 'cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)
## S3 method for class 'num_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)
## S3 method for class 'unt_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)
## S3 method for class 'fct_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)
## S3 method for class 'lgl_cncpt'
load_concepts(x, aggregate = NULL, ..., progress = NULL)
## S3 method for class 'rec_cncpt'
load_concepts(
x,
aggregate = NULL,
patient_ids = NULL,
id_type = "icustay",
interval = hours(1L),
...,
progress = NULL
)
## S3 method for class 'item'
load_concepts(
x,
patient_ids = NULL,
id_type = "icustay",
interval = hours(1L),
progress = NULL,
...
)
## S3 method for class 'itm'
load_concepts(
x,
patient_ids = NULL,
id_type = "icustay",
interval = hours(1L),
...
)
Arguments
x | Object specifying the data to be loaded |
... | Passed to downstream methods |
src | A character vector of data source names (such as "mimic_demo"), used to subset the concepts |
concepts | The concepts to be used, or NULL, in which case concepts are taken from the dictionary specified by dict_name and dict_dirs |
dict_name, dict_dirs | In case no concepts are passed as concept objects, these specify the name and location(s) of the concept dictionary to be loaded |
aggregate | Controls how data within concepts is aggregated |
merge_data | Logical flag, specifying whether to merge concepts into wide format or return a list, each entry corresponding to a concept |
verbose | Logical flag for muting informational output |
progress | Either NULL or a progress bar as created by progress::progress_bar |
patient_ids | Optional vector of patient ids to subset the fetched data with |
id_type | String specifying the patient id type to return |
interval | The time interval used to discretize time stamps with, specified as a scalar-valued base::difftime() object |
Details
In order to allow for a large degree of flexibility (and extensibility), which is much needed owing to the considerable heterogeneity presented by different data sources, several nested S3 classes are involved in representing a concept, and load_concepts() follows this hierarchy of classes recursively when resolving a concept. An outline of this hierarchy is as follows:
- concept: contains many cncpt objects (of potentially differing sub-types), each comprising some meta-data and an item object
- item: contains many itm objects (of potentially differing sub-types), each encoding how to retrieve a data item.
The design choice of wrapping a vector of cncpt objects with a container class concept is motivated by the requirement of having several different sub-types of cncpt objects (all inheriting from the parent type cncpt), while retaining control over how this vector of objects, homogeneous with respect to parent type but heterogeneous with respect to sub-type, behaves in terms of S3 generic functions.
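The container pattern can be sketched in base R as follows. The class names toy_concept, toy_cncpt, toy_num_cncpt and toy_fct_cncpt as well as the generic describe() are hypothetical stand-ins, not part of ricu:

```r
# Hypothetical constructors: cncpt-like objects of varying sub-types, all
# inheriting from a common parent class, wrapped by a container class
new_cncpt <- function(name, subclass) {
  structure(list(name = name), class = c(subclass, "toy_cncpt"))
}
new_concept <- function(...) structure(list(...), class = "toy_concept")

# A toy generic with a parent-level method and one sub-type specific method
describe <- function(x, ...) UseMethod("describe")
describe.toy_cncpt <- function(x, ...) paste(x$name, "(parent method)")
describe.toy_num_cncpt <- function(x, ...) paste(x$name, "(numeric method)")

# The container method decides how the heterogeneous vector behaves as a
# whole: here it dispatches on every element separately
describe.toy_concept <- function(x, ...) {
  vapply(unclass(x), describe, character(1))
}

cpt <- new_concept(new_cncpt("hr", "toy_num_cncpt"),
                   new_cncpt("sex", "toy_fct_cncpt"))
describe(cpt)
```

The element of sub-type toy_fct_cncpt has no method of its own and is handled by the parent-level method, while the container method keeps control over how the vector as a whole is processed.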
Value
An id_tbl/ts_tbl or a list thereof, depending on loaded concepts and the value passed as merge_data.
Concept
Top-level entry points are either a character vector of concept names or an integer vector of concept IDs (matched against omopid fields), which are used to subset a concept object or an entire concept dictionary, or a concept object itself. When passing a character/integer vector as first argument, the most important further arguments at that level control from where the dictionary is taken (dict_name or dict_dirs). At concept level, the most important additional arguments control the result structure: data merging can be disabled using merge_data and data aggregation is governed by the aggregate argument.
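Dispatching on the type of the first argument can be sketched as below. The toy dictionary, its omopid values and the generic select_concepts() are made up for illustration; converting numeric input to integer is an assumption of this sketch, not a description of load_concepts.numeric():

```r
# Hypothetical toy dictionary; the omopid values are invented
dict <- list(
  hr  = list(omopid = 101L),
  glu = list(omopid = 202L),
  sex = list(omopid = 303L)
)

select_concepts <- function(x, dict) UseMethod("select_concepts")

# Character input selects dictionary entries by concept name
select_concepts.character <- function(x, dict) dict[x]

# Integer input is matched against the omopid fields
select_concepts.integer <- function(x, dict) {
  ids <- vapply(dict, function(cpt) cpt$omopid, integer(1))
  dict[ids %in% x]
}

# Numeric (double) input is coerced to integer and dispatched again
select_concepts.numeric <- function(x, dict) {
  select_concepts(as.integer(x), dict)
}

names(select_concepts(c("glu", "hr"), dict))
names(select_concepts(303L, dict))
names(select_concepts(c(101, 202), dict))
```

Selection by name preserves the requested order, whereas matching on IDs returns entries in dictionary order.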
Data aggregation is important for merging several concepts into a wide-format table, as this requires data to be unique per observation (i.e. by either id or combination of id and index). Several value types are acceptable as aggregate argument, the most important being FALSE, which disables aggregation, NULL, which auto-determines a suitable aggregation function, or a string, which is ultimately passed to dt_gforce() where it identifies a function such as sum(), mean(), min() or max(). More information on aggregation is available at aggregate().
If the object passed as aggregate is scalar, it is applied to all requested concepts in the same way. In order to customize aggregation per concept, a named object (with names corresponding to concepts) of the same length as the number of requested concepts may be passed.
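The aggregation rules (FALSE, a function name, a scalar applied to all concepts or a named per-concept specification) can be mimicked in base R. The helpers expand_spec() and agg_col() are hypothetical, use stats::aggregate() rather than ricu's dt_gforce(), and do not cover the NULL (auto-determined) case:

```r
# Hypothetical long-format data with duplicate observations per (id, time)
dat <- data.frame(
  id   = c(1L, 1L, 1L, 2L),
  time = c(0, 0, 1, 0),
  glu  = c(100, 120, 90, 150),
  dose = c(1, 2, 3, 4)
)

# Recycle an unnamed scalar spec to all concepts; keep a named spec as is
expand_spec <- function(spec, concepts) {
  if (is.null(names(spec))) {
    spec <- stats::setNames(rep(spec, length(concepts)), concepts)
  }
  as.list(spec)[concepts]
}

# Make (id, time) unique for one concept column; FALSE skips aggregation
agg_col <- function(dat, col, fun) {
  if (isFALSE(fun)) return(dat[c("id", "time", col)])
  stats::aggregate(dat[col], by = dat[c("id", "time")], FUN = match.fun(fun))
}

concepts <- c("glu", "dose")
specs <- expand_spec(c(glu = "mean", dose = "sum"), concepts)
res <- Map(function(col, fun) agg_col(dat, col, fun), concepts, specs)
res$glu
```

Passing a scalar such as "max" to expand_spec() applies the same function to every concept, mirroring the scalar case described above.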
Under the hood, a concept object comprises several cncpt objects with varying sub-types (for example num_cncpt, representing continuous numeric data, or fct_cncpt, representing categorical data). This implementation detail is of no further importance for understanding concept loading; for more information, please refer to the concept documentation. The only argument that is introduced at cncpt level is progress, which controls progress reporting. If called directly, the default value of NULL yields messages sent to the terminal. Internally, if called from load_concepts() at concept level (with verbose set to TRUE), a progress::progress_bar is set up in a way that allows nested messages to be captured without interrupting progress reporting (see msg_progress()).
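ricu's msg_progress() is not reproduced here; the following base-R sketch only illustrates the general idea of capturing messages signalled by nested calls (for example while a progress bar is active) so that they can be reported afterwards. It relies on withCallingHandlers() and the standard "muffleMessage" restart; inner_step() is hypothetical:

```r
# Hypothetical nested loader that emits informational messages
inner_step <- function(i) {
  message("loading item ", i)
  i * 10
}

# Collect messages instead of printing them immediately
captured <- character()
res <- withCallingHandlers(
  vapply(1:3, inner_step, numeric(1)),
  message = function(m) {
    captured <<- c(captured, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)

# Report the collected messages once the loop has finished
cat(captured, sep = "")
```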
Item
A single cncpt object contains an item object, which in turn is composed of several itm objects with varying sub-types. The relationship of item to itm is that of concept to cncpt, and the rationale for this implementation choice is the same as before: a container class is used for representing a vector of objects of varying sub-types, all inheriting from a common super-type. For more information on the item class, please refer to the relevant documentation. Arguments introduced at item level include patient_ids, id_type and interval. Acceptable values for interval are scalar-valued base::difftime() objects (see also helper functions such as hours()), and this argument essentially controls the time-resolution of the returned time-series. Of course, the limiting factor is the raw time resolution, which is on the order of hours for data sets like MIMIC-III or eICU but can be much higher for a data set like HiRID.
The argument id_type is used to specify what kind of ID system should be used to identify different time series in the returned data. A data set like MIMIC-III, for example, allows data to be resolved to 3 nested ID systems:
- patient (subject_id): identifies a person
- hadm (hadm_id): identifies a hospital admission (several of which are possible for a given person)
- icustay (icustay_id): identifies an admission to an ICU; again, a single hadm may comprise several icustays.
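The two item-level arguments interval and id_type can be illustrated with a small base-R sketch. The data, the ID mapping table and the helper to_id() are hypothetical; flooring time stamps to the start of their interval is an assumption of this sketch, and shifting time origins when switching ID systems is ignored:

```r
# Hypothetical ID mapping mirroring the MIMIC-III nesting
# patient (subject_id) > hadm (hadm_id) > icustay (icustay_id)
id_map <- data.frame(
  subject_id = c(10L, 10L, 10L, 20L),
  hadm_id    = c(100L, 100L, 101L, 200L),
  icustay_id = c(1000L, 1001L, 1002L, 2000L)
)

# Raw data keyed by icustay_id, with times in minutes since admission
raw <- data.frame(icustay_id = c(1000L, 1000L, 1001L, 2000L),
                  time = as.difftime(c(5, 61, 150, 175), units = "mins"),
                  hr = c(80, 85, 90, 70))

# Discretize time stamps to the requested interval (here 1 hour, analogous
# to hours(1L)), rounding each time down to the start of its interval
interval <- as.difftime(1, units = "hours")
step <- as.numeric(interval, units = "hours")
hrs <- as.numeric(raw$time, units = "hours")
raw$time <- as.difftime(floor(hrs / step) * step, units = "hours")

# Re-key the data to a coarser ID system by joining on the mapping table
to_id <- function(dat, from, to, map) {
  out <- merge(dat, unique(map[c(from, to)]), by = from)
  out[c(to, setdiff(names(dat), from))]
}
by_hadm <- to_id(raw, "icustay_id", "hadm_id", id_map)
by_hadm
```

Because icustays 1000 and 1001 belong to the same hospital admission, their rows end up under a single hadm_id, which reflects the one-to-many nesting of the ID systems.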
Acceptable argument values are strings that match ID systems as specified by the data source configuration. Finally, patient_ids is used to define a patient cohort for which data can be requested. Values may either be a vector of IDs (which are assumed to be of the same type as specified by the id_type argument) or a tabular object inheriting from data.frame, which must contain a column named after the data set-specific ID system identifier (for MIMIC-III and an id_type argument of hadm, for example, that would be hadm_id).
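Accepting patient_ids either as a plain vector or as a data.frame with an ID column can be sketched as follows; subset_ids() is a hypothetical helper, not part of ricu, and the example data is made up:

```r
# Hypothetical helper: subset `dat` on patient IDs given either as a plain
# vector (assumed to be of the ID type in use) or as a data.frame carrying
# a column named after the ID system, such as "hadm_id"
subset_ids <- function(dat, patient_ids, id_col) {
  if (is.data.frame(patient_ids)) {
    stopifnot(id_col %in% names(patient_ids))
    patient_ids <- patient_ids[[id_col]]
  }
  dat[dat[[id_col]] %in% patient_ids, , drop = FALSE]
}

dat <- data.frame(hadm_id = c(100L, 101L, 200L), glu = c(110, 95, 140))

# Cohort given as a vector of hadm IDs
subset_ids(dat, c(100L, 200L), "hadm_id")

# Cohort given as a table; extra columns such as age are ignored
subset_ids(dat, data.frame(hadm_id = 101L, age = 70), "hadm_id")
```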
Extensions
The presented hierarchy of S3 classes is designed with extensibility in mind: while the current range of functionality covers settings encountered when dealing with the included concepts and datasets, further data sets and/or clinical concepts might necessitate different behavior for data loading. For this reason, various parts in the cascade of calls to load_concepts() can be adapted for new requirements by defining new sub-classes of cncpt or itm and providing methods for the generic function load_concepts() specific to these new classes. At cncpt level, method dispatch defaults to load_concepts.cncpt() if no method specific to the new class is provided, while at itm level, no default function is available.
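This dispatch behaviour can be reproduced with a toy generic. load_toy() and the classes toy_cncpt, my_cncpt, other_cncpt, toy_itm and my_itm are hypothetical stand-ins for load_concepts() and ricu's classes:

```r
# Toy generic standing in for load_concepts()
load_toy <- function(x, ...) UseMethod("load_toy")

# Parent-level method: used for any cncpt sub-class without its own method
load_toy.toy_cncpt <- function(x, ...) paste("default cncpt loader for", x$name)

# A new sub-class with a specialised method ...
load_toy.my_cncpt <- function(x, ...) paste("custom loader for", x$name)

# ... and one without, which falls back to load_toy.toy_cncpt()
a <- structure(list(name = "a"), class = c("my_cncpt", "toy_cncpt"))
b <- structure(list(name = "b"), class = c("other_cncpt", "toy_cncpt"))
load_toy(a)
load_toy(b)

# At item level there is no parent method, so a new itm-like sub-class
# without its own method fails to dispatch
i <- structure(list(), class = c("my_itm", "toy_itm"))
err <- tryCatch(load_toy(i), error = function(e) conditionMessage(e))
err
```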
Roughly speaking, the semantics for the two functions are as follows:
- cncpt: Called with arguments x (the current cncpt object), aggregate (controlling how aggregation per time-point and ID is handled), ... (further arguments passed to downstream methods) and progress (controlling progress reporting), this function should be able to load and aggregate data for the given concept. Usually this involves extracting the item object and calling load_concepts() again, dispatching on the item class with arguments x (the given item), arguments passed as ..., as well as progress.
- itm: Called with arguments x (the current object inheriting from itm), patient_ids (NULL or a patient ID selection), id_type (a string specifying what ID system to retrieve), and interval (the time series interval), this function actually carries out the loading of individual data items, using the specified ID system, rounding times to the correct interval and subsetting on patient IDs. As return value, an object of the class specified by the target entry is expected, and all data_vars() should be named consistently, as data corresponding to multiple itm objects is concatenated in row-wise fashion as in base::rbind().
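The requirement of consistently named data columns can be illustrated with two hypothetical itm-level loaders. load_itm_a() and load_itm_b() are made up, and plain data.frames stand in for ricu's id_tbl/ts_tbl objects:

```r
# Two hypothetical itm-level loaders for the same concept, e.g. reading the
# same quantity from two different tables; both use identical column names
load_itm_a <- function() {
  data.frame(icustay_id = c(1L, 2L), time = c(0, 1), glu = c(110, 95))
}
load_itm_b <- function() {
  data.frame(icustay_id = 3L, time = 2, glu = 140)
}

# Row-wise concatenation works because the columns line up
res <- do.call(rbind, list(load_itm_a(), load_itm_b()))
res

# Inconsistent naming of the data column breaks the concatenation
bad <- tryCatch(
  rbind(load_itm_a(), data.frame(icustay_id = 4L, time = 0, value = 1)),
  error = function(e) conditionMessage(e)
)
bad
```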
Examples
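if (require(mimic.demo)) {
  # load the "glu" concept from the default dictionary for the MIMIC-III demo
  dat <- load_concepts("glu", "mimic_demo")

  # define a concept ad hoc, based on two labevents item IDs
  gluc <- concept("gluc",
    item("mimic_demo", "labevents", "itemid", list(c(50809L, 50931L)))
  )

  # compare data loaded via the ad hoc concept with the dictionary result
  identical(load_concepts(gluc), dat)

  # time-varying concepts are returned as a ts_tbl ...
  class(dat)
  # ... whereas static concepts such as sex and age yield an id_tbl
  class(load_concepts(c("sex", "age"), "mimic_demo"))
}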