derive_vars_joined {admiral} | R Documentation |
Add Variables from an Additional Dataset Based on Conditions from Both Datasets
Description
The function adds variables from an additional dataset to the input dataset. The selection of the observations from the additional dataset can depend on variables from both datasets. For example, add the lowest value (nadir) before the current observation.
Usage
derive_vars_joined(
  dataset,
  dataset_add,
  by_vars = NULL,
  order = NULL,
  new_vars = NULL,
  tmp_obs_nr_var = NULL,
  join_vars = NULL,
  join_type,
  filter_add = NULL,
  first_cond_lower = NULL,
  first_cond_upper = NULL,
  filter_join = NULL,
  mode = NULL,
  exist_flag = NULL,
  true_value = "Y",
  false_value = NA_character_,
  missing_values = NULL,
  check_type = "warning"
)
Arguments
dataset |
Input dataset
The variables specified by the by_vars argument are expected to be in the dataset. |
dataset_add |
Additional dataset
The variables specified by the by_vars, new_vars, join_vars, and order arguments are expected to be in the dataset. |
by_vars |
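Grouping variables
The two datasets are joined by the specified variables. Variables can be renamed by naming the element, i.e., by_vars = exprs(<name in input dataset> = <name in additional dataset>).
Permitted Values: list of variables created by exprs(), e.g., exprs(USUBJID) |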
order |
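Sort order
If the argument is set to a non-null value, for each observation of the input dataset the first or last observation from the joined dataset is selected with respect to the specified order. The specified variables are expected in the additional dataset (dataset_add).
If an expression is named, e.g., exprs(EXSDT = convert_dtc_to_dt(EXSDTC)), a corresponding variable (EXSDT) is added to the additional dataset and can be used in new_vars, filter_add, and filter_join.
For handling of NAs in sorting variables, see the "Sort Order" section of the admiral documentation.
Permitted Values: list of expressions created by exprs(), e.g., exprs(AVAL, desc(ADY)), or NULL |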
new_vars |
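Variables to add
The specified variables from the additional dataset are added to the output dataset. Variables can be renamed by naming the element, i.e., new_vars = exprs(<new name> = <old name>).
For example, new_vars = exprs(AVISIT, AWLO) adds the variables AVISIT and AWLO from dataset_add to the input dataset. And new_vars = exprs(NADIR = AVAL) takes AVAL from dataset_add and adds it to the input dataset, renamed to NADIR.
Values of the added variables can be modified by specifying an expression. For example, new_vars = exprs(LDRELD = compute_duration(start_date = EXSDT, end_date = ASTDT)) adds the variable LDRELD and sets it to the computed duration.
If the argument is not specified or set to NULL, all variables from the additional dataset (dataset_add) are added.
Permitted Values: list of variables or named expressions created by exprs() |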
tmp_obs_nr_var |
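Temporary observation number
The specified variable is added to the input dataset (dataset) and the additional dataset (dataset_add). It is set to the observation number with respect to order and can be used in the conditions (filter_join, first_cond_lower, first_cond_upper).
The variable is not included in the output dataset. To include it, specify it for new_vars. |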
join_vars |
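Variables to use from additional dataset
Any extra variables required from the additional dataset for filter_join should be specified for this argument. If a specified variable exists in both the input dataset and the additional dataset, the suffix ".join" is added to the variable from the additional dataset.
If an expression is named, e.g., exprs(EXSDT = convert_dtc_to_dt(EXSDTC)), a corresponding variable is added to the additional dataset and can be used in the filter conditions (filter_add, filter_join) and for new_vars.
The variables are not included in the output dataset.
Permitted Values: list of variables or named expressions created by exprs() |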
join_type |
Observations to keep after joining
The argument determines which of the joined observations are kept with respect to the original observation. For example, if join_type = "after" is specified, all observations after the original observation are kept.
For example, for confirmed response or BOR in the oncology setting, or confirmed deterioration in questionnaires, the confirmatory assessment must be after the assessment. Thus join_type = "after" could be used.
In other cases, confirmatory observations may be allowed to occur prior to the observation, for example, to identify AEs occurring on or after seven days before a COVID AE. Thus join_type = "all" could be used.
Permitted Values: "before", "after", "all" |
filter_add |
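Filter for additional dataset (dataset_add)
Only observations from dataset_add fulfilling the specified condition are joined to the input dataset. If the argument is not specified, all observations are joined.
Variables created by the order or new_vars arguments can be used in the condition.
The condition can include summary functions like all() or any(). The additional dataset is grouped by the by variables (by_vars).
Permitted Values: a condition |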
first_cond_lower |
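Condition for selecting range of data (before)
If this argument is specified, the other observations are restricted to the range from the first observation before the current observation where the specified condition is fulfilled up to the current observation. If the condition is not fulfilled for any of the other observations, no observations are considered.
This argument should be specified if filter_join contains summary functions which should not apply to all observations but only from a certain observation before the current observation up to the current observation. For an example, see the last example in the "Examples" section. |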
first_cond_upper |
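Condition for selecting range of data (after)
If this argument is specified, the other observations are restricted up to the first observation where the specified condition is fulfilled. If the condition is not fulfilled for any of the other observations, no observations are considered.
This argument should be specified if filter_join contains summary functions which should not apply to all observations but only up to the confirmation assessment. For an example, see the last example in the "Examples" section. |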
filter_join |
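Filter for the joined dataset
The specified condition is applied to the joined dataset. Therefore variables from both datasets (dataset and dataset_add) can be used.
Variables created by the order or new_vars arguments can be used in the condition.
The condition can include summary functions like all() or any(). The joined dataset is grouped by the original observations.
Permitted Values: a condition |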
mode |
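Selection mode
Determines if the first or last observation is selected. If the order argument is specified, mode must be non-null. If the order argument is not specified, the mode argument is ignored.
Permitted Values: "first", "last", NULL |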
exist_flag |
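Exist flag
If the argument is specified (e.g., exist_flag = FLAG), the specified variable (e.g., FLAG) is added to the input dataset. It is set to true_value for observations of the input dataset for which an observation from dataset_add was selected, and to false_value otherwise.
Permitted Values: Variable name |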
true_value |
True value
The value for the specified variable exist_flag when a matching observation from the additional dataset was selected.
Permitted Values: An atomic scalar |
false_value |
False value
The value for the specified variable exist_flag when no matching observation from the additional dataset was selected.
Permitted Values: An atomic scalar |
missing_values |
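Values for non-matching observations
For observations of the input dataset (dataset) which do not have a matching observation in the additional dataset (dataset_add), the values of the specified variables are set to the specified value. Only variables specified for new_vars can be specified for missing_values.
Permitted Values: named list of expressions, e.g., exprs(BASEC = "MISSING", BASE = -1) |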
check_type |
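Check uniqueness?
If "warning" or "error" is specified, the specified message is issued if the observations of the (restricted) joined dataset are not unique with respect to the by variables and the order.
This argument is ignored if order is not specified. In this case an error is issued independent of check_type if the restricted joined dataset contains more than one observation for any of the observations of the input dataset.
Permitted Values: "none", "warning", "error" |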
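The effect of join_type can be pictured as a self-join within each by group followed by a filter on observation numbers. The following is a minimal sketch in plain dplyr (1.1.0 or later, for the relationship argument), not admiral's implementation. The dataset visits and the helper column obs_nr are hypothetical names introduced only for this illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

# Hypothetical input: one subject with three ordered observations
visits <- tribble(
  ~subj, ~day,
  "1",   1,
  "1",   2,
  "1",   3
) %>%
  group_by(subj) %>%
  arrange(day, .by_group = TRUE) %>%
  mutate(obs_nr = row_number()) %>% # observation number within the by group
  ungroup()

# Join every observation with every observation of the same subject;
# variables from the joined side get the suffix ".join"
pairs <- inner_join(
  visits, visits,
  by = "subj",
  suffix = c("", ".join"),
  relationship = "many-to-many"
)

all_pairs <- pairs                              # comparable to join_type = "all"
before <- filter(pairs, obs_nr.join < obs_nr)   # comparable to join_type = "before"
after <- filter(pairs, obs_nr.join > obs_nr)    # comparable to join_type = "after"
```

With three observations, all_pairs has nine rows, while before and after each keep three. This is why "before" and "after" are the natural choices when the condition should only look in one direction.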
Details
1. The variables specified by order are added to the additional dataset (dataset_add).
2. The variables specified by join_vars are added to the additional dataset (dataset_add).
3. The records from the additional dataset (dataset_add) are restricted to those matching the filter_add condition.
4. The input dataset and the (restricted) additional dataset are left joined by the grouping variables (by_vars). If no grouping variables are specified, a full join is performed.
5. If first_cond_lower is specified, for each observation of the input dataset the joined dataset is restricted to observations from the first observation where first_cond_lower is fulfilled (the observation fulfilling the condition is included) up to the observation of the input dataset. If for an observation of the input dataset the condition is not fulfilled, the observation is removed.
6. If first_cond_upper is specified, for each observation of the input dataset the joined dataset is restricted to observations up to the first observation where first_cond_upper is fulfilled (the observation fulfilling the condition is included). If for an observation of the input dataset the condition is not fulfilled, the observation is removed. For an example see the last example in the "Examples" section.
7. The joined dataset is restricted by the filter_join condition.
8. If order is specified, for each observation of the input dataset the first or last observation (depending on mode) is selected.
9. The variables specified for new_vars are created (if requested) and merged to the input dataset. I.e., the output dataset contains all observations from the input dataset. For observations without a matching observation in the joined dataset the new variables are set as specified by missing_values (or to NA for variables not in missing_values). Observations in the additional dataset which have no matching observation in the input dataset are ignored.
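The sequence described above (restrict the additional dataset, join, filter the joined data, select by order and mode, merge back) can be sketched by hand for the nadir case. This is a rough dplyr (1.1.0 or later) approximation under stated assumptions, not the code admiral runs. The helper column obs_nr is hypothetical, and the ".join" suffix is applied manually here.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

adbds <- tribble(
  ~USUBJID, ~ADY, ~AVAL,
  "1",      -7,   10,
  "1",       1,   12,
  "1",       8,   11,
  "1",      15,    9,
  "1",      20,   14,
  "1",      24,   12,
  "2",      13,    8
)

# Number the input observations so results can be merged back later
adbds_nr <- mutate(adbds, obs_nr = row_number())

# Restrict the additional dataset (like filter_add = ADY > 0) and
# rename its variables with a ".join" suffix
add <- adbds %>%
  filter(ADY > 0) %>%
  select(USUBJID, ADY.join = ADY, AVAL.join = AVAL)

# Join by the grouping variable, keep earlier observations only
# (like filter_join = ADY.join < ADY), then take the lowest value
# per input observation (like order = AVAL, mode = "first")
nadir <- adbds_nr %>%
  inner_join(add, by = "USUBJID", relationship = "many-to-many") %>%
  filter(ADY.join < ADY) %>%
  group_by(obs_nr) %>%
  slice_min(AVAL.join, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(obs_nr, NADIR = AVAL.join)

# Merge back: every input observation is kept, unmatched ones get NA
result <- adbds_nr %>%
  left_join(nadir, by = "obs_nr") %>%
  select(-obs_nr)
```

In this sketch NADIR is NA for the first two observations of subject "1" and for subject "2", because no post-baseline observation precedes them.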
Value
The output dataset contains all observations and variables of the input dataset and additionally the variables specified for new_vars from the additional dataset (dataset_add).
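To make the shape of the returned dataset concrete, the visit-window case can be approximated without grouping variables: every input observation is paired with every window, the pairs are filtered, and the result is merged back so that no input observation is lost. This is a sketch in plain dplyr (1.1.0 or later, for cross_join()), not admiral's implementation; the helper column obs_nr is hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)
library(tibble)

adbds <- tribble(
  ~USUBJID, ~ADY,
  "1",      -33,
  "1",       -2,
  "1",        3,
  "1",       24,
  "2",       NA
)

windows <- tribble(
  ~AVISIT,    ~AWLO, ~AWHI,
  "BASELINE",   -30,     1,
  "WEEK 1",       2,     7,
  "WEEK 2",       8,    15,
  "WEEK 3",      16,    22,
  "WEEK 4",      23,    30
)

# Number the input observations so matches can be merged back
adbds_nr <- mutate(adbds, obs_nr = row_number())

# No grouping variables: pair every observation with every window,
# then keep the pairs where ADY falls inside the window
matched <- adbds_nr %>%
  cross_join(windows) %>%
  filter(AWLO <= ADY & ADY <= AWHI) %>%
  select(obs_nr, AVISIT, AWLO, AWHI)

# All input observations are kept; unmatched ones get NA in the new variables
result <- adbds_nr %>%
  left_join(matched, by = "obs_nr") %>%
  select(-obs_nr)
```

The observation with ADY = -33 lies outside every window and the one with a missing ADY cannot satisfy the condition, so both remain in result with AVISIT, AWLO, and AWHI set to NA.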
See Also
derive_var_joined_exist_flag(), filter_joined()
General Derivation Functions for all ADaMs that returns variable appended to dataset: derive_var_extreme_flag(), derive_var_joined_exist_flag(), derive_var_merged_ef_msrc(), derive_var_merged_exist_flag(), derive_var_merged_summary(), derive_var_obs_number(), derive_var_relative_flag(), derive_vars_computed(), derive_vars_merged(), derive_vars_merged_lookup(), derive_vars_transposed()
Examples
library(admiral)
library(tibble)
library(lubridate)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Add AVISIT (based on time windows), AWLO, and AWHI
adbds <- tribble(
  ~USUBJID, ~ADY,
  "1",      -33,
  "1",       -2,
  "1",        3,
  "1",       24,
  "2",       NA,
)

windows <- tribble(
  ~AVISIT,    ~AWLO, ~AWHI,
  "BASELINE",   -30,     1,
  "WEEK 1",       2,     7,
  "WEEK 2",       8,    15,
  "WEEK 3",      16,    22,
  "WEEK 4",      23,    30
)

derive_vars_joined(
  adbds,
  dataset_add = windows,
  join_type = "all",
  filter_join = AWLO <= ADY & ADY <= AWHI
)

# Derive the nadir after baseline and before the current observation
adbds <- tribble(
  ~USUBJID, ~ADY, ~AVAL,
  "1",      -7,   10,
  "1",       1,   12,
  "1",       8,   11,
  "1",      15,    9,
  "1",      20,   14,
  "1",      24,   12,
  "2",      13,    8
)

derive_vars_joined(
  adbds,
  dataset_add = adbds,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL),
  new_vars = exprs(NADIR = AVAL),
  join_vars = exprs(ADY),
  join_type = "all",
  filter_add = ADY > 0,
  filter_join = ADY.join < ADY,
  mode = "first",
  check_type = "none"
)

# Add highest hemoglobin value within two weeks before AE,
# take earliest if more than one
adae <- tribble(
  ~USUBJID, ~ASTDY,
  "1",       3,
  "1",      22,
  "2",       2
)

adlb <- tribble(
  ~USUBJID, ~PARAMCD, ~ADY, ~AVAL,
  "1",      "HGB",     1,   8.5,
  "1",      "HGB",     3,   7.9,
  "1",      "HGB",     5,   8.9,
  "1",      "HGB",     8,   8.0,
  "1",      "HGB",     9,   8.0,
  "1",      "HGB",    16,   7.4,
  "1",      "HGB",    24,   8.1,
  "1",      "ALB",     1,   42,
)

derive_vars_joined(
  adae,
  dataset_add = adlb,
  by_vars = exprs(USUBJID),
  order = exprs(AVAL, desc(ADY)),
  new_vars = exprs(HGB_MAX = AVAL, HGB_DY = ADY),
  join_type = "all",
  filter_add = PARAMCD == "HGB",
  filter_join = ASTDY - 14 <= ADY & ADY <= ASTDY,
  mode = "last"
)

# Add APERIOD based on the period reference dataset derived from ADSL
adsl <- tribble(
  ~USUBJID, ~AP01SDT,     ~AP01EDT,     ~AP02SDT,     ~AP02EDT,
  "1",      "2021-01-04", "2021-02-06", "2021-02-07", "2021-03-07",
  "2",      "2021-02-02", "2021-03-02", "2021-03-03", "2021-04-01"
) %>%
  mutate(across(ends_with("DT"), ymd)) %>%
  mutate(STUDYID = "xyz")

period_ref <- create_period_dataset(
  adsl,
  new_vars = exprs(APERSDT = APxxSDT, APEREDT = APxxEDT)
)

period_ref

adae <- tribble(
  ~USUBJID, ~ASTDT,
  "1",      "2021-01-01",
  "1",      "2021-01-05",
  "1",      "2021-02-05",
  "1",      "2021-03-05",
  "1",      "2021-04-05",
  "2",      "2021-02-15",
) %>%
  mutate(
    ASTDT = ymd(ASTDT),
    STUDYID = "xyz"
  )

derive_vars_joined(
  adae,
  dataset_add = period_ref,
  by_vars = exprs(STUDYID, USUBJID),
  join_vars = exprs(APERSDT, APEREDT),
  join_type = "all",
  filter_join = APERSDT <= ASTDT & ASTDT <= APEREDT
)

# Add day since last dose (LDRELD)
adae <- tribble(
  ~USUBJID, ~ASTDT,       ~AESEQ,
  "1",      "2020-02-02", 1,
  "1",      "2020-02-04", 2
) %>%
  mutate(ASTDT = ymd(ASTDT))

ex <- tribble(
  ~USUBJID, ~EXSDTC,
  "1",      "2020-01-10",
  "1",      "2020-01",
  "1",      "2020-01-20",
  "1",      "2020-02-03"
)

## Please note that EXSDT is created via the order argument and then used
## for new_vars, filter_add, and filter_join
derive_vars_joined(
  adae,
  dataset_add = ex,
  by_vars = exprs(USUBJID),
  order = exprs(EXSDT = convert_dtc_to_dt(EXSDTC)),
  join_type = "all",
  new_vars = exprs(LDRELD = compute_duration(
    start_date = EXSDT, end_date = ASTDT
  )),
  filter_add = !is.na(EXSDT),
  filter_join = EXSDT <= ASTDT,
  mode = "last"
)

# first_cond_lower and first_cond_upper arguments
myd <- tribble(
  ~subj, ~day, ~val,
  "1",   1,    "++",
  "1",   2,    "-",
  "1",   3,    "0",
  "1",   4,    "+",
  "1",   5,    "++",
  "1",   6,    "-",
  "2",   1,    "-",
  "2",   2,    "++",
  "2",   3,    "+",
  "2",   4,    "0",
  "2",   5,    "-",
  "2",   6,    "++"
)

# Derive last "++" day before "0" where all results in between are "+" or "++"
derive_vars_joined(
  myd,
  dataset_add = myd,
  by_vars = exprs(subj),
  order = exprs(day),
  mode = "first",
  new_vars = exprs(prev_plus_day = day),
  join_vars = exprs(val),
  join_type = "before",
  first_cond_lower = val.join == "++",
  filter_join = val == "0" & all(val.join %in% c("+", "++"))
)

# Derive first "++" day after "0" where all results in between are "+" or "++"
derive_vars_joined(
  myd,
  dataset_add = myd,
  by_vars = exprs(subj),
  order = exprs(day),
  mode = "last",
  new_vars = exprs(next_plus_day = day),
  join_vars = exprs(val),
  join_type = "after",
  first_cond_upper = val.join == "++",
  filter_join = val == "0" & all(val.join %in% c("+", "++"))
)