get_recurrence_covariates {autoCovariateSelection}    R Documentation
Generate the binary recurrence covariates for the identified candidate empirical covariates
Description
The get_recurrence_covariates function assesses the recurrence of each identified candidate empirical covariate. It looks at how often the covariate occurs for each patient in the baseline period and generates up to three binary recurrence covariates for each candidate covariate. This is the second step in the automated covariate selection process. The first step, identifying the candidate empirical covariates, is done with the get_candidate_covariates function.
See the 'Automated Covariate Selection' section below for more details on the overall process.
Usage
get_recurrence_covariates(
df,
patientIdVarname,
eventCodeVarname,
patientIdVector
)
Arguments
df: The input data.frame of candidate empirical covariates. In the Examples, this is the covars_data element returned by get_candidate_covariates.
patientIdVarname: The name of the variable in df that contains the patient identifier.
eventCodeVarname: The name of the variable in df that contains the covariate codes (e.g., CCS, ICD-9).
patientIdVector: The 1-D vector of all patient identifiers. It should contain all the patient IDs in the original two cohorts. This vector can simply be the
Details
The recurrence covariates are generated from the frequency (count) of occurrence of each candidate empirical covariate produced by the get_candidate_covariates function. The baseline period of each patient is examined to assess whether the covariate occurred at least once, sporadically, or frequently. As a result, at most three recurrence covariates are created and returned for each candidate covariate:
- once: Indicates whether the covariate occurred at least once for the patient.
- sporadic: Indicates whether the covariate occurred at least as many times as the median of its non-zero occurrence counts. The median is taken over the patients with at least one occurrence of the candidate covariate.
- frequent: Indicates whether the covariate occurred at least as many times as the upper quartile (75th percentile) of its non-zero occurrence counts.
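The thresholds above can be sketched in base R for a single candidate covariate. The helper recurrence_indicators(), the patient IDs and the events are hypothetical and not part of autoCovariateSelection. The sketch also assumes that the median and upper quartile are computed with stats::median() and stats::quantile() at their default settings (type 7). The package's internal computation may differ.

```r
# Minimal sketch (hypothetical helper, not the package's internal code):
# derive the three recurrence indicators for ONE candidate covariate.
# Assumption: thresholds use stats::median() and stats::quantile() with
# default settings (type 7), computed over non-zero counts only.
# Assumes at least one patient has the code (true for step-1 candidates).
recurrence_indicators <- function(counts) {
  nonzero <- counts[counts > 0]
  med <- stats::median(nonzero)                          # 'sporadic' threshold
  uq  <- unname(stats::quantile(nonzero, probs = 0.75))  # 'frequent' threshold
  data.frame(
    once     = as.integer(counts >= 1),
    sporadic = as.integer(counts >= med),
    frequent = as.integer(counts >= uq)
  )
}

# Hypothetical baseline events for one candidate code: one row per occurrence
patientIds <- c("p1", "p2", "p3", "p4", "p5", "p6", "p7")
events <- data.frame(person_id = c("p2", "p3", "p4", "p4", "p5", "p5", "p5",
                                   "p6", "p6", "p6", "p6"))

# Count occurrences per patient; the factor levels keep zero-count patients
counts <- as.vector(table(factor(events$person_id, levels = patientIds)))
res <- recurrence_indicators(counts)
res
```

In this example the non-zero counts are 1, 1, 2, 3 and 4. The median is therefore 2 and the upper quartile is 3. Patient p4 (2 occurrences) is flagged as sporadic but not frequent. Patients p5 and p6 are flagged as both.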
If two or all three recurrence covariates of a candidate covariate are identical, only the distinct ones are returned. For example, if once == sporadic == frequent (the median and upper quartile are both 1), only the 'once' recurrence covariate is returned. If once != sporadic == frequent, 'once' and 'sporadic' are returned. If once == sporadic != frequent, 'once' and 'frequent' are returned. If no two of the three are identical, all three are returned.
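The de-duplication rule above can be sketched in base R using a hypothetical helper, distinct_indicators(), which is not part of autoCovariateSelection. The sketch assumes the nesting implied by the thresholds: every 'frequent' patient is also 'sporadic', and every 'sporadic' patient is also 'once'. Under that assumption, comparing each indicator with the one below it is enough.

```r
# Minimal sketch (hypothetical helper): keep only the distinct recurrence
# indicators of one candidate covariate. 'once' is always kept; 'sporadic'
# is kept only if it differs from 'once'; 'frequent' only if it differs
# from 'sporadic' (assumes frequent implies sporadic implies once).
distinct_indicators <- function(ind) {
  keep <- "once"
  if (!identical(ind$sporadic, ind$once)) keep <- c(keep, "sporadic")
  if (!identical(ind$frequent, ind$sporadic)) keep <- c(keep, "frequent")
  ind[, keep, drop = FALSE]
}

# Hypothetical case: median and upper quartile both 1, so all three coincide
ind1 <- data.frame(once = c(0L, 1L, 1L), sporadic = c(0L, 1L, 1L),
                   frequent = c(0L, 1L, 1L))
names(distinct_indicators(ind1))

# Hypothetical case: once == sporadic != frequent
ind2 <- data.frame(once = c(0L, 1L, 1L), sporadic = c(0L, 1L, 1L),
                   frequent = c(0L, 0L, 1L))
names(distinct_indicators(ind2))
```

The first call keeps only "once". The second call keeps "once" and "frequent", matching the cases listed above.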
The theoretical details of the implemented algorithm are described in the publication listed in the References section below. get_recurrence_covariates implements the step described in the 'Assess Recurrence' section of that article.
Value
A named list containing two R objects:
- recurrence_data: A data.frame containing all the binary recurrence covariates for all patients in wide format. It has one row per distinct patient. It has one column per binary recurrence covariate, plus one column for the patient ID variable. Each binary recurrence covariate name is prefixed with 'rec_' to mark it as a recurrence covariate and suffixed with '_once', '_sporadic' or '_frequent'. See the Details section above.
- patientIds: The patient IDs present in the original input df. This is exactly the same as the input patientIdVector.
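The wide layout of recurrence_data can be illustrated with a small base R sketch. The codes "A" and "B", the patient IDs and the indicator values are all hypothetical. The exact naming pattern rec_<code>_<suffix> is an assumption based on the prefix and suffixes described above, and the package may build its output differently.

```r
# Minimal sketch of the wide layout (hypothetical data; not the package's
# internal code). Assumed naming: rec_<code>_<once|sporadic|frequent>.
patientIds <- c("p1", "p2", "p3")

# Hypothetical distinct indicators per candidate code, one row per patient,
# in the same order as patientIds
indicators <- list(
  A = data.frame(once = c(1L, 1L, 0L), frequent = c(1L, 0L, 0L)),
  B = data.frame(once = c(0L, 1L, 1L))
)

# Rename each code's indicator columns with the 'rec_' prefix
cols <- lapply(names(indicators), function(code) {
  ind <- indicators[[code]]
  names(ind) <- paste0("rec_", code, "_", names(ind))
  ind
})

# One row per patient; patient ID column first, then all recurrence columns
recurrence_data <- do.call(cbind, c(list(data.frame(person_id = patientIds)), cols))
names(recurrence_data)
dim(recurrence_data)
```

Here three recurrence covariates plus the patient ID give four columns, one row for each of the three patients.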
Automated Covariate Selection
The three steps in automated covariate selection are listed below, together with the function implementing each:
- Identify candidate empirical covariates: get_candidate_covariates
- Assess recurrence: get_recurrence_covariates
- Prioritize covariates: get_prioritised_covariates
Author(s)
Dennis Robert dennis.robert.nm@gmail.com
References
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc
Examples
library("autoCovariateSelection")
library("dplyr")  # provides %>%, select() and distinct() used below
data(rwd)
head(rwd, 3)
# One row per patient: identifier, treatment and outcome date
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
patientIds <- basetable$person_id
# Step 1: identify candidate empirical covariates
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
                                  eventCodeVarname = "event_code", patientIdVarname = "person_id",
                                  patientIdVector = patientIds, n = 100, min_num_patients = 10)
out1 <- step1$covars_data
all.equal(patientIds, step1$patientIds) # should return TRUE
# Step 2: generate the binary recurrence covariates
step2 <- get_recurrence_covariates(df = out1, patientIdVarname = "person_id",
                                   eventCodeVarname = "event_code", patientIdVector = patientIds)
out2 <- step2$recurrence_data