get_recurrence_covariates {autoCovariateSelection}    R Documentation

Generate the binary recurrence covariates for the identified candidate empirical covariates


The get_recurrence_covariates function assesses the recurrence of each identified candidate empirical covariate, based on how often it occurs for each patient in the baseline period, and generates up to three binary recurrence covariates for each candidate covariate. This is the second step in the automated covariate selection process; the first step, identifying the candidate empirical covariates, is done with the get_candidate_covariates function. See the 'Automated Covariate Selection' section below for more details on the overall process.





df: The input data.frame. Ideally, this should be the covars_data output of get_candidate_covariates.


patientIdVarname: The name of the variable in df that contains the patient identifier.


eventCodeVarname: The name of the variable in df that contains the covariate codes (e.g., CCS, ICD-9).


patientIdVector: A 1-D vector of all patient identifiers. It should contain every patient ID in the original two cohorts, and can simply be the patientIds output vector of the get_candidate_covariates function.
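To illustrate the expected input shape, the sketch below builds a small hypothetical data.frame in long format (one row per recorded event) together with a patient ID vector that includes a patient with no events. The names toy_df and toy_ids and the event codes are made up for illustration. The per-patient counts are tabulated with base R table(), not with a package function; the point is only to show why the full patientIdVector matters: a patient without any occurrence of a code must still be represented, with a count of zero.

```r
# Hypothetical long-format input: one row per recorded event,
# with a patient ID column and a covariate code column.
toy_df <- data.frame(
  person_id  = c(1, 1, 1, 2, 3, 3),
  event_code = c("d_401", "d_401", "d_250", "d_401", "d_250", "d_250"),
  stringsAsFactors = FALSE
)

# Every patient in the two cohorts; patient 4 has no events at all.
toy_ids <- c(1, 2, 3, 4)

# Per-patient counts of each code. Building the row factor from toy_ids
# keeps patient 4 as a row of zeros instead of silently dropping it.
counts <- table(factor(toy_df$person_id, levels = toy_ids),
                toy_df$event_code)
print(counts)
```

Without the levels argument, patient 4 would simply be absent from the table, which is the kind of omission the patientIdVector argument guards against.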


The recurrence covariates are generated from the frequency (count) of occurrence of each candidate empirical covariate produced by the get_candidate_covariates function. For each patient, the baseline period is examined to assess whether the covariate occurred only once, sporadically, or frequently. Hence, at most three recurrence covariates are created and returned for each candidate covariate.
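A minimal sketch of the once/sporadic/frequent idea for a single candidate covariate, in base R. It assumes the definitions from the hdPS article cited in the References: 'once' means at least one occurrence, 'sporadic' means at least the median count, and 'frequent' means at least the upper-quartile count, with the median and quartile taken over patients who have the code at least once. The helper recurrence_indicators() is hypothetical and is not the package's internal implementation, which may differ in details such as the quantile type used.

```r
# Hypothetical helper (not the package's code): binary recurrence
# indicators for ONE candidate covariate, given each patient's count of
# that code in the baseline period.
recurrence_indicators <- function(counts) {
  present <- counts[counts > 0]
  # Assumes at least one patient has the code; otherwise the median
  # would be undefined.
  stopifnot(length(present) > 0)
  med <- stats::median(present)
  q3 <- stats::quantile(present, probs = 0.75, names = FALSE)
  data.frame(
    once     = as.integer(counts >= 1),   # occurred at least once
    sporadic = as.integer(counts >= med), # at least the median count
    frequent = as.integer(counts >= q3)   # at least the upper quartile
  )
}

# Example counts for seven patients (the first has no occurrence)
rec <- recurrence_indicators(c(0, 1, 1, 2, 3, 5, 8))
print(rec)
```

Here the median of the nonzero counts is 2.5 and the upper quartile (R's default quantile type) is 4.5, so a patient with a count of 3 is flagged as sporadic but not frequent.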

Note that if two or all three of the recurrence covariates generated for a candidate covariate are identical, only the distinct ones are returned. For example, if once == sporadic == frequent for a candidate covariate (i.e., the median and upper quartile are both 1), only the 'once' recurrence covariate is returned. If once != sporadic == frequent, 'once' and 'sporadic' are returned. If once == sporadic != frequent, 'once' and 'frequent' are returned. If all three recurrence covariates are distinct, all three are returned. The theoretical details of the implemented algorithm are described in the publication listed in the References section below; get_recurrence_covariates implements what is described in the 'Assess Recurrence' section of that article.
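The de-duplication rule can be sketched in base R as follows. drop_identical_recurrence() is a hypothetical helper, not part of the package: it takes a data.frame whose columns are once, sporadic and frequent (in that order) and keeps a column only if it is not identical to an earlier column, which reproduces the cases listed in the paragraph above.

```r
# Hypothetical helper (not the package's code): drop recurrence columns
# that are identical to an earlier column. Columns are assumed to be in
# the order once, sporadic, frequent.
drop_identical_recurrence <- function(rec) {
  keep <- !duplicated(as.list(rec))  # compare whole columns, not rows
  rec[, keep, drop = FALSE]
}

# once == sporadic != frequent: 'once' and 'frequent' are kept
rec1 <- data.frame(once     = c(0L, 1L, 1L, 1L),
                   sporadic = c(0L, 1L, 1L, 1L),
                   frequent = c(0L, 0L, 0L, 1L))
names(drop_identical_recurrence(rec1))  # "once" "frequent"

# once == sporadic == frequent: only 'once' is kept
rec2 <- data.frame(once     = c(0L, 1L, 1L),
                   sporadic = c(0L, 1L, 1L),
                   frequent = c(0L, 1L, 1L))
names(drop_identical_recurrence(rec2))  # "once"
```

Converting the data.frame to a list makes duplicated() compare columns; calling duplicated() on the data.frame itself would compare rows instead.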


A named list containing two R objects

Automated Covariate Selection

The three steps of automated covariate selection are listed below, together with the functions implementing each step:

  1. Identify candidate empirical covariates: get_candidate_covariates

  2. Assess recurrence: get_recurrence_covariates

  3. Prioritize covariates: get_prioritised_covariates


Dennis Robert


References

Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc


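library(autoCovariateSelection)
library(dplyr)  # provides %>%, select() and distinct()
data(rwd)       # example real-world dataset used below
head(rwd, 3)
# One row per patient: ID, treatment group and outcome date
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
patientIds <- basetable$person_id
# Step 1: identify candidate empirical covariates
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
                                  eventCodeVarname = "event_code", patientIdVarname = "person_id",
                                  patientIdVector = patientIds, n = 100, min_num_patients = 10)
out1 <- step1$covars_data
all.equal(patientIds, step1$patientIds)  # should return TRUE
# Step 2: generate the binary recurrence covariates
step2 <- get_recurrence_covariates(df = out1, patientIdVarname = "person_id",
                                   eventCodeVarname = "event_code", patientIdVector = patientIds)
out2 <- step2$recurrence_data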

[Package autoCovariateSelection version 1.0.0 Index]