get_prioritised_covariates {autoCovariateSelection}  R Documentation 
get_prioritised_covariates
Description
get_prioritised_covariates prioritises the binary recurrence covariates by their potential for confounding and returns at most k of them. Each covariate is ranked by the absolute log of its multiplicative bias term, which combines the covariate's association with the outcome and its prevalence in the two exposure groups. This is the third and final step in the automated covariate selection process. The previous step, assessing recurrence and generating the binary recurrence covariates, is done using the get_recurrence_covariates function.
See the 'Automated Covariate Selection' section below for more details regarding the overall process.
Usage
get_prioritised_covariates(df, patientIdVarname, exposureVector, outcomeVector, patientIdVector, k = 500)
Arguments
df
The input data.frame, typically the recurrence_data element returned by get_recurrence_covariates.
patientIdVarname
The name of the variable in df that contains the patient identifier.
exposureVector
The 1D exposure (treatment/intervention) vector. Its length and patient order should match those of patientIdVector.
outcomeVector
The 1D outcome vector indicating whether the patient experienced the outcome of interest (value = 1) or not (value = 0).
Its length and patient order should match those of patientIdVector.
patientIdVector
The 1D vector of all patient identifiers. It should contain all the patient IDs in the original two cohorts, with its length and order matching those of exposureVector and outcomeVector.
k
The maximum number of prioritised covariates returned by the function. By default, this is 500, as described in the original paper.
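Because exposureVector, outcomeVector and patientIdVector must line up patient by patient, it can help to check them before calling the function. The following is a minimal sketch: check_prioritisation_inputs is a hypothetical helper, not part of autoCovariateSelection, and the two-level exposure, duplicate-ID and ID-coverage checks are assumptions drawn from the descriptions above rather than documented requirements of the package.

```r
# Hypothetical helper (assumption): NOT part of autoCovariateSelection.
# Checks that the per-patient inputs line up before prioritisation.
check_prioritisation_inputs <- function(df, patientIdVarname, exposureVector,
                                        outcomeVector, patientIdVector) {
  n <- length(patientIdVector)
  # One entry per patient, in the same order as patientIdVector
  if (length(exposureVector) != n) stop("exposureVector length differs from patientIdVector")
  if (length(outcomeVector) != n) stop("outcomeVector length differs from patientIdVector")
  # Assumption: each patient appears once
  if (anyDuplicated(patientIdVector) > 0) stop("patientIdVector contains duplicate IDs")
  # Outcome coded 1 (experienced) or 0 (not experienced)
  if (!all(outcomeVector %in% c(0, 1))) stop("outcomeVector must contain only 0 and 1")
  # Assumption: two exposure groups are compared
  if (length(unique(exposureVector)) != 2) stop("exposureVector should have exactly two groups")
  # The patient ID column must exist in df
  if (!patientIdVarname %in% names(df)) stop("patientIdVarname not found in df")
  # Assumption: every patient in df is listed in patientIdVector
  if (!all(df[[patientIdVarname]] %in% patientIdVector)) stop("df has IDs not in patientIdVector")
  invisible(TRUE)
}
```

With the objects from the Examples section, it might be called as check_prioritisation_inputs(out2, "person_id", basetable$treatment, ifelse(is.na(basetable$outcome_date), 0, 1), patientIds).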
Details
To prioritise covariates across data dimensions (domains), each covariate should be assessed by its potential for controlling confounding that is not conditional on exposure and other covariates. This means that the association of the covariate with the outcome (relative risk) should be taken into consideration when quantifying its potential for confounding. Combining the relative risk with the prevalence of the covariate in each of the two exposure groups gives the multiplicative bias term. A more straightforward alternative would be to use the absolute risk. However, this would invariably downweight the association between the covariate and the outcome when the outcome prevalence is low and the exposure prevalence is high, a common situation in comparative effectiveness research using real-world data from retrospective cohort studies. The multiplicative bias term balances this and yields, for each covariate, a quantity that reflects its confounding potential.
Covariates are ranked by the absolute log of their multiplicative bias term, and the top k are selected. For further theoretical details of the algorithm, please refer to the original article listed in the References section below. get_prioritised_covariates is the function implementing what is described in the 'Prioritise Covariates' section of the article.
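For reference, the cited article defines the multiplicative bias term as below, where P_C1 and P_C0 are the prevalences of covariate C among the exposed and unexposed patients, and RR_CD is the relative risk of outcome D for patients with versus without C. The second line is an illustrative calculation with made-up values, and log denotes the natural logarithm. When P_C1 = P_C0 the term equals 1 and the score is 0, so a covariate equally common in both exposure groups cannot rank highly however strongly it predicts the outcome.

```latex
\begin{aligned}
\mathrm{Bias}_M &= \frac{P_{C1}\,(RR_{CD} - 1) + 1}{P_{C0}\,(RR_{CD} - 1) + 1},
  \qquad \text{score} = \left|\log \mathrm{Bias}_M\right| \\
\text{e.g. } P_{C1} = 0.20,\ P_{C0} = 0.10,\ RR_{CD} = 3:\quad
\mathrm{Bias}_M &= \frac{0.20 \times 2 + 1}{0.10 \times 2 + 1} = \frac{1.4}{1.2} \approx 1.167,
  \qquad \left|\log \mathrm{Bias}_M\right| \approx 0.154
\end{aligned}
```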
Value
A named list containing two R objects:
autoselected_covariate_df
A data.frame in wide format containing the autoselected prioritised covariates and their values (1 or 0) for each patient.
multiplicative_bias
The absolute log of the multiplicative bias term for each of the autoselected prioritised covariates.
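To make the ranking concrete, the sketch below scores each binary covariate by the absolute log of its multiplicative bias term and keeps the k highest-scoring ones, returning a list shaped like the value described above. It is an illustrative reimplementation, not the package's internal code. Assumptions: prioritise_sketch is a hypothetical name; covars holds only the 0/1 covariate columns (patient ID column removed) in the same row order as exposure and outcome; exposure is coded 1/0; the relative risk is the crude risk ratio; and covariates with an undefined score are simply dropped rather than handled with any continuity correction.

```r
# Illustrative sketch (assumption): NOT the internal code of autoCovariateSelection.
# Scores each 0/1 covariate by |log(Bias_M)| and keeps the top k.
prioritise_sketch <- function(covars, exposure, outcome, k = 500) {
  stopifnot(nrow(covars) == length(exposure), length(exposure) == length(outcome))
  exposed <- exposure == 1
  score <- vapply(covars, function(x) {
    x <- x == 1
    pc1 <- mean(x[exposed])   # prevalence of the covariate among the exposed
    pc0 <- mean(x[!exposed])  # prevalence of the covariate among the unexposed
    # Crude relative risk of the outcome, covariate present vs absent
    rr <- mean(outcome[x]) / mean(outcome[!x])
    bias <- (pc1 * (rr - 1) + 1) / (pc0 * (rr - 1) + 1)
    abs(log(bias))
  }, numeric(1))
  # Drop covariates whose score is NaN or infinite (e.g. zero-risk cells)
  score <- score[is.finite(score)]
  # Rank by decreasing score and keep at most k covariates
  top <- names(sort(score, decreasing = TRUE))[seq_len(min(k, length(score)))]
  list(autoselected_covariate_df = covars[, top, drop = FALSE],
       multiplicative_bias = score[top])
}
```

On a toy data set where cov_a is present in three of four exposed patients and in none of the unexposed, while cov_b is equally common in both groups, cov_b scores exactly 0 and cov_a is ranked first, with a score of log(2.75).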
Automated Covariate Selection
The three steps in automated covariate selection are listed below, with the functions implementing the methodology:
Identify candidate empirical covariates: get_candidate_covariates
Assess recurrence: get_recurrence_covariates
Prioritise covariates: get_prioritised_covariates
Author(s)
Dennis Robert <dennis.robert.nm@gmail.com>
References
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc
Examples
library("autoCovariateSelection")
library("dplyr") # provides %>%, select() and distinct() used below
data(rwd)
head(rwd, 3)
# Distinct ID, treatment and outcome date combinations (intended: one row per patient)
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
patientIds <- basetable$person_id
# Step 1: identify candidate empirical covariates
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
  eventCodeVarname = "event_code", patientIdVarname = "person_id",
  patientIdVector = patientIds, n = 100, min_num_patients = 10)
out1 <- step1$covars_data
all.equal(patientIds, step1$patientIds) # should be TRUE
# Step 2: assess recurrence
step2 <- get_recurrence_covariates(df = out1, patientIdVarname = "person_id",
  eventCodeVarname = "event_code", patientIdVector = patientIds)
out2 <- step2$recurrence_data
# Step 3: prioritise covariates; outcome is 1 if an outcome date is recorded, else 0
out3 <- get_prioritised_covariates(df = out2, patientIdVarname = "person_id",
  exposureVector = basetable$treatment,
  outcomeVector = ifelse(is.na(basetable$outcome_date), 0, 1),
  patientIdVector = patientIds, k = 10)