get_recurrence_covariates {autoCovariateSelection}

R Documentation

Generate the binary recurrence covariates for the identified candidate empirical covariates

Description

The get_recurrence_covariates function assesses the recurrence of each identified candidate empirical covariate, based on how often it occurs for each patient in the baseline period, and generates up to three binary recurrence covariates per candidate covariate. This is the second step in the automated covariate selection process. The first step, identifying candidate empirical covariates, is done by the get_candidate_covariates function. See the 'Automated Covariate Selection' section below for more details on the overall process.

Usage

get_recurrence_covariates(
  df,
  patientIdVarname,
  eventCodeVarname,
  patientIdVector
)

Arguments

`df`	The input `data.frame`. Ideally this should be the output `covars_data` from `get_candidate_covariates`.
`patientIdVarname`	The name of the variable in `df` that contains the patient identifier.
`eventCodeVarname`	The name of the variable in `df` that contains the covariate codes (e.g., CCS, ICD9).
`patientIdVector`	A 1-D vector of all patient identifiers. It should contain every patient ID in the original two cohorts, and can simply be the `patientIds` output vector of the `get_candidate_covariates` function.

Details

The recurrence covariates are generated from the frequency (count) of occurrence of each empirical candidate covariate produced by the get_candidate_covariates function. For each patient's baseline period, the function assesses whether the covariate occurred once, sporadically, or frequently. A maximum of three recurrence covariates is therefore created and returned for each candidate covariate.

once Indicates whether the covariate occurred at least once for the patient.
sporadic Indicates whether the covariate occurred at least as many times as the median (the median of the non-zero occurrence counts of the candidate covariate) for the patient.
frequent Indicates whether the covariate occurred at least as many times as the upper quartile (the 75th percentile of the non-zero occurrence counts of the candidate covariate) for the patient.
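
The three definitions above can be illustrated with a minimal base R sketch. The `counts` vector is made-up toy data for a single hypothetical candidate covariate, and the use of R's default `quantile()` method is an assumption; the package itself may compute the thresholds differently.

```r
# Toy data (hypothetical): baseline occurrence counts of ONE candidate
# covariate, one element per patient. 0 means the code never occurred.
counts <- c(p1 = 0, p2 = 1, p3 = 1, p4 = 2, p5 = 4, p6 = 8)

# Thresholds are computed over the non-zero counts only.
nonzero <- counts[counts > 0]
med <- median(nonzero)                         # median of 1, 1, 2, 4, 8
q3  <- quantile(nonzero, 0.75, names = FALSE)  # upper quartile (default method, an assumption)

# One binary indicator per definition.
once     <- as.integer(counts >= 1)
sporadic <- as.integer(counts >= med)
frequent <- as.integer(counts >= q3)

data.frame(patient = names(counts), counts = unname(counts),
           once, sporadic, frequent)
```

With these toy counts the median is 2 and the upper quartile is 4, so the three indicators differ from one another.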

Note that if two or all three of the binary recurrence covariates are identical for a candidate covariate, only the distinct ones are returned. For example, if once == sporadic == frequent for the candidate covariate (the median and the upper quartile are both 1), then only the 'once' recurrence covariate is returned. If once != sporadic == frequent, then 'once' and 'sporadic' are returned. If once == sporadic != frequent, then 'once' and 'frequent' are returned. If no two of the three recurrence covariates are identical, then all three are returned. The theoretical details of the algorithm are described in the publication listed in the References section below. get_recurrence_covariates implements what is described in the 'Assess Recurrence' section of that article.
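
The rule for identical recurrence covariates can be sketched in base R as follows. `distinct_recurrence()` is a hypothetical helper written for illustration only; it is not part of the package, and the package's internal implementation may differ. It keeps the first of any set of identical indicator columns, which reproduces the cases listed above.

```r
# Hypothetical helper (not a package function): build the three indicators
# for one candidate covariate and drop any that duplicate an earlier one.
distinct_recurrence <- function(counts) {
  nonzero <- counts[counts > 0]
  indicators <- data.frame(
    once     = as.integer(counts >= 1),
    sporadic = as.integer(counts >= median(nonzero)),
    frequent = as.integer(counts >= quantile(nonzero, 0.75, names = FALSE))
  )
  # as.list() yields one list element per column, so duplicated() compares
  # whole columns and flags every repeat of an earlier column.
  indicators[!duplicated(as.list(indicators))]
}

names(distinct_recurrence(c(0, 1, 1, 1, 0, 1)))  # "once"
names(distinct_recurrence(c(1, 3, 3, 3, 3)))     # "once" "sporadic"
names(distinct_recurrence(c(1, 1, 1, 5, 5)))     # "once" "frequent"
names(distinct_recurrence(c(0, 1, 1, 2, 4, 8)))  # "once" "sporadic" "frequent"
```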

Value

A named list containing two R objects

recurrence_data A data.frame containing all the binary recurrence covariates for all the patients in wide format. The number of rows equals the number of distinct patients, and the number of columns equals the number of binary recurrence covariates plus 1 (for the patient ID variable). Each binary recurrence covariate is prefixed with 'rec_' to indicate that it is a recurrence covariate and suffixed with '_once', '_sporadic' or '_frequent'. See the Details section above.
patientIds The list of patient ids present in the original input df. This is exactly the same as the input patientIdVector
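
The shape of recurrence_data described above can be checked with a short sketch. The data.frame `toy` below is a hand-built stand-in, and its column names and values are hypothetical; only the 'rec_' prefix and the '_once', '_sporadic' and '_frequent' suffixes come from the description above.

```r
# Hand-built stand-in for `recurrence_data` (hypothetical names and values).
toy <- data.frame(
  person_id           = c(101, 102, 103),
  rec_dx_401_once     = c(1L, 0L, 1L),
  rec_dx_401_frequent = c(0L, 0L, 1L),
  rec_rx_77_once      = c(1L, 1L, 0L)
)

# Columns following the naming convention for recurrence covariates.
covariate_cols <- grep("^rec_.*_(once|sporadic|frequent)$", names(toy),
                       value = TRUE)

# Wide format: one row per distinct patient, one column per recurrence
# covariate plus one column for the patient identifier.
n_patients   <- length(unique(toy$person_id))
n_covariates <- length(covariate_cols)
dim(toy)
c(n_patients, n_covariates + 1)
```

The same two checks can be applied to the real recurrence_data output, replacing `person_id` with whatever was passed as patientIdVarname.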

Automated Covariate Selection

The three steps in automated covariate selection are listed below with the functions implementing the methodology

Identify candidate empirical covariates: get_candidate_covariates
Assess recurrence: get_recurrence_covariates
Prioritize covariates: get_prioritised_covariates

Author(s)

Dennis Robert dennis.robert.nm@gmail.com

References

Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc

Examples

library("autoCovariateSelection")
library("dplyr")  # provides %>%, select() and distinct() used below
data(rwd)
head(rwd, 3)
# Keep only the patient-level variables, one row per patient
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
patientIds <- basetable$person_id
# Step 1: identify candidate empirical covariates
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
                                  eventCodeVarname = "event_code",
                                  patientIdVarname = "person_id",
                                  patientIdVector = patientIds,
                                  n = 100, min_num_patients = 10)
out1 <- step1$covars_data
all.equal(patientIds, step1$patientIds)  # should return TRUE
# Step 2: assess recurrence of the candidate covariates
step2 <- get_recurrence_covariates(df = out1, patientIdVarname = "person_id",
                                   eventCodeVarname = "event_code",
                                   patientIdVector = patientIds)
out2 <- step2$recurrence_data

[Package autoCovariateSelection version 1.0.0 Index]