get_candidate_covariates {autoCovariateSelection}    R Documentation
Generate candidate empirical baseline covariates based on prevalence in the baseline period
Description
The get_candidate_covariates function generates the list of candidate empirical covariates based on their prevalence
within each domain (dimension). This is the first step in the automated covariate selection process. See the 'Automated Covariate Selection'
section below for more details regarding the overall process.
Usage
get_candidate_covariates(
df,
domainVarname,
eventCodeVarname,
patientIdVarname,
patientIdVector,
n = 200,
min_num_patients = 100
)
Arguments
df: The input data.frame.
domainVarname: The variable (field) name which contains the domain of the covariate in the df.
eventCodeVarname: The variable name which contains the covariate codes (e.g., CCS, ICD9) in the df.
patientIdVarname: The variable name which contains the patient identifier in the df.
patientIdVector: The 1-D vector with all the patient identifiers. The length of this vector should be equal to the number of distinct patients in the df.
n: The maximum number of empirical candidate baseline covariates that should be returned within each domain. By default, n is 200.
min_num_patients: The minimum number of patients in whom a covariate must have occurred for it to be considered for selection. By default, min_num_patients is 100.
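The length requirement on patientIdVector can be checked before calling the function. The following is a minimal base R sketch; check_patient_id_vector and the toy data are hypothetical and not part of autoCovariateSelection, and the check covers only the documented length condition.

```r
# Hypothetical helper (not part of autoCovariateSelection): checks the documented
# requirement that patientIdVector has one entry per distinct patient in df.
check_patient_id_vector <- function(df, patientIdVarname, patientIdVector) {
  n_distinct_patients <- length(unique(df[[patientIdVarname]]))
  length(patientIdVector) == n_distinct_patients
}

# Toy long-format data (hypothetical): one row per patient event
toy_df <- data.frame(
  person_id  = c(1, 1, 2, 3, 3),
  domain     = c("dx", "px", "dx", "dx", "rx"),
  event_code = c("19900", "4011", "19900", "25000", "A10"),
  stringsAsFactors = FALSE
)

check_patient_id_vector(toy_df, "person_id", c(1, 2, 3))     # TRUE: 3 ids, 3 patients
check_patient_id_vector(toy_df, "person_id", c(1, 1, 2, 3))  # FALSE: 4 ids, 3 patients
```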
Details
The theoretical details of the high-dimensional propensity score (HDPS) algorithm are described in the publication listed in the References
section below. get_candidate_covariates implements what is described in the 'Identify candidate empirical covariates' section
of the article.
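To make this step concrete, here is a minimal base R sketch of prevalence-based candidate selection as described above: count the distinct patients per code within each domain, drop codes seen in fewer than min_num_patients patients, and keep at most n codes per domain in descending order of prevalence. It is not the package's source code; the helper name select_candidates_sketch, the use of patient counts as the prevalence measure (which ranks codes the same way as proportions within a fixed cohort), and the tie handling are assumptions.

```r
# Minimal sketch of prevalence-based candidate selection (illustrative only,
# NOT the autoCovariateSelection implementation).
select_candidates_sketch <- function(df, domainVarname, eventCodeVarname,
                                     patientIdVarname, n = 200,
                                     min_num_patients = 100) {
  # Keep one row per distinct (domain, code, patient) so repeated events
  # for the same patient are counted once
  keys <- unique(df[, c(domainVarname, eventCodeVarname, patientIdVarname)])
  # Count distinct patients for each (domain, code) pair
  counts <- aggregate(keys[[patientIdVarname]],
                      by = list(domain = keys[[domainVarname]],
                                code = keys[[eventCodeVarname]]),
                      FUN = length)
  names(counts)[3] <- "n_patients"
  # Drop codes seen in fewer than min_num_patients patients
  counts <- counts[counts$n_patients >= min_num_patients, ]
  # Within each domain, sort by descending patient count and keep at most n codes
  per_domain <- lapply(split(counts, counts$domain), function(d) {
    head(d[order(-d$n_patients), ], n)
  })
  selected <- do.call(rbind, per_domain)
  # Name each covariate as <domain>_<code>
  paste(selected$domain, selected$code, sep = "_")
}

# Toy long-format data (hypothetical): patient 1 has code A recorded twice
toy_df <- data.frame(
  person_id  = c(1, 1, 2, 3, 1, 2, 1, 1, 2, 3),
  domain     = c("dx", "dx", "dx", "dx", "dx", "dx", "dx", "px", "px", "px"),
  event_code = c("A", "A", "A", "A", "B", "B", "C", "P1", "P1", "P2"),
  stringsAsFactors = FALSE
)

select_candidates_sketch(toy_df, "domain", "event_code", "person_id",
                         n = 2, min_num_patients = 2)
# returns "dx_A" "dx_B" "px_P1"
```

Counting distinct patients rather than raw rows keeps a code that is recorded many times for one patient from looking more prevalent than it is.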
Value
A named list containing three R objects:
- covars: A 1-D vector containing the names of the selected baseline covariates from each domain. For each domain in the df, the number of covars will be equal to or less than n.
- covars_data: The data.frame filtered from df to contain only the selected covars. The values of the eventCodeVarname field are prefixed with the corresponding domain name. For example, if the event code is 19900 and the domain is 'dx', then the covariate name will be 'dx_19900'.
- patientIds: The list of patient ids present in the original input df. This is exactly the same as the input patientIdVector.
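The naming convention used in covars_data can be illustrated with a short base R sketch. make_covariate_names and the toy data are hypothetical; the sketch only reproduces the documented '<domain>_<event code>' prefixing and the filtering to selected covariates, and the column layout of the real covars_data may differ.

```r
# Hypothetical helper illustrating the documented naming rule:
# covariate name = <domain>_<event code>
make_covariate_names <- function(domain, event_code) {
  paste(domain, event_code, sep = "_")
}

make_covariate_names("dx", 19900)
# returns "dx_19900"

# Toy long-format data (hypothetical)
toy_df <- data.frame(
  person_id  = c(1, 2, 2),
  domain     = c("dx", "dx", "px"),
  event_code = c("19900", "25000", "4011"),
  stringsAsFactors = FALSE
)

# Prefix each event code with its domain, as described for covars_data
toy_df$event_code <- make_covariate_names(toy_df$domain, toy_df$event_code)

# Keep only rows whose covariate is among the (hypothetical) selected covars
selected_covars <- c("dx_19900", "px_4011")
toy_covars_data <- toy_df[toy_df$event_code %in% selected_covars, ]
toy_covars_data$event_code
# returns "dx_19900" "px_4011"
```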
Automated Covariate Selection
The three steps in automated covariate selection are listed below, together with the function implementing each step:
- Identify candidate empirical covariates: get_candidate_covariates
- Assess recurrence: get_recurrence_covariates
- Prioritize covariates: get_prioritised_covariates
Author(s)
Dennis Robert dennis.robert.nm@gmail.com
References
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512-522. doi:10.1097/EDE.0b013e3181a663cc
Examples
library("autoCovariateSelection")
library("dplyr") # provides %>%, select() and distinct() used below
data(rwd)
head(rwd, 3)
# Select distinct elements that are unique for each patient: treatment and outcome
basetable <- rwd %>% select(person_id, treatment, outcome_date) %>% distinct()
head(basetable, 3)
# One identifier per patient, passed as patientIdVector
patientIds <- basetable$person_id
# Step 1: keep at most 100 candidate covariates per domain, each seen in at least 10 patients
step1 <- get_candidate_covariates(df = rwd, domainVarname = "domain",
  eventCodeVarname = "event_code", patientIdVarname = "person_id",
  patientIdVector = patientIds, n = 100, min_num_patients = 10)
out1 <- step1$covars_data # this will be the input to the get_recurrence_covariates() function