train_phevis {PheVis} | R Documentation |
train_phevis
Description
Global function to train the PheVis model.
Usage
train_phevis(
half_life,
df,
START_DATE,
PATIENT_NUM,
ENCOUNTER_NUM,
var_vec,
main_icd,
main_cui,
rf = TRUE,
p.noise = 0.3,
bool_SAFE = TRUE,
omega = 2,
GS = NULL
)
Arguments
half_life |
Duration of cumulation. For a chronic disease you might choose Inf; for an acute disease you might choose the duration of the disease. |
df |
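Data frame containing the columns named in the other arguments (time, patient id, encounter id and explanatory variables). |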
START_DATE |
Column name of the time column. The time column should be numeric. |
PATIENT_NUM |
Column name of the patient id column. |
ENCOUNTER_NUM |
Column name of the encounter id column. |
var_vec |
Explanatory variables used for the prediction, including the main variables. |
main_icd |
Character vector of the column names of the main ICD codes. |
main_cui |
Character vector of the column names of the main CUIs. |
rf |
Logical. Should pseudo-labelling with a random forest be used? (default is TRUE) |
p.noise |
Proportion of noise introduced during the noising step (default is 0.3). |
bool_SAFE |
A boolean. If TRUE, SAFE selection is done, else it is not (default is TRUE) |
omega |
Constant for the extrema population definition (default is 2) |
GS |
Character string giving the name of the gold-standard variable (default is NULL, in which case a vector of 0s is used). |
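The role of half_life can be illustrated with a minimal sketch. It assumes that cumulation follows an exponential decay with rate log(2) / half_life; the helper cumulate_with_half_life is hypothetical, is not part of PheVis, and the package's own cumulation may differ in detail.

```r
# Hypothetical helper, not part of PheVis: cumulate a feature over one
# patient's visits, assuming each past value decays exponentially.
# Visits are assumed to be sorted by increasing time.
cumulate_with_half_life <- function(time, value, half_life) {
  # half_life = Inf gives a decay rate of 0, so past values never fade.
  lambda <- log(2) / half_life
  vapply(seq_along(time), function(i) {
    past <- seq_len(i)
    sum(value[past] * exp(-lambda * (time[i] - time[past])))
  }, numeric(1))
}

time  <- c(0, 30, 60)  # visit dates, in days
value <- c(1, 0, 0)    # the code is recorded once, at the first visit

# Chronic disease: the single mention is carried forward unchanged.
cum_chronic <- cumulate_with_half_life(time, value, half_life = Inf)

# Acute disease: the mention is halved every 30 days.
cum_acute <- cumulate_with_half_life(time, value, half_life = 30)
```

Under this assumption, half_life = Inf keeps the cumulated value at 1 for all three visits, while half_life = 30 gives 1, 0.5 and 0.25.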
Value
A list with the following elements:
surparam - the parameters used to compute the surrogate
model - the random intercept logistic regression
df_train_result - the data.frame containing the output predictions
train_param - parameters for the model training (variables used, main ICD and CUIs, half_life, gold standard)
Examples
library(dplyr)

# Add an encounter id and make sure the time column is numeric
df <- PheVis::data_phevis %>%
  mutate(ENCOUNTER_NUM = row_number(),
         time = round(as.numeric(time)))

# Train with an infinite half-life (chronic disease setting)
model <- PheVis::train_phevis(half_life = Inf,
                              df = df,
                              START_DATE = "time",
                              PATIENT_NUM = "subject",
                              ENCOUNTER_NUM = "ENCOUNTER_NUM",
                              var_vec = c(paste0("var", 1:10), "mainCUI", "mainICD"),
                              main_icd = "mainICD",
                              main_cui = "mainCUI")
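The requirements stated in the Arguments section (a numeric time column, var_vec including the main variables) can be checked before training. The sketch below is an illustration only: check_phevis_input is a hypothetical helper that does not exist in PheVis, and the toy data frame is invented.

```r
# Hypothetical pre-flight check, not part of PheVis.
check_phevis_input <- function(df, START_DATE, PATIENT_NUM, ENCOUNTER_NUM,
                               var_vec, main_icd, main_cui) {
  # Every column referenced by name must exist in df.
  id_cols <- c(START_DATE, PATIENT_NUM, ENCOUNTER_NUM)
  missing_cols <- setdiff(c(id_cols, var_vec), names(df))
  if (length(missing_cols) > 0) {
    stop("Columns missing from df: ", paste(missing_cols, collapse = ", "))
  }
  # The time column should be numeric.
  if (!is.numeric(df[[START_DATE]])) {
    stop("Column '", START_DATE, "' should be numeric.")
  }
  # var_vec should include the main ICD and CUI variables.
  if (!all(c(main_icd, main_cui) %in% var_vec)) {
    stop("var_vec should include the main ICD and CUI columns.")
  }
  invisible(TRUE)
}

# Toy data, invented for illustration only
toy <- data.frame(
  subject       = c(1, 1, 2),
  time          = c(0, 10, 5),
  ENCOUNTER_NUM = 1:3,
  var1          = c(0, 1, 0),
  mainCUI       = c(1, 2, 0),
  mainICD       = c(0, 1, 0)
)

ok <- check_phevis_input(toy, "time", "subject", "ENCOUNTER_NUM",
                         var_vec  = c("var1", "mainCUI", "mainICD"),
                         main_icd = "mainICD",
                         main_cui = "mainCUI")
```

Failing early with an explicit message is a design choice of this sketch; it reports a misnamed or non-numeric column before any model fitting starts.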