CompareRulesOnValidation {DevTreatRules} | R Documentation |
Build treatment rules on a development dataset and evaluate performance on an independent validation dataset
Description
In many practical settings, BuildRule() has limited utility because it requires a single value for its prediction.approach argument (even if there is no prior knowledge about which of the split-regression, OWL framework, and direct-interactions approaches will perform best) and a single value for its ‘propensity.method’ and ‘rule.method’ arguments (even if there is no prior knowledge about whether standard or penalized GLM will perform best). CompareRulesOnValidation() supports model selection in these settings by looping over calls to BuildRule() for the different combinations of prediction approach (split-regression, OWL framework, direct-interactions) and regression method (standard GLM, lasso, ridge), building each rule on a development dataset and evaluating it on an independent validation dataset.
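The looping described above can be pictured as enumerating every combination of the three input vectors. The base-R sketch below assumes only that each combination corresponds to one rule built with BuildRule(); the data frame `combos` and its column names are illustrative and are not part of DevTreatRules, whose internals may differ.

```r
# Illustrative only: enumerate the combinations that CompareRulesOnValidation()
# is described as looping over (defaults from the Usage section).
vec.approaches <- c("split.regression", "OWL.framework", "direct.interactions")
vec.rule.methods <- c("glm.regression", "lasso", "ridge")
vec.propensity.methods <- "logistic.regression"

# One row per (approach, rule method, propensity method) combination
combos <- expand.grid(approach = vec.approaches,
                      rule.method = vec.rule.methods,
                      propensity.method = vec.propensity.methods,
                      stringsAsFactors = FALSE)
nrow(combos)  # 3 x 3 x 1 = 9 combinations
```

With the defaults, nine rules would be built on the development data and evaluated on the validation data; passing shorter vectors (as in the Examples section) reduces this count accordingly.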
Usage
CompareRulesOnValidation(
development.data,
validation.data,
vec.approaches = c("split.regression", "OWL.framework", "direct.interactions"),
vec.rule.methods = c("glm.regression", "lasso", "ridge"),
vec.propensity.methods = "logistic.regression",
study.design.development,
name.outcome.development,
type.outcome.development,
name.treatment.development,
names.influencing.treatment.development,
names.influencing.rule.development,
desirable.outcome.development,
additional.weights.development = rep(1, nrow(development.data)),
study.design.validation = study.design.development,
name.outcome.validation = name.outcome.development,
type.outcome.validation = type.outcome.development,
name.treatment.validation = name.treatment.development,
names.influencing.treatment.validation = names.influencing.treatment.development,
names.influencing.rule.validation = names.influencing.rule.development,
desirable.outcome.validation = desirable.outcome.development,
clinical.threshold.validation = 0,
propensity.method.validation = "logistic.regression",
additional.weights.validation = rep(1, nrow(validation.data)),
truncate.propensity.score = TRUE,
truncate.propensity.score.threshold = 0.05,
type.observation.weights = NULL,
propensity.k.cv.folds = 10,
rule.k.cv.folds = 10,
lambda.choice = c("min", "1se"),
OWL.lambda.seq = NULL,
OWL.kernel = "linear",
OWL.kparam.seq = NULL,
OWL.cvFolds = 10,
OWL.verbose = TRUE,
OWL.framework.shift.by.min = TRUE,
direct.interactions.center.continuous.Y = TRUE,
direct.interactions.exclude.A.from.penalty = TRUE,
bootstrap.CI = FALSE,
bootstrap.CI.replications = 100
)
Arguments
development.data |
A data frame representing the *development* dataset used to build treatment rules. |
validation.data |
A data frame representing the independent *validation* dataset used to estimate the performance of treatment rules built on the development dataset. |
vec.approaches |
A character vector (or element) indicating the values of the |
vec.rule.methods |
A character vector (or element) indicating the values of the |
vec.propensity.methods |
A character vector (or element) indicating the values of |
study.design.development |
Either ‘observational’, ‘RCT’, or ‘naive’, representing the study design on the development dataset. For the |
name.outcome.development |
A character indicating the name of the outcome variable in |
type.outcome.development |
Either ‘binary’ or ‘continuous’, the form of |
name.treatment.development |
A character indicating the name of the treatment variable in |
names.influencing.treatment.development |
A character vector (or element) indicating the names of the variables in |
names.influencing.rule.development |
A character vector (or element) indicating the names of the variables in |
desirable.outcome.development |
A logical equal to |
additional.weights.development |
A numeric vector of observation weights that will be multiplied by IPW weights in the rule development stage, with length equal to the number of rows in |
study.design.validation |
Either ‘observational’, ‘RCT’, or ‘naive’, representing the study design on the validation dataset. Default is the value of |
name.outcome.validation |
A character indicating the name of the outcome variable in |
type.outcome.validation |
Either ‘binary’ or ‘continuous’, the form of |
name.treatment.validation |
A character indicating the name of the treatment variable in |
names.influencing.treatment.validation |
A character vector (or element) indicating the names of the variables in |
names.influencing.rule.validation |
A character vector (or element) indicating the names of the variables in |
desirable.outcome.validation |
A logical equal to |
clinical.threshold.validation |
A non-negative number giving the margin by which the predicted outcome under treatment must be superior to the predicted outcome under control for treatment to be recommended. Only used when |
propensity.method.validation |
One of ‘logistic.regression’, ‘lasso’, or ‘ridge’. This is the underlying regression model used to estimate propensity scores (for |
additional.weights.validation |
A numeric vector of observation weights that will be multiplied by IPW weights in the rule evaluation stage, with length equal to the number of rows in |
truncate.propensity.score |
A logical variable dictating whether estimated propensity scores less than |
truncate.propensity.score.threshold |
A numeric value between 0 and 0.25. |
type.observation.weights |
Default is NULL, but other choices are ‘IPW.L’, ‘IPW.L.and.X’, and ‘IPW.ratio’, where L indicates the |
propensity.k.cv.folds |
An integer specifying how many folds to use for K-fold cross-validation that chooses the tuning parameter when |
rule.k.cv.folds |
An integer specifying how many folds to use for K-fold cross-validation that chooses the tuning parameter when |
lambda.choice |
Either ‘min’ or ‘1se’, corresponding to the |
OWL.lambda.seq |
Used when |
OWL.kernel |
Used when |
OWL.kparam.seq |
Used when |
OWL.cvFolds |
Used when |
OWL.verbose |
Used when |
OWL.framework.shift.by.min |
Logical, set to |
direct.interactions.center.continuous.Y |
Logical, set to |
direct.interactions.exclude.A.from.penalty |
Logical, set to |
bootstrap.CI |
Logical indicating whether the ATE/ABR estimates on the validation set should be accompanied by 95% confidence intervals based on the bootstrap. Default is |
bootstrap.CI.replications |
An integer specifying how many bootstrap replications should underlie the computed CIs. Default is 100. |
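The propensity-score truncation and weighting arguments above can be illustrated with a small base-R sketch. It assumes a plain inverse-probability-of-treatment weighting scheme (weight 1/e for treated and 1/(1 - e) for control observations, where e is the estimated propensity score) and assumes truncation clamps scores to the interval [threshold, 1 - threshold]. The helpers `truncate_ps()` and `ipw_weights()` are hypothetical, and DevTreatRules may compute its weights differently, for example depending on type.observation.weights.

```r
# Hypothetical helpers, not part of DevTreatRules.

# Clamp estimated propensity scores so that none lies closer than `threshold`
# to 0 or 1 (assumed behavior of truncate.propensity.score = TRUE).
truncate_ps <- function(ps, threshold = 0.05) {
  stopifnot(threshold >= 0, threshold <= 0.25)  # documented allowed range
  pmin(pmax(ps, threshold), 1 - threshold)
}

# Plain IPW weights for a binary treatment A (1 = treated, 0 = control),
# multiplied by additional observation weights (default: all 1s).
ipw_weights <- function(A, ps, additional.weights = rep(1, length(A))) {
  base <- ifelse(A == 1, 1 / ps, 1 / (1 - ps))
  base * additional.weights
}

ps.raw <- c(0.01, 0.30, 0.97, 0.50)
A <- c(1, 0, 0, 1)
ps <- truncate_ps(ps.raw, threshold = 0.05)  # 0.05 0.30 0.95 0.50
w <- ipw_weights(A, ps)                       # approx. 20, 1.43, 20, 2
```

Truncation matters because a score near 0 or 1 would otherwise produce a very large weight (here 1/0.01 = 100 before truncation versus 20 after), letting a handful of observations dominate the weighted fit.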
Value
A list with components:
- list.summaries: A list with number of elements equal to the length of vec.approaches. Each element is a matrix that, for a given prediction approach, shows estimated rule performance. It has 5 columns if bootstrap.CI=FALSE (number of test-positives, number of test-negatives, ATE in test-positives, ATE in test-negatives, ABR) or 9 columns if bootstrap.CI=TRUE (those same 5 summaries plus the lower and upper bounds of the 95% CIs for the ATE in test-positives and the ATE in test-negatives). Its rows correspond to the different combinations of vec.rule.methods and vec.propensity.methods, in addition to the two naive rules (treating all observations and treating no observations).
- list.rules: A list with number of elements equal to the length of vec.approaches. Each element is another list that, for a given prediction approach, stores the object returned by BuildRule() for each combination of vec.rule.methods and vec.propensity.methods.
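The shape of each matrix in list.summaries follows directly from the inputs. The base-R sketch below encodes that description with a hypothetical helper, `expected_summary_dim()`; it is derived from the documented structure rather than from the package source, and it does not inspect an actual result object.

```r
# Hypothetical helper: expected dimensions of each matrix in list.summaries,
# based on the documented structure (not on the DevTreatRules source).
expected_summary_dim <- function(vec.rule.methods, vec.propensity.methods,
                                 bootstrap.CI = FALSE) {
  # one row per rule-method/propensity-method combination, plus the
  # two naive rules (treat all, treat none)
  n.rows <- length(vec.rule.methods) * length(vec.propensity.methods) + 2
  # 5 summaries, plus 4 CI bounds (2 ATEs x lower/upper) when bootstrapping
  n.cols <- if (bootstrap.CI) 9 else 5
  c(rows = n.rows, cols = n.cols)
}

# Settings used in the Examples section: two rule methods, one propensity method
expected_summary_dim(c("glm.regression", "lasso"), "logistic.regression")
# rows = 4, cols = 5
```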
Examples
library(DevTreatRules)  # provides SplitData() and the obsStudyGeneExpressions data

set.seed(123)
# Split the data into three partitions in proportions 0.5, 0.25, 0.25;
# only the development and validation partitions are used here
example.split <- SplitData(data=obsStudyGeneExpressions,
                           n.sets=3, split.proportions=c(0.5, 0.25, 0.25))
development.data <- example.split[example.split$partition == "development", ]
validation.data <- example.split[example.split$partition == "validation", ]

# Build rules for each approach/rule-method combination on the development
# data and evaluate them on the validation data
model.selection <- CompareRulesOnValidation(development.data=development.data,
                                            validation.data=validation.data,
                                            study.design.development="observational",
                                            vec.approaches=c("split.regression", "OWL.framework", "direct.interactions"),
                                            vec.rule.methods=c("glm.regression", "lasso"),
                                            vec.propensity.methods="logistic.regression",
                                            name.outcome.development="no_relapse",
                                            type.outcome.development="binary",
                                            name.treatment.development="intervention",
                                            names.influencing.treatment.development=c("prognosis", "clinic", "age"),
                                            names.influencing.rule.development=c("age", paste0("gene_", 1:10)),
                                            desirable.outcome.development=TRUE)

# Validation-set performance summaries for the split-regression approach
model.selection$list.summaries$split.regression