
CompareRulesOnValidation {DevTreatRules}

R Documentation

Build treatment rules on a development dataset and evaluate performance on an independent validation dataset

Description

In many practical settings, BuildRule() has limited utility because it requires a single value for its prediction.approach argument (even if there is no prior knowledge about which of the split-regression, OWL framework, and direct-interactions approaches will perform best) and a single value for each of its 'propensity.method' and 'rule.method' arguments (even if there is no prior knowledge about whether standard or penalized GLM will perform best). CompareRulesOnValidation() supports model selection in these settings by essentially looping over calls to BuildRule() for the different combinations of split-regression/OWL framework/direct-interactions and standard/lasso/ridge regression, building each rule on a development dataset and evaluating it on an independent validation dataset.
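The grid of combinations described above can be sketched in base R. This is a hypothetical illustration, not the package's internal code: it only enumerates the candidate (approach, rule method, propensity method) triples using the default values from the Usage section, and the loop body stands in for one BuildRule() call on the development data followed by evaluation on the validation data.

```r
# Hypothetical sketch: enumerate the combinations that
# CompareRulesOnValidation() is described as looping over.
vec.approaches <- c("split.regression", "OWL.framework", "direct.interactions")
vec.rule.methods <- c("glm.regression", "lasso", "ridge")
vec.propensity.methods <- "logistic.regression"

# One row per candidate rule; the first column varies fastest
combos <- expand.grid(prediction.approach = vec.approaches,
                      rule.method = vec.rule.methods,
                      propensity.method = vec.propensity.methods,
                      stringsAsFactors = FALSE)

for (i in seq_len(nrow(combos))) {
  # In the real function, each row would correspond to building a rule on
  # development.data and estimating its performance on validation.data.
  cat(sprintf("%-20s %-15s %s\n", combos$prediction.approach[i],
              combos$rule.method[i], combos$propensity.method[i]))
}
```

With the defaults this gives 3 x 3 x 1 = 9 candidate rules, which is why narrowing vec.approaches or vec.rule.methods (as in the Examples section) reduces run time.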

Usage

CompareRulesOnValidation(
  development.data,
  validation.data,
  vec.approaches = c("split.regression", "OWL.framework", "direct.interactions"),
  vec.rule.methods = c("glm.regression", "lasso", "ridge"),
  vec.propensity.methods = "logistic.regression",
  study.design.development,
  name.outcome.development,
  type.outcome.development,
  name.treatment.development,
  names.influencing.treatment.development,
  names.influencing.rule.development,
  desirable.outcome.development,
  additional.weights.development = rep(1, nrow(development.data)),
  study.design.validation = study.design.development,
  name.outcome.validation = name.outcome.development,
  type.outcome.validation = type.outcome.development,
  name.treatment.validation = name.treatment.development,
  names.influencing.treatment.validation = names.influencing.treatment.development,
  names.influencing.rule.validation = names.influencing.rule.development,
  desirable.outcome.validation = desirable.outcome.development,
  clinical.threshold.validation = 0,
  propensity.method.validation = "logistic.regression",
  additional.weights.validation = rep(1, nrow(validation.data)),
  truncate.propensity.score = TRUE,
  truncate.propensity.score.threshold = 0.05,
  type.observation.weights = NULL,
  propensity.k.cv.folds = 10,
  rule.k.cv.folds = 10,
  lambda.choice = c("min", "1se"),
  OWL.lambda.seq = NULL,
  OWL.kernel = "linear",
  OWL.kparam.seq = NULL,
  OWL.cvFolds = 10,
  OWL.verbose = TRUE,
  OWL.framework.shift.by.min = TRUE,
  direct.interactions.center.continuous.Y = TRUE,
  direct.interactions.exclude.A.from.penalty = TRUE,
  bootstrap.CI = FALSE,
  bootstrap.CI.replications = 100
)

Arguments

`development.data`	A data frame representing the development dataset used to build treatment rules.
`validation.data`	A data frame representing the independent validation dataset used to estimate the performance of treatment rules built on the development dataset.
`vec.approaches`	A character vector (or element) indicating the values of the `prediction.approach` to be used for building the rule with `BuildRule()`. Default is c("split.regression", "OWL.framework", "direct.interactions").
`vec.rule.methods`	A character vector (or element) indicating the values of the `rule.method` to be used for building the rule with `BuildRule()`. Default is c("glm.regression", "lasso", "ridge").
`vec.propensity.methods`	A character vector (or element) indicating the values of `propensity.method` to be used for building the rule with `BuildRule()`. Default is ‘logistic.regression’ to allow for estimation of bootstrap-based CIs.
`study.design.development`	Either ‘observational’, ‘RCT’, or ‘naive’, representing the study design of the development dataset. For the ‘observational’ design, the function uses inverse-probability-of-treatment observation weights (IPW) based on propensity scores estimated with predictors `names.influencing.treatment.development`; for the ‘RCT’ design, it uses IPW based on propensity scores equal to the observed sample proportions; for the ‘naive’ design, all observation weights are uniformly equal to 1.
`name.outcome.development`	A character indicating the name of the outcome variable in `development.data`.
`type.outcome.development`	Either ‘binary’ or ‘continuous’, the form of `name.outcome.development`.
`name.treatment.development`	A character indicating the name of the treatment variable in `development.data`.
`names.influencing.treatment.development`	A character vector (or element) indicating the names of the variables in `development.data` that are expected to influence treatment assignment in the current dataset. Required for `study.design.development=`‘observational’.
`names.influencing.rule.development`	A character vector (or element) indicating the names of the variables in `development.data` that may influence response to treatment and are expected to be observed in future clinical settings.
`desirable.outcome.development`	A logical equal to `TRUE` if higher values of the outcome on `development.data` are considered desirable (e.g. for a binary outcome, a 1 is more desirable than a 0). The `OWL.framework` and `OWL` prediction approaches require a desirable outcome.
`additional.weights.development`	A numeric vector of observation weights that will be multiplied by IPW weights in the rule development stage, with length equal to the number of rows in `development.data`. This can be used, for example, to account for a non-representative sampling design or an IPW adjustment for missingness. The default is a vector of 1s.
`study.design.validation`	Either ‘observational’, ‘RCT’, or ‘naive’, representing the study design of the validation dataset. Default is the value of `study.design.development`.
`name.outcome.validation`	A character indicating the name of the outcome variable in `validation.data`. Default is the value of `name.outcome.development`.
`type.outcome.validation`	Either ‘binary’ or ‘continuous’, the form of `name.outcome.validation`. Default is the value of `type.outcome.development`.
`name.treatment.validation`	A character indicating the name of the treatment variable in `validation.data`. Default is the value of `name.treatment.development`.
`names.influencing.treatment.validation`	A character vector (or element) indicating the names of the variables in `validation.data` that are expected to influence treatment assignment in `validation.data`. Required for `study.design.validation=`‘observational’. Default is the value of `names.influencing.treatment.development`.
`names.influencing.rule.validation`	A character vector (or element) indicating the names of the variables in `validation.data` that may influence response to treatment and are expected to be observed in future clinical settings. Default is the value of `names.influencing.rule.development`.
`desirable.outcome.validation`	A logical equal to `TRUE` if higher values of the outcome on `validation.data` are considered desirable (e.g. for a binary outcome, a 1 is more desirable than a 0). The `OWL.framework` and `OWL` prediction approaches require a desirable outcome. Default is the value of `desirable.outcome.development`.
`clinical.threshold.validation`	A non-negative numeric threshold: treatment is recommended only when the predicted outcome under treatment is superior to the predicted outcome under control by more than this amount. Only used for rules derived from the split-regression or direct-interactions approach. Default is 0.
`propensity.method.validation`	One of ‘logistic.regression’, ‘lasso’, or ‘ridge’. This is the underlying regression model used to estimate propensity scores on `validation.data` (for `study.design.validation=`‘observational’). If `bootstrap.CI=TRUE`, then `propensity.method.validation` must be ‘logistic.regression’. Default is ‘logistic.regression’ to allow for estimation of bootstrap-based CIs.
`additional.weights.validation`	A numeric vector of observation weights that will be multiplied by IPW weights in the rule evaluation stage, with length equal to the number of rows in `validation.data`. This can be used, for example, to account for a non-representative sampling design or an IPW adjustment for missingness. The default is a vector of 1s.
`truncate.propensity.score`	A logical indicating whether estimated propensity scores closer than `truncate.propensity.score.threshold` to 0 or 1 should be truncated to lie exactly `truncate.propensity.score.threshold` away from 0 or 1. Default is `TRUE`.
`truncate.propensity.score.threshold`	A numeric value between 0 and 0.25 giving the truncation threshold used when `truncate.propensity.score=TRUE`. Default is 0.05.
`type.observation.weights`	Default is NULL, but other choices are ‘IPW.L’, ‘IPW.L.and.X’, and ‘IPW.ratio’, where L indicates the `names.influencing.treatment` variables, X indicates the `names.influencing.rule` variables. The default behavior is to use the ‘IPW.ratio’ observation weights (propensity score based on X divided by propensity score based on L and X) for `prediction.approach=`‘split.regression’ and to use ‘IPW.L’ observation weights (inverse of propensity score based on L) for the ‘direct.interactions’, ‘OWL’, and ‘OWL.framework’ prediction approaches.
`propensity.k.cv.folds`	An integer specifying how many folds to use for K-fold cross-validation that chooses the tuning parameter when `propensity.method` is ‘lasso’ or ‘ridge’. Default is 10.
`rule.k.cv.folds`	An integer specifying how many folds to use for K-fold cross-validation that chooses the tuning parameter when `rule.method` is ‘lasso’ or ‘ridge’. Default is 10.
`lambda.choice`	Either ‘min’ or ‘1se’, corresponding to the `s` argument in `predict.cv.glmnet()` from the `glmnet` package. Only used when `propensity.method` or `rule.method` is ‘lasso’ or ‘ridge’. Default is ‘min’.
`OWL.lambda.seq`	Used when `prediction.approach=`‘OWL’, a numeric vector that corresponds to the `lambdas` argument in the `owl()` function from the `DynTxRegime` package. Defaults to `2^seq(-5, 5, 1)`.
`OWL.kernel`	Used when `prediction.approach=`‘OWL’, a character equal to either ‘linear’ or ‘radial’. Corresponds to the `kernel` argument in the `owl()` function from the `DynTxRegime` package. Default is ‘linear’.
`OWL.kparam.seq`	Used when `prediction.approach=`‘OWL’ and `OWL.kernel=`‘radial’. Corresponds to the `kparam` argument in the `owl()` function from the `DynTxRegime` package. Defaults to `2^seq(-10, 10, 1)`.
`OWL.cvFolds`	Used when `prediction.approach=`‘OWL’, an integer corresponding to the `cvFolds` argument in the `owl()` function from the `DynTxRegime` package. Defaults to 10.
`OWL.verbose`	Used when `prediction.approach=`‘OWL’, a logical corresponding to the `verbose` argument in the `owl()` function from the `DynTxRegime` package. Defaults to `TRUE`.
`OWL.framework.shift.by.min`	Logical, set to `TRUE` by default in recognition of our empirical observation that, with a continuous outcome, the OWL framework performed far better in simulation studies when the outcome was shifted to have a minimum just above 0.
`direct.interactions.center.continuous.Y`	Logical, set to `TRUE` by default in recognition of our empirical observation that, with a continuous outcome, direct-interactions performed far better in simulation studies when the outcome was mean-centered.
`direct.interactions.exclude.A.from.penalty`	Logical, set to `TRUE` by default in recognition of our empirical observation that, with a continuous outcome and lasso or ridge specified as the `rule.method`, direct-interactions performed far better in simulation studies when the coefficient corresponding to the treatment variable was excluded from the penalty function.
`bootstrap.CI`	Logical indicating whether the ATE/ABR estimates on the validation set should be accompanied by 95% confidence intervals based on the bootstrap. Default is `FALSE`.
`bootstrap.CI.replications`	An integer specifying how many bootstrap replications should underlie the computed CIs. Default is 100.
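The weighting arguments above (`study.design.development`, `truncate.propensity.score`, `truncate.propensity.score.threshold`, and `additional.weights.development`) can be illustrated with a small base-R sketch. It is not the package's internal code: the simulated variables `L1`, `L2`, `A` and the helpers `truncate_ps()` and `ipw()` are hypothetical, and the sketch assumes a logistic-regression propensity model for the ‘observational’ design.

```r
# Hypothetical sketch of IPW observation weights with truncated propensity
# scores; L1, L2 and A are simulated stand-ins for real study variables.
set.seed(1)
n <- 500
dat <- data.frame(L1 = rnorm(n), L2 = rbinom(n, 1, 0.4))
dat$A <- rbinom(n, 1, plogis(-0.3 + 1.5 * dat$L1 + 0.8 * dat$L2))

# Move scores closer than `threshold` to 0 or 1 back to threshold / 1 - threshold
truncate_ps <- function(ps, threshold = 0.05) {
  pmin(pmax(ps, threshold), 1 - threshold)
}

# Inverse probability of the treatment actually received
ipw <- function(A, ps) ifelse(A == 1, 1 / ps, 1 / (1 - ps))

# 'observational': propensity scores from logistic regression on L, then truncated
ps.obs <- truncate_ps(fitted(glm(A ~ L1 + L2, family = binomial, data = dat)), 0.05)
w.obs <- ipw(dat$A, ps.obs)

# 'RCT': propensity score equal to the observed proportion treated
w.rct <- ipw(dat$A, rep(mean(dat$A), n))

# 'naive': every observation gets weight 1
w.naive <- rep(1, n)

# Additional weights (e.g. for sampling design) multiply the IPW weights
additional.weights <- rep(1, n)
final.w <- w.obs * additional.weights
summary(final.w)
```

Truncation caps every IPW weight at 1 / threshold (20 for the default 0.05), which limits the influence of observations with extreme estimated propensity scores.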

Value

A list with components:

list.summaries: A list with one element per entry of vec.approaches. Each element is a matrix that summarizes, for a given prediction approach, the estimated rule performance on the validation dataset. Its rows correspond to the combinations of vec.rule.methods and vec.propensity.methods, plus the two naive rules (treating all observations and treating no observations). It has 5 columns if bootstrap.CI=FALSE (number of test-positives, number of test-negatives, ATE in test-positives, ATE in test-negatives, and ABR) or 9 columns if bootstrap.CI=TRUE (the same 5 summaries plus the lower and upper bounds of the 95% CIs for the ATE in test-positives and the ATE in test-negatives).
list.rules: A list with one element per entry of vec.approaches. Each element is another list that, for a given prediction approach, stores the objects returned by BuildRule() for the different combinations of vec.rule.methods and vec.propensity.methods.
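The five summaries in each row of list.summaries, and a percentile bootstrap CI of the kind requested by bootstrap.CI=TRUE, can be sketched in base R. This is a hypothetical illustration on simulated data, not the package's estimator: the helpers `weighted.ate()` and `summarize.rule()` are made up, the weights are RCT-style IPW, and the ABR formula (share of test-positives times their ATE, minus share of test-negatives times their ATE) is an assumption about how the average benefit of the rule is defined.

```r
# Hypothetical sketch: IPW-weighted ATE within rule-defined subgroups, an
# ABR-style summary (formula assumed), and a percentile bootstrap CI.
set.seed(2)
n <- 400
A <- rbinom(n, 1, 0.5)                      # treatment indicator
B <- rbinom(n, 1, 0.6)                      # rule output (1 = test-positive)
Y <- rbinom(n, 1, plogis(-0.2 + A * ifelse(B == 1, 0.8, -0.4)))
w <- ifelse(A == 1, 1 / mean(A), 1 / (1 - mean(A)))  # RCT-style IPW

# Weighted mean outcome among treated minus among untreated
weighted.ate <- function(Y, A, w) {
  weighted.mean(Y[A == 1], w[A == 1]) - weighted.mean(Y[A == 0], w[A == 0])
}

summarize.rule <- function(Y, A, B, w) {
  pos <- B == 1
  ate.pos <- weighted.ate(Y[pos], A[pos], w[pos])
  ate.neg <- weighted.ate(Y[!pos], A[!pos], w[!pos])
  c(n.positives = sum(pos), n.negatives = sum(!pos),
    ATE.positives = ate.pos, ATE.negatives = ate.neg,
    ABR = mean(pos) * ate.pos - mean(!pos) * ate.neg)  # assumed definition
}

est <- summarize.rule(Y, A, B, w)

# Percentile bootstrap CI for the ATE in test-positives (100 replications,
# matching the default bootstrap.CI.replications)
boot.ate.pos <- replicate(100, {
  idx <- sample.int(n, replace = TRUE)
  summarize.rule(Y[idx], A[idx], B[idx], w[idx])[["ATE.positives"]]
})
ci <- quantile(boot.ate.pos, c(0.025, 0.975))
```

A positive ATE in test-positives together with a negative ATE in test-negatives is the pattern a useful rule should show; under the assumed formula, both contribute positively to the ABR.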

Examples

library(DevTreatRules)
set.seed(123)
example.split <- SplitData(data=obsStudyGeneExpressions,
                           n.sets=3, split.proportions=c(0.5, 0.25, 0.25))
development.data <- example.split[example.split$partition == "development", ]
validation.data <- example.split[example.split$partition == "validation", ]
model.selection <- CompareRulesOnValidation(development.data=development.data,
               validation.data=validation.data,
               study.design.development="observational",
               vec.approaches=c("split.regression", "OWL.framework", "direct.interactions"),
               vec.rule.methods=c("glm.regression", "lasso"),
               vec.propensity.methods="logistic.regression",
               name.outcome.development="no_relapse",
               type.outcome.development="binary",
               name.treatment.development="intervention",
               names.influencing.treatment.development=c("prognosis", "clinic", "age"),
               names.influencing.rule.development=c("age", paste0("gene_", 1:10)),
               desirable.outcome.development=TRUE)
model.selection$list.summaries$split.regression

[Package DevTreatRules version 1.1.0 Index]