cre {CRE}    R Documentation

Causal rule ensemble

Description

Performs the Causal Rule Ensemble on a data set with a response variable, a treatment variable, and various features.

Usage

cre(y, z, X, method_params = NULL, hyper_params = NULL, ite = NULL)

Arguments

y: An observed response vector.


z: A treatment vector.


X: A covariate matrix (or a data frame).


method_params: The list of parameters defining the models used, including:

  • Parameters for Honest Splitting

    • ratio_dis: The proportion of the data assigned to rules discovery; the remaining data are used for inference (default: 0.5).

  • Parameters for Discovery

    • ite_method_dis: The method to estimate the discovery sample ITE (default: 'aipw').

    • ps_method_dis: The estimation model for the propensity score on the discovery subsample (default: 'SL.xgboost').

    • or_method_dis: The estimation model for the outcome regressions in estimate_ite_aipw on the discovery subsample (default: 'SL.xgboost').

  • Parameters for Inference

    • ite_method_inf: The method to estimate the inference sample ITE (default: 'aipw').

    • ps_method_inf: The estimation model for the propensity score on the inference subsample (default: 'SL.xgboost').

    • or_method_inf: The estimation model for the outcome regressions in estimate_ite_aipw on the inference subsample (default: 'SL.xgboost').
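The honest-splitting and AIPW steps that these parameters control can be illustrated with a minimal base-R sketch. It is not the internal code of cre(): the helper aipw_ite() and the simulated data are hypothetical, glm() and lm() stand in for the default 'SL.xgboost' learners, and, for brevity, the nuisance models are fitted and evaluated on the same subsample (no cross-fitting). The value returned is the standard AIPW pseudo-outcome mu1(x) - mu0(x) + z (y - mu1(x)) / e(x) - (1 - z) (y - mu0(x)) / (1 - e(x)).

```r
# Hypothetical sketch: honest splitting plus AIPW ITE estimation in base R.
# glm()/lm() stand in for the default "SL.xgboost" learners.
set.seed(42)
n <- 400
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rnorm(n))
z <- rbinom(n, 1, plogis(0.5 * X$x2))   # treatment probability depends on x2
y <- X$x2 + 2 * X$x1 * z + rnorm(n)     # true ITE: 2 if x1 == 1, else 0

# Honest splitting: a share ratio_dis of the rows goes to discovery,
# the remaining rows go to inference.
ratio_dis <- 0.25
idx_dis <- sample(n, size = floor(ratio_dis * n))
idx_inf <- setdiff(seq_len(n), idx_dis)

# AIPW pseudo-outcome for every row of one subsample
# (nuisance models fitted and evaluated on the same rows, no cross-fitting).
aipw_ite <- function(y, z, X) {
  # Propensity score e(x) = P(z = 1 | x) from a logistic regression
  ps_fit <- glm(z ~ ., data = cbind(X, z = z), family = binomial())
  ps <- predict(ps_fit, newdata = X, type = "response")
  # Outcome regressions mu1(x) and mu0(x), one per treatment arm
  dat <- cbind(X, y = y)
  mu1 <- predict(lm(y ~ ., data = dat[z == 1, ]), newdata = X)
  mu0 <- predict(lm(y ~ ., data = dat[z == 0, ]), newdata = X)
  unname(mu1 - mu0 + z * (y - mu1) / ps - (1 - z) * (y - mu0) / (1 - ps))
}

ite_dis <- aipw_ite(y[idx_dis], z[idx_dis], X[idx_dis, ])
ite_inf <- aipw_ite(y[idx_inf], z[idx_inf], X[idx_inf, ])
```

Keeping the two subsamples disjoint is what makes the split "honest": rules are discovered on ite_dis and their effects are estimated on ite_inf, so the same noise does not drive both steps.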


hyper_params: The list of hyperparameters to fine-tune the method, including:

  • intervention_vars: Variables that can be intervened on, used for rules generation (default: NULL).

  • offset: Name of the covariate to use as offset (e.g., 'x1') for T-Poisson ITE estimation; NULL if no offset is used (default: NULL).

  • ntrees_rf: The number of decision trees for the random forest (default: 20).

  • ntrees_gbm: The number of decision trees for the generalized boosted regression modeling algorithm (default: 20).

  • node_size: Minimum size of the trees' terminal nodes (default: 20).

  • max_nodes: Maximum number of terminal nodes per tree (default: 5).

  • max_depth: Maximum rule length (default: 3).

  • replace: Whether the random forest used for rules generation samples with replacement when bootstrapping (default: TRUE).

  • t_decay: The decay threshold for rules pruning (default: 0.025).

  • t_ext: The threshold for defining too generic or too specific (extreme) rules (default: 0.01, range: (0,0.5)).

  • t_corr: The threshold for defining correlated rules (default: 1, range: (0,+inf)).

  • t_pvalue: The threshold for defining statistically significant rules (default: 0.05, range: (0,1)).

  • stability_selection: Whether or not to use stability selection for selecting the rules (default: TRUE).

  • cutoff: Threshold (as a proportion) defining the minimum cutoff value for the stability scores (default: 0.9).

  • pfer: Upper bound for the per-family error rate, i.e., the tolerated number of falsely selected rules (default: 1).

  • penalty_rl: Order of the penalty on rule length during LASSO regularization (0: no penalty, 1: rule length, 2: squared rule length) (default: 1).
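To make t_ext and penalty_rl concrete, the sketch below filters a handful of hand-written rules by their support and builds per-rule penalty weights. It assumes, without confirmation from this page, that a rule is "extreme" when the share of observations it covers is below t_ext or above 1 - t_ext, and that penalty_rl acts as an exponent on rule length (the number of conditions). The rules, data, and variable names are illustrative only and are not part of the CRE API.

```r
# Hypothetical sketch of t_ext filtering and penalty_rl weights;
# the rules and data are made up for illustration.
X <- data.frame(x1 = rep(0:1, each = 50),    # 50 zeros, then 50 ones
                x2 = rep(0:1, times = 50),   # alternating 0, 1
                x3 = 1:100)

rules <- c("x1 == 1",
           "x1 == 1 & x2 == 1",
           "x3 <= 100",
           "x1 == 1 & x2 == 1 & x3 > 99")

# Rule length = number of conditions joined by "&"
rule_length <- lengths(strsplit(rules, "&", fixed = TRUE))

# n x M matrix of 0/1 rule indicators (one column per rule)
R <- sapply(rules, function(r) as.integer(eval(parse(text = r), envir = X)))

# Support = share of observations satisfying each rule: 0.5, 0.25, 1, 0.01
support <- colMeans(R)

# Discard too generic (support > 1 - t_ext) or too specific (< t_ext) rules
t_ext <- 0.025
keep <- support >= t_ext & support <= 1 - t_ext

# Per-rule penalty weights: rule_length^penalty_rl (0 gives equal weights)
penalty_rl <- 1
pen_factor <- rule_length[keep]^penalty_rl
```

Weights of this kind could be supplied to a LASSO fit, for example through the penalty.factor argument of glmnet::glmnet(), so that longer rules are shrunk more strongly; whether cre() does exactly this internally is not stated here.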


ite: The estimated ITE vector. If given, the ITE estimation steps in both Discovery and Inference are skipped (default: NULL).
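When ITEs are already available from another estimator, they can be passed through ite. The sketch below uses a T-learner (one linear outcome model per treatment arm) as a stand-in estimator; the simulated data and the model choice are assumptions, and the vector is assumed to hold one estimate per row of X. The final cre() call is left as a comment because it needs the CRE package and method_params/hyper_params lists such as those shown in the example.

```r
# Hypothetical sketch: estimate ITEs outside cre() with a simple T-learner
# (one linear outcome model per treatment arm), then supply them via `ite`.
set.seed(123)
n <- 200
X <- data.frame(x1 = rbinom(n, 1, 0.5), x2 = rnorm(n))
z <- rbinom(n, 1, 0.5)
y <- 1 + X$x2 + 2 * X$x1 * z + rnorm(n)   # true ITE is 2 when x1 == 1, else 0

dat <- cbind(X, y = y)
fit_treated <- lm(y ~ x1 + x2, data = dat[z == 1, ])
fit_control <- lm(y ~ x1 + x2, data = dat[z == 0, ])

# One ITE per observation: predicted outcome under treatment minus control
ite_hat <- unname(predict(fit_treated, newdata = X) -
                  predict(fit_control, newdata = X))

# With CRE loaded, these ITEs would replace both internal estimation steps:
# cre_results <- cre(y, z, X, method_params, hyper_params, ite = ite_hat)
```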

Value

An S3 object containing:

Examples

library("CRE")

# Simulate a data set
dataset <- generate_cre_dataset(n = 400, rho = 0, n_rules = 2, p = 10,
                                effect_size = 2, binary_covariates = TRUE,
                                binary_outcome = FALSE, confounding = "no")
y <- dataset[["y"]]
z <- dataset[["z"]]
X <- dataset[["X"]]

# Models for honest splitting, discovery, and inference
method_params <- list(ratio_dis = 0.25,
                      ite_method_dis = "aipw",
                      ps_method_dis = "SL.xgboost",
                      or_method_dis = "SL.xgboost",
                      ite_method_inf = "aipw",
                      ps_method_inf = "SL.xgboost",
                      or_method_inf = "SL.xgboost")

hyper_params <- list(intervention_vars = NULL,
                     offset = NULL,
                     ntrees_rf = 20,
                     ntrees_gbm = 20,
                     node_size = 20,
                     max_nodes = 5,
                     max_depth = 3,
                     t_decay = 0.025,
                     t_ext = 0.025,
                     t_corr = 1,
                     t_pvalue = 0.05,
                     replace = FALSE,
                     stability_selection = TRUE,
                     cutoff = 0.6,
                     pfer = 0.1,
                     penalty_rl = 1)

cre_results <- cre(y, z, X, method_params, hyper_params)

[Package CRE version 0.2.0 Index]