CausalForestDynamicSubgroups {EpiForsk} | R Documentation
Calculate CATE in dynamically determined subgroups
Description
Determines subgroups ranked by CATE estimates from a causal_forest object, then calculates comparable CATE estimates in each subgroup and tests for differences.
Usage
CausalForestDynamicSubgroups(forest, n_rankings = 3, n_folds = 5, ...)
Arguments
forest: An object of class causal_forest.
n_rankings: Integer scalar giving the number of groups to rank CATEs into.
n_folds: Integer scalar giving the number of folds to split the data into.
...: Additional arguments passed to causal_forest() and regression_forest().
Details
To evaluate heterogeneity in the treatment effect, one can split the data into groups by estimated CATE (for an alternative, see also RATEOmnibusTest). To compare estimates between groups, the model used to rank a subject must not have been trained on that subject. To achieve this, the data is partitioned into n_folds folds and a causal forest is trained for each fold with that fold left out. If the data has no existing clustering, a single causal_forest() is trained with the folds as the clustering structure. This enables predictions on each fold in which trees that used data from the fold are left out. If the data has preexisting clustering, folds are sampled within each cluster and combined across clusters afterwards.
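The fold assignment described above can be sketched in base R. This is only an illustration under stated assumptions: assign_folds() is a hypothetical helper, not a function exported by EpiForsk, and the balanced random assignment it uses is an assumption about how folds might be sampled.

```r
# Hypothetical helper (not part of EpiForsk): assign each of n subjects to one
# of n_folds folds. If clusters is given, folds are sampled within each cluster.
assign_folds <- function(n, n_folds, clusters = NULL) {
  if (is.null(clusters)) {
    # No preexisting clustering: balanced random folds over all subjects.
    return(sample(rep_len(seq_len(n_folds), n)))
  }
  folds <- integer(n)
  for (idx in split(seq_len(n), clusters)) {
    # Balanced random folds among the subjects of this cluster.
    folds[idx] <- sample(rep_len(seq_len(n_folds), length(idx)))
  }
  folds
}

set.seed(1)
folds_plain <- assign_folds(n = 800, n_folds = 4)
table(folds_plain)  # 200 subjects in each of the 4 folds

clusters <- rep(1:8, each = 100)
folds_clustered <- assign_folds(n = 800, n_folds = 4, clusters = clusters)
table(clusters, folds_clustered)  # 25 subjects per fold within each cluster
```

When there is no preexisting clustering, a fold vector like folds_plain could be supplied as the clusters argument of grf::causal_forest(), so that out-of-bag predictions for a subject are based on trees grown without that subject's fold.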
Value
A list with elements
forest_subgroups: A tibble with CATE estimates, ranking, and AIPW-scores for each subject.
forest_rank_ate: A tibble with the ATE estimate and standard error of each subgroup.
forest_rank_diff_test: A tibble with estimates of the difference in ATE between subgroups and p-values for a formal test of no difference.
heatmap_data: A tibble with data used to draw a heatmap of covariate distribution in each subgroup.
forest_rank_ate_plot: A ggplot of the ATE estimates in each subgroup.
heatmap: A ggplot heatmap of the covariate distribution in each subgroup.
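To make the relation between the AIPW scores, the subgroup ATEs, and the difference test concrete, here is a minimal base R sketch. It rests on several assumptions: the data are simulated stand-ins rather than real output, the column names are hypothetical, and the estimator (group mean of AIPW scores with a Wald z-test that treats groups as independent) is a common approach that this page does not confirm as the function's actual method.

```r
set.seed(2)

# Simulated stand-ins for per-subject rankings and AIPW scores (not real output).
n <- 600
ranking <- factor(rep(1:3, each = n / 3))
aipw_scores <- rnorm(n, mean = c(0, 0.2, 0.5)[ranking], sd = 1)

# Assumed estimator: the subgroup ATE is the mean AIPW score in the group,
# with standard error sd / sqrt(group size).
by_rank <- split(aipw_scores, ranking)
ate <- vapply(by_rank, mean, numeric(1))
se <- vapply(by_rank, function(s) sd(s) / sqrt(length(s)), numeric(1))
rank_ate <- data.frame(ranking = names(ate), estimate = ate, std_err = se,
                       row.names = NULL)

# Wald z-test of no difference between each higher-ranked group and group 1,
# treating the groups as independent samples (an assumption of this sketch).
diff_est <- ate[-1] - ate[1]
diff_se <- sqrt(se[-1]^2 + se[1]^2)
rank_diff <- data.frame(
  comparison = paste0("rank ", names(diff_est), " - rank 1"),
  estimate = diff_est,
  std_err = diff_se,
  p_value = 2 * pnorm(-abs(diff_est / diff_se)),
  row.names = NULL
)

print(rank_ate)
print(rank_diff)
```

Because the rankings are themselves derived from the data, a small p-value here is only meaningful when the ranking for each subject comes from a model that did not see that subject, which is what the fold construction in Details is for.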
Author(s)
KIJA
Examples
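# Simulate n subjects with p covariates, a randomized binary treatment W,
# and a binary outcome Y.
n <- 800
p <- 3
X <- matrix(rnorm(n * p), n, p) |> as.data.frame()
W <- rbinom(n, 1, 0.5)
# Treatment only affects the event probability when X[, 1] > 0.
event_prob <- 1 / (1 + exp(2 * (pmax(2 * X[, 1], 0) * W - X[, 2])))
Y <- rbinom(n, 1, event_prob)
# Fit a causal forest, then form 2 ranked subgroups using 4 folds.
cf <- grf::causal_forest(X, Y, W)
cf_ds <- CausalForestDynamicSubgroups(cf, 2, 4)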