bart_est {causaldrf} | R Documentation |
The BART estimator
Description
This function estimates the ADRF using Bayesian additive regression trees (BART).
Usage
bart_est(Y,
treat,
outcome_formula,
data,
grid_val,
...)
Arguments
Y |
is the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
outcome_formula |
is the formula used for fitting the outcome surface.
gps is one of the independent variables to use in the outcome_formula. ie.
|
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
... |
additional arguments to be passed to the bart() outcome function. |
Details
BART is a prediction model that applies to many settings, including causal inference problems. It is a sum-of-trees fit in which a regularization prior holds back the influence of each tree, so that each tree contributes only a small amount to the overall fit. Priors are placed on the parameters to avoid overfitting the data and to ensure that no single tree dominates the model fit. For more details see Chipman et al. (2010).
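The sum-of-trees structure described above can be written compactly. The following is a sketch in the notation of Chipman et al. (2010); the symbols m, T_j, M_j, and g are introduced here for illustration and do not appear elsewhere on this page.

```latex
Y = \sum_{j=1}^{m} g(x; T_j, M_j) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
```

Here T_j denotes the j-th tree, M_j its terminal-node parameters, and g(x; T_j, M_j) the value that tree assigns to x. The regularization prior keeps each of the m terms small.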
BART does not require fitting a treatment model. Instead, it fits a
response surface to the whole dataset and if the response surface is
correctly specified, then the causal effect estimate is unbiased.
Although most of the focus on BART is for the binary treatment setting,
Hill (2011) also mentions an extension to the continuous or
multidose treatment setting. When using BART in this continuous treatment
setting, Hill (2011) compares the outcomes of units with
treatment level T_i = t to their outcomes had T_i = 0.
This method infers the effect of treatment on treated units by comparing
their outcome under the treatment they actually received with their
outcome had they received no treatment. The comparison
is between Y_i(0) | (I = 1, T_i = t) and Y_i(t) | (I = 1, T_i = t),
where I = 1 means that the unit is part of the treatment group.
The causal effect is comparing the predicted outcome of units that
received treatment with what their predicted outcome would have been
had they received zero treatment.
This method performs well in simulation studies. One drawback of BART is the computing time it requires.
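The idea of predicting outcomes at fixed treatment levels can be sketched without BART. The example below is an illustration only, not the internals of bart_est: it assumes simulated data with hypothetical names (dat, x, treat, y) and uses lm() as a fast stand-in for the BART outcome model. Each grid value is assigned to every unit in turn, and the predicted outcomes are averaged.

```r
# Illustration only: lm() stands in for the BART outcome model, and the
# data and variable names below are hypothetical.
set.seed(1)
n <- 200
x <- rnorm(n)                           # a single covariate
treat <- 10 + x + rnorm(n)              # continuous treatment
y <- 2 + 0.5 * treat + x + rnorm(n)     # outcome
dat <- data.frame(y = y, treat = treat, x = x)

# Fit the response surface on the whole dataset (no treatment model).
fit <- lm(y ~ treat + x, data = dat)

# Treatment values at which the dose-response curve is evaluated.
grid_val <- seq(8, 12, by = 1)

# For each grid value, set every unit's treatment to that value,
# predict the outcome, and average over units.
adrf_hat <- sapply(grid_val, function(t_val) {
  counterfactual <- dat
  counterfactual$treat <- t_val
  mean(predict(fit, newdata = counterfactual))
})

print(data.frame(grid_val = grid_val, adrf_hat = adrf_hat))
```

The zero-treatment contrast described above follows the same pattern: predict with the treatment column set to 0 and compare with the predictions at the observed treatment levels. With a flexible model such as BART, the quality of those predictions depends on how much data lies near the treatment value being evaluated.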
Value
bart_est returns an object of class "causaldrf_simple",
a list that contains the following components:
param |
parameter estimates for a bart fit. |
out_mod |
the result of the bart fit. |
call |
the matched call. |
References
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Hill, J.L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20.1.
Chipman, H.A., George, E.I., McCulloch, R.E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4.1, 266–298.
See Also
nw_est, iw_est, hi_est, gam_est, add_spl_est, etc. for other estimates.
t_mod, overlap_fun to prepare the data for use in the different estimates.
Examples
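## Example from Schafer (2015). bart takes a few minutes to run (depending on computer).
library(causaldrf)

example_data <- sim_data

# Treatment values at which the ADRF is evaluated.
grid <- seq(8, 16, by = 1)

# This estimate takes a long time to run...
# Note: T here is the treatment column in sim_data, not shorthand for TRUE.
bart_list <- bart_est(Y = Y,
                      treat = T,
                      outcome_formula = Y ~ T + B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
                      data = example_data,
                      grid_val = grid)

# Plot a random subsample of 100 observations to keep the scatterplot readable.
sample_index <- sample(seq_len(nrow(example_data)), 100)

plot(example_data$T[sample_index],
     example_data$Y[sample_index],
     xlab = "T",
     ylab = "Y",
     main = "bart estimate")

# Overlay the estimated dose-response curve at the grid values.
lines(grid,
      bart_list$param,
      lty = 2,
      lwd = 2,
      col = "blue")

# bty accepts "o" (draw a box) or "n" (no box).
legend('bottomright',
       "bart estimate",
       lty = 2,
       lwd = 2,
       col = "blue",
       bty = 'o',
       cex = 1)

rm(example_data, bart_list, sample_index, grid)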