bart_est {causaldrf}    R Documentation

The BART estimator

Description

This function estimates the ADRF using Bayesian additive regression trees (BART).

Usage

bart_est(Y,
         treat,
         outcome_formula,
         data,
         grid_val,
         ...)

Arguments

Y

is the name of the outcome variable contained in data.

treat

is the name of the treatment variable contained in data.

outcome_formula

is the formula used for fitting the outcome surface. treat is one of the independent variables to use in the outcome_formula, e.g. Y ~ treat + X.1 + X.2 + ... or a variation of this.

data

is a dataframe containing Y, treat, and X.

grid_val

contains the treatment values to be evaluated.

...

additional arguments to be passed to the bart() outcome function.

Details

BART is a prediction model that is applicable to many settings, including causal inference problems. It is a sum-of-trees fit in which a regularization prior holds back the influence of each tree, so that each tree contributes only a small amount to the overall fit. The priors on the parameters guard against overfitting and keep any single tree from dominating the model fit. For more details see Chipman et al. (2010).
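
For orientation, the sum-of-trees model described above is usually written as follows in Chipman et al. (2010). The notation (m trees, tree structure T_j, terminal-node parameters M_j) follows that paper and is not used elsewhere on this page:

```latex
Y = \sum_{j=1}^{m} g(x;\, T_j, M_j) + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
```

Here g(x; T_j, M_j) is the value that the j-th tree assigns to covariate vector x, and the regularization prior keeps each g small relative to the total fit.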

BART does not require fitting a treatment model. Instead, it fits a response surface to the whole dataset; if the response surface is correctly specified, the causal effect estimate is unbiased. Although most work on BART for causal inference addresses the binary treatment setting, Hill (2011) also mentions an extension to the continuous or multidose treatment setting. In this continuous treatment setting, Hill (2011) compares the outcomes of units with treatment level T_i = t to their outcomes had T_i = 0. That is, the method infers what the outcomes of treated units would have been had they not received treatment and compares this with their outcomes under the treatment actually observed. The comparison is between Y_i(0) | (I = 1, T_i = t) and Y_i(t) | (I = 1, T_i = t), where I = 1 means that the unit is part of the treatment group. The causal effect thus compares the predicted outcome of units that received treatment with their predicted outcome had they received zero treatment.
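
The response-surface idea above can be illustrated with a minimal sketch. It is not the internal code of bart_est: lm() stands in for BART so that the example runs without extra packages, and the simulated data and all object names (sim, fit, adrf_hat, effect_vs_zero) are hypothetical.

```r
# Sketch only: lm() is a stand-in for a BART fit; all names are hypothetical.
set.seed(1)
n <- 500
sim <- data.frame(X.1 = rnorm(n), X.2 = rnorm(n))
sim$treat <- 10 + sim$X.1 + rnorm(n)   # treatment depends on X.1
sim$Y <- 2 + 0.5 * sim$treat + sim$X.1 - sim$X.2 + rnorm(n)

# Fit the response surface on the whole dataset (no treatment model).
fit <- lm(Y ~ treat + X.1 + X.2, data = sim)

# For each grid value, set every unit's treatment to that value,
# predict the outcome, and average the predictions over units.
grid_val <- seq(8, 12, by = 1)
adrf_hat <- sapply(grid_val, function(t_val) {
  counterfactual <- sim
  counterfactual$treat <- t_val
  mean(predict(fit, newdata = counterfactual))
})

# Contrast in the spirit of Hill (2011): predicted outcome at the observed
# treatment minus predicted outcome had the unit received zero treatment.
at_zero <- sim
at_zero$treat <- 0
effect_vs_zero <- predict(fit, newdata = sim) - predict(fit, newdata = at_zero)
```

With a flexible learner such as BART in place of lm(), the same predict-and-average step applies, but the fitted surface no longer has to be linear in treat.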

This method performs well in simulation studies. One drawback of BART is the amount of computing time needed.

Value

bart_est returns an object of class "causaldrf_simple", a list that contains the following components:

param

parameter estimates for a bart fit.

out_mod

the result of the bart fit.

call

the matched call.

References

Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.

Hill, J.L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics 20(1).

Chipman, H.A., George, E.I., McCulloch, R.E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266–298.

See Also

nw_est, iw_est, hi_est, gam_est, add_spl_est, etc. for other estimates.

t_mod, overlap_fun to prepare the data for use in the different estimates.

Examples

## Example from Schafer (2015).  bart takes a few minutes to run (depending on computer).

example_data <- sim_data


# This estimate takes a long time to run...
bart_list <- bart_est(Y = Y,
          treat = T,
          outcome_formula = Y ~ T + B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
          data = example_data,
          grid_val = seq(8, 16, by = 1))

# Plot a random subset of 100 observations to keep the scatterplot readable
sample_index <- sample(seq_len(nrow(example_data)), 100)

plot(example_data$T[sample_index],
    example_data$Y[sample_index],
    xlab = "T",
    ylab = "Y",
    main = "bart estimate")

lines(seq(8, 16, by = 1),
      bart_list$param,
      lty = 2,
      lwd = 2,
      col = "blue")

# bty accepts "o" (draw a box) or "n" (no box)
legend("bottomright",
       "bart estimate",
       lty = 2,
       lwd = 2,
       col = "blue",
       bty = "o",
       cex = 1)


rm(example_data, bart_list, sample_index)

[Package causaldrf version 0.4.2 Index]