bart {bartcs}    R Documentation

Fit BART models to select confounders and estimate treatment effect

Description

Fit Bayesian Additive Regression Trees (BART) models to select relevant confounders among a large set of potential confounders and to estimate the average treatment effect E[Y(1) - Y(0)].

Usage

separate_bart(
  Y, trt, X,
  trt_treated     = 1,
  trt_control     = 0,
  num_tree        = 50,
  num_chain       = 4,
  num_burn_in     = 100,
  num_thin        = 1,
  num_post_sample = 100,
  step_prob       = c(0.28, 0.28, 0.44),
  alpha           = 0.95,
  beta            = 2,
  nu              = 3,
  q               = 0.95,
  dir_alpha       = 5,
  parallel        = FALSE,
  verbose         = TRUE
)

single_bart(
  Y, trt, X,
  trt_treated     = 1,
  trt_control     = 0,
  num_tree        = 50,
  num_chain       = 4,
  num_burn_in     = 100,
  num_thin        = 1,
  num_post_sample = 100,
  step_prob       = c(0.28, 0.28, 0.44),
  alpha           = 0.95,
  beta            = 2,
  nu              = 3,
  q               = 0.95,
  dir_alpha       = 5,
  parallel        = FALSE,
  verbose         = TRUE
)

Arguments

Y

A vector of outcome values.

trt

A vector of treatment values. Binary treatment works for both models; continuous treatment works only for single_bart(). For binary treatment, use 1 for the treated group and 0 for the control group, or set trt_treated and trt_control to the values used in trt.

X

A matrix of potential confounders.

trt_treated

Value of trt for the treated group. The default value is set to 1.

trt_control

Value of trt for the control group. The default value is set to 0.

num_tree

Number of trees in the BART model. The default value is set to 50.

num_chain

Number of MCMC chains. Set num_chain > 1 to use the Gelman-Rubin diagnostic. The default value is set to 4.

num_burn_in

Number of MCMC samples to be discarded per chain as initial burn-in periods. The default value is set to 100.

num_thin

Thinning interval per chain. One in every num_thin samples is kept. The default value is set to 1.

num_post_sample

Final number of posterior samples per chain. The number of MCMC iterations per chain is num_burn_in + num_thin * num_post_sample. The default value is set to 100.

step_prob

A vector of tree alteration probabilities (GROW, PRUNE, CHANGE). Each alteration is a proposal to change the tree structure.
The default setting is (0.28, 0.28, 0.44).

alpha, beta

Hyperparameters for tree regularization prior. A terminal node of depth d will split with probability of alpha * (1 + d)^(-beta).
The default setting is (alpha, beta) = (0.95, 2) from Chipman et al. (2010).

nu, q

Values used to calibrate the hyperparameters of the sigma prior.
The default setting is (nu, q) = (3, 0.95) from Chipman et al. (2010).

dir_alpha

Hyperparameter of the Dirichlet prior for selection probabilities. The default value is set to 5.

parallel

If TRUE, model fitting will be parallelized with respect to N = nrow(X). Parallelization is recommended only when N is very large. The default setting is FALSE.

verbose

If TRUE, messages will be printed during training. If FALSE, messages will be suppressed.
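
The arithmetic behind several of the arguments above can be sketched in base R. This is only an illustration: the iteration count and split probability follow the formulas stated in the argument descriptions, while the sigma prior calibration assumes the scheme described in Chipman et al. (2010), which this page cites but does not spell out. The value of sigma_hat is a hypothetical rough estimate of the residual standard deviation.

```r
# MCMC budget: iterations run per chain under the default settings.
num_burn_in     <- 100
num_thin        <- 1
num_post_sample <- 100
num_iter <- num_burn_in + num_thin * num_post_sample

# Tree prior: probability that a terminal node at depth d splits.
alpha <- 0.95
beta  <- 2
depth <- 0:3
split_prob <- alpha * (1 + depth)^(-beta)

# Sigma prior calibration (assumed, in the style of Chipman et al. 2010):
# sigma^2 ~ nu * lambda / chisq(nu), with lambda chosen so that
# P(sigma < sigma_hat) = q.
nu <- 3
q  <- 0.95
sigma_hat <- 1.5  # hypothetical rough estimate, not taken from any data
lambda <- sigma_hat^2 * qchisq(1 - q, df = nu) / nu

# Check: prior mass placed on sigma^2 below sigma_hat^2.
prior_mass <- pchisq(nu * lambda / sigma_hat^2, df = nu, lower.tail = FALSE)

num_iter
round(split_prob, 4)
prior_mass
```

The split probability shrinks quickly with depth, which is how alpha and beta keep individual trees shallow.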

Details

separate_bart() and single_bart() fit an exposure model and outcome model(s) to estimate the treatment effect with adjustment for confounders in the presence of a large set of potential confounders (Kim et al. 2023).

The exposure model E[A|X] and the outcome model(s) E[Y|A,X] are linked by a common Dirichlet prior. This prior assigns posterior selection probabilities to the potential confounders (X) on the basis of their association with both the exposure (A) and the outcome (Y).
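
The shared prior can be pictured with a small base-R sketch. It assumes a symmetric Dirichlet prior with concentration dir_alpha / P per variable and a conjugate update that pools split counts from the exposure and outcome models; neither detail is stated on this page, and the actual sampler in bartcs may differ. The helper rdirichlet_one() and the counts are hypothetical.

```r
set.seed(1)

# Hypothetical helper: one Dirichlet draw via normalized gamma variates.
rdirichlet_one <- function(conc) {
  g <- rgamma(length(conc), shape = conc, rate = 1)
  g / sum(g)
}

P         <- 5  # number of potential confounders (illustrative)
dir_alpha <- 5  # same value as the documented default

# Hypothetical split counts: how often each column of X is used
# in the exposure model and in the outcome model(s).
count_exposure <- c(8, 0, 3, 0, 1)
count_outcome  <- c(6, 1, 0, 0, 4)

# Assumed conjugate update: both models feed one concentration vector.
conc <- dir_alpha / P + count_exposure + count_outcome

# Mean of the updated Dirichlet and one random draw from it.
post_mean <- conc / sum(conc)
sel_prob  <- rdirichlet_one(conc)

round(post_mean, 3)
round(sel_prob, 3)
```

Under these assumptions, a variable used often in both models receives the largest selection probability, while a variable used in neither stays near its prior weight.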

separate_bart() fits separate outcome models for the treated and control groups, whereas single_bart() fits a single outcome model for both groups.

All inferences are made with outcome model(s).
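
The difference between the two outcome-model strategies can be illustrated without BART. The sketch below substitutes ordinary linear regression (lm) for the BART outcome models and uses simulated data, so it shows only how the average treatment effect is assembled from outcome-model predictions, not how bartcs fits its models. All variable names are hypothetical.

```r
set.seed(42)
n   <- 500
x   <- rnorm(n)
trt <- rbinom(n, 1, plogis(0.5 * x))   # treatment depends on confounder x
y   <- 1 + 2 * trt + x + rnorm(n)      # simulated with a true effect of 2
dat <- data.frame(y = y, trt = trt, x = x)

# "Separate": one outcome model per arm, each predicting for every unit.
fit1 <- lm(y ~ x, data = dat[dat$trt == 1, ])
fit0 <- lm(y ~ x, data = dat[dat$trt == 0, ])
ate_separate <- mean(predict(fit1, newdata = dat) - predict(fit0, newdata = dat))

# "Single": one outcome model with treatment as a predictor,
# evaluated with trt set to 1 and then to 0 for every unit.
fit <- lm(y ~ trt + x, data = dat)
ate_single <- mean(
  predict(fit, newdata = transform(dat, trt = 1)) -
    predict(fit, newdata = transform(dat, trt = 0))
)

c(separate = ate_separate, single = ate_single)
```

In both cases the estimate averages predicted outcomes under treatment minus predicted outcomes under control over all units, which is the sense in which inference rests on the outcome model(s).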

Value

A bartcs object, which is a list containing the following components.

mcmc_list

An mcmc.list object from the coda package. mcmc_list contains the following items.

var_prob

Aggregated posterior inclusion probability of each variable.

var_count

Number of times each variable is selected in each MCMC iteration. Its dimension is num_post_sample * ncol(X).

chains

A list of results from each MCMC chain.

model

separate or single.

label

Column names of X.

params

Parameters used in the model.
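
As a rough illustration of how var_count relates to an inclusion probability, the sketch below builds a small hypothetical matrix with one row per retained iteration and one column per variable, then computes the share of iterations in which each variable is used at least once. Whether var_prob is aggregated in exactly this way is an assumption; this page does not give the formula.

```r
# Hypothetical var_count: 4 retained iterations (rows) by 3 variables (columns).
var_count <- matrix(
  c(3, 0, 1,
    2, 0, 0,
    4, 1, 0,
    1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("x1", "x2", "x3"))
)

# Share of iterations in which each variable is used at least once.
inclusion <- colMeans(var_count > 0)
inclusion
```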

References

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266-298. doi:10.1214/09-AOAS285

Kim, C., Tec, M., & Zigler, C. M. (2023). Bayesian nonparametric adjustment of confounding. Biometrics. doi:10.1111/biom.13833

Examples

data(ihdp, package = "bartcs")
single_bart(
  Y               = ihdp$y_factual,
  trt             = ihdp$treatment,
  X               = ihdp[, 6:30],
  num_tree        = 10,
  num_chain       = 2,
  num_post_sample = 20,
  num_burn_in     = 10,
  verbose         = FALSE
)
separate_bart(
  Y               = ihdp$y_factual,
  trt             = ihdp$treatment,
  X               = ihdp[, 6:30],
  num_tree        = 10,
  num_chain       = 2,
  num_post_sample = 20,
  num_burn_in     = 10,
  verbose         = FALSE
)

[Package bartcs version 1.2.2 Index]