R: Generalized BCEE algorithm

GBCEE {BCEE}

R Documentation

Generalized BCEE algorithm

Description

A generalized double robust Bayesian model averaging approach to causal effect estimation. This function accommodates both binary and continuous exposures and outcomes. More details are available in Talbot and Beaudoin (2022).

Usage

GBCEE(X, Y, U, omega, niter = 5000, family.X = "gaussian",
 family.Y = "gaussian", X1 = 1, X0 = 0, priorX = NA, priorY = NA, maxsize = NA,
 OR = 20, truncation = c(0.01, 0.99), var.comp = "asymptotic", B = 200, nsampX = 30)

Arguments

`X`	A vector of observed values for the exposure.
`Y`	A vector of observed values for the outcome.
`U`	A matrix of observed values for the `M` potential confounding covariates, where each column contains observed values for a potential confounding factor. A recommended implementation is to only consider pre-exposure covariates.
`omega`	The value of the hyperparameter omega in the prior distribution of BCEE's outcome model. A recommended implementation is to take `omega = sqrt(n)*c`, where `n` is the sample size and `c` is a user-supplied constant. Simulation studies suggest that values of `c` between 100 and 1000 yield good results.
`niter`	The number of iterations in the Markov chain Monte Carlo model composition (MC^3) algorithm (Madigan et al. 1995). The default is 5000, but larger values are recommended when the number of potential confounding covariates is large.
`family.X`	Distribution to be used for the exposure model. This should be `"gaussian"` if the exposure is continuous or `"binomial"` if the exposure is binary. The default is `"gaussian"`.
`family.Y`	Distribution to be used for the outcome model. This should be `"gaussian"` if the outcome is continuous or `"binomial"` if the outcome is binary. The default is `"gaussian"`.
`X1`	The first exposure level in contrasts comparing `E[Y^{X1}]` to `E[Y^{X0}]`. The default is 1.
`X0`	The second (reference) exposure level in contrasts comparing `E[Y^{X1}]` to `E[Y^{X0}]`. The default is 0.
`priorX`	A vector of length `M` for the prior probability of inclusion of the potential confounding covariates in the exposure model (`P(\alpha^X)`). The default is 0.5 for all covariates.
`priorY`	A vector of length `M` for the prior probability of inclusion of the potential confounding covariates in the outcome model. This vector multiplies BCEE's informative prior distribution (`P(\alpha^Y)`). The default is 0.5 for all covariates.
`maxsize`	The maximum number of covariates that can be included in a given exposure or outcome model. The default is `M`, which does not constrain the models' size.
`OR`	A number specifying the maximum ratio for excluding models in Occam's window for the outcome modeling step. All outcome models whose posterior probability is more than `OR` times smaller than the largest posterior probability are excluded from the model averaging. The posterior mass of discarded models is redistributed over the remaining models. See Madigan and Raftery (1994). The default is 20.
`truncation`	A vector of length 2 indicating the smallest and largest values for the estimated propensity score (`P(X = 1 | U)`). Values outside those bounds are truncated to the bounds. Some truncation can help reduce the impact of near positivity violations. The default is `c(0.01, 0.99)`. Currently, no truncation is performed when `family.X = "gaussian"` and `family.Y = "gaussian"`.
`var.comp`	The method for computing the variance of the targeted maximum likelihood estimators in the BCEE algorithm. The possible values are `"asymptotic"`, for the efficient influence function based estimator, and `"bootstrap"`, for the nonparametric bootstrap estimator. The default is `"asymptotic"`.
`B`	The number of bootstrap samples when estimating the variance using the nonparametric bootstrap. The default is 200.
`nsampX`	The number of samples to take from the exposure distribution for the Monte Carlo integration when X is continuous and Y is binary. The default is 30.
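
Three of the arguments above (`omega`, `truncation` and `OR`) describe simple numeric rules. The base R sketch below illustrates those rules on made-up numbers; it is not GBCEE's internal code. The vectors `ps_hat` and `post` are hypothetical, and the proportional renormalization of the retained models is an assumption about how the discarded mass is redistributed.

```r
# omega = c * sqrt(n), with a user-chosen constant (here c = 500)
n <- 200
c_const <- 500
omega <- c_const * sqrt(n)

# Truncating estimated propensity scores to the bounds c(0.01, 0.99)
truncation <- c(0.01, 0.99)
ps_hat <- c(0.002, 0.15, 0.60, 0.995)   # hypothetical estimates of P(X = 1 | U)
ps_trunc <- pmin(pmax(ps_hat, truncation[1]), truncation[2])

# Occam's window with OR = 20: drop models whose posterior probability
# is more than OR times smaller than the largest one
OR <- 20
post <- c(0.60, 0.30, 0.08, 0.02)       # hypothetical posterior probabilities
keep <- post >= max(post) / OR          # threshold is 0.60 / 20 = 0.03
# Assumption: discarded mass is redistributed proportionally
post_kept <- post[keep] / sum(post[keep])
```

Here the first and last propensity scores are moved to the bounds, and only the model with posterior probability 0.02 falls outside Occam's window.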

Details

When both Y and X are continuous, GBCEE estimates \Delta = E[Y^{x+1}] - E[Y^x], assuming a linear effect of X on Y. When Y is continuous and X is binary, GBCEE estimates \Delta = E[Y^{X1}] - E[Y^{X0}]. When Y and X are binary, GBCEE estimates both \Delta = E[Y^{X1}] - E[Y^{X0}] and \Delta = E[Y^{X1}]/E[Y^{X0}]. When Y is binary and X is continuous, GBCEE estimates the slope of the logistic marginal structural working model logit(E[Y^{x}]) = \beta_0 + \beta_1 x.
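
To make the estimands above concrete, the sketch below computes them directly from simulated potential outcomes. It only illustrates the target quantities and does not call GBCEE. The data-generating models and the helper names `Y_pot` and `p_pot` are hypothetical.

```r
set.seed(2)
n <- 1000
U1 <- rnorm(n)
eps <- rnorm(n)

# Continuous outcome: potential outcome under exposure level x,
# linear in x with slope 1 (hypothetical data-generating model)
Y_pot <- function(x) 1 * x + 0.5 * U1 + eps
delta_cont <- mean(Y_pot(2)) - mean(Y_pot(1))   # E[Y^{x+1}] - E[Y^x] at x = 1

# Binary outcome: counterfactual risks under X1 = 1 and X0 = 0
p_pot <- function(x) plogis(-1 + 0.8 * x + 0.5 * U1)
EY1 <- mean(p_pot(1))
EY0 <- mean(p_pot(0))
Diff <- EY1 - EY0   # difference contrast
RR <- EY1 / EY0     # ratio contrast
```

Because the potential outcome is linear in x, the one-unit contrast equals the slope whatever the starting value of x, which is why a single number summarizes the effect in the continuous case.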

The GBCEE function first computes the exposure model's posterior distribution using a Markov chain Monte Carlo model composition (MC^3) algorithm (Madigan et al., 1995). The outcome model's posterior distribution is then computed, also using MC^3, as described in Section 3.4 of Talbot and Beaudoin (2022).
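
The general idea of MC^3 can be sketched in a few lines of base R. The code below is not GBCEE's implementation: it assumes a linear exposure model, a uniform prior over models, and approximates each model's log marginal likelihood by -BIC/2. The names `log_ml`, `visits` and `incl_prob` are illustrative.

```r
set.seed(1)
n <- 200
U <- cbind(U1 = rnorm(n), U2 = rnorm(n), U3 = rnorm(n))
X <- 0.5 * U[, "U1"] + 1 * U[, "U2"] + rnorm(n)

# Assumption: log marginal likelihood approximated by -BIC/2
log_ml <- function(alpha) {
  fit <- if (any(alpha)) lm(X ~ U[, alpha, drop = FALSE]) else lm(X ~ 1)
  -BIC(fit) / 2
}

M <- ncol(U)
niter <- 2000
alpha <- rep(FALSE, M)            # start from the empty model
cur <- log_ml(alpha)
visits <- matrix(FALSE, niter, M)
for (i in seq_len(niter)) {
  j <- sample.int(M, 1)           # neighbouring model: flip one indicator
  prop <- alpha
  prop[j] <- !prop[j]
  new <- log_ml(prop)
  # Symmetric proposal and uniform model prior:
  # accept with probability min(1, marginal likelihood ratio)
  if (log(runif(1)) < new - cur) {
    alpha <- prop
    cur <- new
  }
  visits[i, ] <- alpha
}

# Share of iterations in which each covariate was included
incl_prob <- colMeans(visits)
names(incl_prob) <- colnames(U)
```

The chain moves between models that differ by a single covariate, so the number of iterations needed to explore the model space grows with the number of potential confounders.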

GBCEE assumes that the objects X, Y and U contain no missing values. Cases with missing data can be removed beforehand with the na.omit function, or missing values can be filled in with an imputation package.
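
One way to carry out the complete-case step is sketched below on a small made-up data set. The key point is to drop incomplete cases jointly, so that X, Y and U stay aligned row by row. The object names `dat`, `X_cc`, `Y_cc` and `U_cc` are illustrative.

```r
# Small hypothetical data set with missing values
X <- c(0.2, NA, 1.1, -0.4, 0.9)
Y <- c(1.0, 0.5, NA, -0.2, 1.3)
U <- cbind(U1 = c(0.1, 0.3, -0.7, NA, 0.5),
           U2 = c(1, 0, 1, 0, 1))

# Combine first, then remove every row with at least one missing value
dat <- na.omit(data.frame(X = X, Y = Y, U))

# Split back into the objects expected by GBCEE
X_cc <- dat$X
Y_cc <- dat$Y
U_cc <- as.matrix(dat[, colnames(U), drop = FALSE])
```

Calling na.omit separately on each object would remove different rows from each and misalign the data.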

Value

`beta`	The model averaged estimate of the causal effect (`\hat{\Delta}`). When `Y` is `"gaussian"`, this is `\Delta = E[Y^{X1}] - E[Y^{X0}]`. When both `Y` and `X` are `"binomial"`, `Diff` is `\Delta = E[Y^{X1}] - E[Y^{X0}]` and `RR` is `\Delta = E[Y^{X1}]/E[Y^{X0}]`. When `Y` is `"binomial"` and `X` is `"gaussian"`, `b0` and `b1` are the coefficients of the working marginal structural model `logit(E[Y^{x}]) = \beta_0 + \beta_1 x`.
`stderr`	The estimated standard error of the causal effect estimate.
`models.X`	A matrix giving the posterior distribution of the exposure model. Each row corresponds to an exposure model. Within each row, the first `M` elements are Booleans indicating the inclusion (1) or the exclusion (0) of each potential confounding factor. The last element gives the posterior probability of the exposure model.
`models.Y`	A matrix giving the posterior distribution of the outcome model after applying Occam's window. Each row corresponds to an outcome model. Within each row, the first `M` elements are Booleans indicating the inclusion (1) or the exclusion (0) of each potential confounding factor. The next elements are the corresponding causal effect estimate(s) and standard error(s). The last element gives the posterior probability of the outcome model.
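
The layout described for `models.X` can be summarized with a few matrix operations. The sketch below uses a hand-written matrix with `M = 2` covariates in place of real output; the values and the column names are hypothetical.

```r
# Hypothetical matrix laid out like models.X:
# M inclusion indicators followed by the posterior probability
models <- rbind(c(1, 1, 0.70),
                c(0, 1, 0.25),
                c(1, 0, 0.05))
colnames(models) <- c("U1", "U2", "post.prob")   # illustrative names
M <- 2

post <- models[, M + 1]

# Posterior probability of inclusion of each covariate
incl_prob <- colSums(models[, 1:M, drop = FALSE] * post)

# Covariates included in the most probable model
best <- models[which.max(post), 1:M]
```

With these numbers, U1 has inclusion probability 0.70 + 0.05 = 0.75 and U2 has 0.70 + 0.25 = 0.95.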

Author(s)

Denis Talbot

References

Madigan, D., York, J., Allard, D. (1995) Bayesian graphical models for discrete data, International Statistical Review, 63, 215-232.

Madigan, D., Raftery, A. E. (1994) Model selection and accounting for model uncertainty in graphical models using Occam's window, Journal of the American Statistical Association, 89 (428), 1535-1546.

Talbot, D., Beaudoin, C. (2022) A generalized double robust Bayesian model averaging approach to causal effect estimation with application to the Study of Osteoporotic Fractures, Journal of Causal Inference, 10(1), 335-371.

Examples

#Example:
#In this example, both U1 and U2 are potential confounding covariates.
#Both are generated as independent N(0,1).
#X is generated as a function of both U1 and U2 with a N(0,1) error.
#Y is generated as a function of X and U1 with a N(0,1) error.
#Thus, only U1 is a confounder.
#Since both X and Y are continuous, the causal contrast estimated
#by GBCEE is E[Y^{x+1}] - E[Y^{x}] assuming a linear trend.
#The true value of the causal effect is 1. 
#Unbiased estimation is possible when adjusting for U1 or
#adjusting for both U1 and U2.


#Generating the data
library(BCEE)
set.seed(418949)
U1 <- rnorm(200)
U2 <- rnorm(200)
X <- 0.5*U1 + 1*U2 + rnorm(200)
Y <- 1*X + 0.5*U1 + rnorm(200)

#Using GBCEE to estimate the causal exposure effect
#Very few iterations are necessary since there are only 2 covariates
results <- GBCEE(X, Y, cbind(U1, U2), omega = 500*sqrt(200), niter = 50,
                 family.X = "gaussian", family.Y = "gaussian")

#Causal effect estimate
results$beta

#Estimated standard error
results$stderr

#Results from individual models
results$models.Y

#Posterior probability of inclusion of each covariate in the outcome model
#(the first 2 columns are the inclusion indicators for U1 and U2,
#the last column is the posterior probability of each model)
colSums(results$models.Y[, 1:2]*results$models.Y[, ncol(results$models.Y)])
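
An estimate and its standard error, such as `results$beta` and `results$stderr`, are often combined into a confidence interval. The sketch below builds a Wald-type interval, assuming the estimator is approximately normal; this is an assumption and not something GBCEE returns. The helper `wald_ci` and the numeric inputs are hypothetical placeholders, not output from the example above.

```r
# Wald-type interval: estimate +/- z * standard error
# (assumes approximate normality of the estimator)
wald_ci <- function(beta, stderr, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = beta - z * stderr, upper = beta + z * stderr)
}

# Hypothetical placeholder values standing in for
# results$beta and results$stderr
ci <- wald_ci(beta = 1.02, stderr = 0.08)
ci
```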

[Package BCEE version 1.3.2 Index]