bal {mvGPS} | R Documentation
Construct Covariate Balance Statistics for Models with Multivariate Exposure
Description
Assessing balance between exposure(s) and confounders is key when performing causal analysis using propensity scores. We provide several models to generate weights for causal inference with multivariate exposures, and test the balancing property of these weights using weighted Pearson correlations. The effective sample size is also returned.
Usage
bal(
model_list,
D,
C,
common = FALSE,
trim_w = FALSE,
trim_quantile = 0.99,
all_uni = TRUE,
...
)
Arguments
model_list: character string identifying which methods to use when constructing weights. See Details for the list of available models.

D: numeric matrix of dimension n by m giving the values of the m exposures.

C: either a list of numeric matrices of length m, one matrix of confounders for each exposure, or a single numeric matrix of common confounders when common=TRUE.

common: logical indicator for whether C is a single matrix of common confounders for all exposures. Default is FALSE, meaning C must be specified as a list of confounders of length m.

trim_w: logical indicator for whether to trim weights. Default is FALSE.

trim_quantile: numeric scalar specifying the upper quantile at which weights are trimmed, if applicable. Default is 0.99.

all_uni: logical indicator. If TRUE, all univariate models specified in model_list are estimated for each exposure. If FALSE, weights are estimated only for the first exposure.

...: additional arguments to pass to weightit when fitting the univariate methods.
Details
When using propensity score methods for causal inference it is crucial to check the balancing property of the covariates and exposure(s). To do this in the multivariate case we first use a weight generating method from the available list shown below.
Methods Available
"mvGPS": Multivariate generalized propensity score using Gaussian densities
"entropy": Estimating weights using entropy loss function without specifying propensity score (Tübbicke 2020)
"CBPS": Covariate balancing propensity score for continuous treatments which adds balance penalty while solving for propensity score parameters (Fong et al. 2018)
"PS": Generalized propensity score estimated using univariate Gaussian densities
"GBM": Gradient boosting to estimate the mean function of the propensity score, but still maintains Gaussian distributional assumptions (Zhu et al. 2015)
Note that only the mvGPS method is multivariate; all of the others are strictly univariate. For the univariate methods, when all_uni=TRUE, weights are estimated for each exposure separately using the weightit function, given the confounders for that exposure in C. To estimate weights for only the first exposure, set all_uni=FALSE.
The weights for each method can also be trimmed at the desired quantile by setting trim_w=TRUE and choosing trim_quantile in [0.5, 1]. Trimming is applied at both the upper and lower bounds. For further details on how trimming is performed, see mvGPS.
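As an illustration of this symmetric trimming, the sketch below winsorizes a weight vector at trim_quantile and its complement. This is a hypothetical reimplementation for illustration only (the helper name trim_weights is assumed), not the package's internal code:

```r
# Hypothetical sketch of symmetric quantile trimming: weights are clamped
# (winsorized) at the 1 - q and q quantiles, mirroring trim_w=TRUE with
# trim_quantile=q.
trim_weights <- function(w, q = 0.99) {
  stopifnot(q >= 0.5, q <= 1)
  lo <- unname(quantile(w, 1 - q))
  hi <- unname(quantile(w, q))
  pmin(pmax(w, lo), hi)  # clamp each weight to [lo, hi]
}

set.seed(1)
w <- rexp(500)
range(trim_weights(w, q = 0.95))  # extremes now sit at the 5% and 95% quantiles
```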
Balance Metrics
In this package we include three key balancing metrics to summarize balance across all of the exposures.
Euclidean distance
Maximum absolute correlation
Average absolute correlation
Euclidean distance is calculated with the origin as the reference point, e.g., for m=2 exposures the reference point is [0, 0]. In this way we calculate how far the observed set of correlation points is from perfect balance.
Maximum absolute correlation reports the largest single imbalance between the exposures and the set of confounders. It is often a key diagnostic as even a single confounder that is sufficiently out of balance can reduce performance.
Average absolute correlation is the mean of the absolute exposure-confounder correlations. This metric summarizes how well, on average, the entire set of exposures is balanced.
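The three summaries can be sketched directly from a vector of weighted exposure-confounder correlations. The helpers wcor and bal_summary below are hypothetical illustrations, not exports of mvGPS:

```r
# Weighted Pearson correlation between an exposure d and a confounder x
wcor <- function(d, x, w) {
  md <- weighted.mean(d, w); mx <- weighted.mean(x, w)
  num <- sum(w * (d - md) * (x - mx))
  den <- sqrt(sum(w * (d - md)^2) * sum(w * (x - mx)^2))
  num / den
}

# Balance summaries over all exposure-confounder correlation points
bal_summary <- function(cors) {
  c(euclidean = sqrt(sum(cors^2)),  # distance from the origin (perfect balance)
    max_abs   = max(abs(cors)),     # worst single imbalance
    avg_abs   = mean(abs(cors)))    # average imbalance across all pairs
}

bal_summary(c(0.3, -0.4))  # euclidean 0.5, max_abs 0.4, avg_abs 0.35
```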
Effective Sample Size
Effective sample size, ESS, is defined as

ESS = (\sum_i w_i)^2 / \sum_i w_i^2,

where w_i are the estimated weights for a particular method (Kish 1965). Note that when w_i = 1 for all units, the ESS equals the sample size n. ESS decreases when there are extreme weights or high variability in the weights.
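The formula translates directly into a one-line function (ess here is an illustrative helper, not a package export):

```r
# Kish effective sample size: ESS = (sum of weights)^2 / sum of squared weights
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 150))            # uniform weights: ESS equals n = 150
ess(c(rep(0.1, 149), 50))   # one extreme weight collapses the ESS
```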
Value
- W: list of weights generated for each model
- cor_list: list of weighted Pearson correlation coefficients for all confounders specified
- bal_metrics: data.frame with the Euclidean distance, maximum absolute correlation, and average absolute correlation by method
- ess: effective sample size for each of the methods used to generate weights
- models: vector of models used
References
Fong C, Hazlett C, Imai K (2018).
“Covariate balancing propensity score for a continuous treatment: application to the efficacy of political advertisements.”
Annals of Applied Statistics, In-Press.
Kish L (1965).
Survey Sampling.
John Wiley & Sons, New York.
Tübbicke S (2020).
“Entropy Balancing for Continuous Treatments.”
arXiv e-prints.
2001.06281.
Zhu Y, Coffman DL, Ghosh D (2015).
“A boosting algorithm for estimating generalized propensity scores with continuous treatments.”
Journal of Causal Inference, 3(1), 25-40.
Examples
#simulating data
sim_dt <- gen_D(method="u", n=150, rho_cond=0.2, s_d1_cond=2, s_d2_cond=2,
k=3, C_mu=rep(0, 3), C_cov=0.1, C_var=1, d1_beta=c(0.5, 1, 0),
d2_beta=c(0, 0.3, 0.75), seed=06112020)
D <- sim_dt$D
C <- sim_dt$C
#generating weights using mvGPS and potential univariate alternatives
require(WeightIt)
bal_sim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
C=list(C[, 1:2], C[, 2:3]))
#overall summary statistics
bal_sim$bal_metrics
#effective sample sizes
bal_sim$ess
#we can also trim weights for all methods
bal_sim_trim <- bal(model_list=c("mvGPS", "entropy", "CBPS", "PS", "GBM"), D,
C=list(C[, 1:2], C[, 2:3]), trim_w=TRUE, trim_quantile=0.9, p.mean=0.5)
#note that in this case we can also pass additional arguments used in the
#WeightIt package for entropy, CBPS, PS, and GBM, such as specifying p.mean
#can check to ensure all the weights have been properly trimmed at upper and
#lower bound
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 0.9))),
unname(unlist(lapply(bal_sim_trim$W, max))))
all.equal(unname(unlist(lapply(bal_sim$W, quantile, 1-0.9))),
unname(unlist(lapply(bal_sim_trim$W, min))))