covariate_balance {quickmatch} | R Documentation |
Covariate balance in matched sample
Description
covariate_balance derives measures of covariate balance between
treatment groups in matched samples. The function calculates normalized mean
differences between all pairs of treatment conditions for each covariate.
Usage
covariate_balance(
  treatments,
  covariates,
  matching = NULL,
  target = NULL,
  normalize = TRUE,
  all_differences = FALSE
)
Arguments
treatments |
factor specifying the units' treatment assignments. |
covariates |
vector, matrix or data frame with covariates to derive balance for. |
matching |
|
target |
units to target the balance measures for. If |
normalize |
logical scalar indicating whether differences should be normalized by the sample standard deviation of the corresponding covariates. |
all_differences |
logical scalar indicating whether full matrices of differences should be
reported. If |
Details
covariate_balance calculates covariate balance by first deriving the
(normalized) mean difference between all treatment conditions for each
covariate in each matched group. It then aggregates the differences by a
weighted average, where the target parameter decides the weights.
When the average treatment effect (ATE) is of interest (i.e.,
target == NULL), the matched groups will be weighted by their sizes.
When target indicates that some subset of units is of interest, the
number of such units in each matched group will decide its weight. For
example, if we are interested in the average treatment effect of the treated
(ATT), the weight of a group will be proportional to the number of treated
units in that group. The reweighting of the groups captures that we are
prepared to accept greater imbalances in groups with few units of interest.
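The weighting described above can be illustrated with a small base R sketch. It is not the package's implementation: the toy vectors group, treat and x, and the helper names group_diff, w_ate and w_att, are assumptions made for illustration only. The sketch derives the mean difference within each matched group and then averages the differences with weights proportional to either the group sizes (ATE) or the number of treated units in each group (ATT).

```r
# Toy data (hypothetical): six units in two matched groups, two conditions
group <- factor(c(1, 1, 1, 2, 2, 2))
treat <- factor(c("T", "C", "C", "T", "T", "C"))
x     <- c(1.0, 0.5, 0.7, 2.0, 1.6, 1.5)

# Mean difference (T minus C) of the covariate within each matched group
group_diff <- sapply(split(seq_along(x), group), function(idx) {
  mean(x[idx][treat[idx] == "T"]) - mean(x[idx][treat[idx] == "C"])
})

# ATE-style weights: proportional to the size of each matched group
w_ate <- as.vector(table(group)) / length(group)

# ATT-style weights: proportional to the number of treated units per group
w_att <- as.vector(table(group[treat == "T"])) / sum(treat == "T")

# Weighted averages of the within-group differences
balance_ate <- sum(w_ate * group_diff)
balance_att <- sum(w_att * group_diff)
```

In this toy sample the second group holds two of the three treated units, so it receives two thirds of the weight under the ATT-style weighting but only half under the ATE-style weighting.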
By default, the differences are normalized by the sample standard deviation
of the corresponding covariate (see the normalize parameter). In more
detail, the sample variance of the covariate is derived separately for each
treatment group. The square root of the mean of these variances is then used
for the normalization. The matching is ignored when deriving the normalization
factor so that balance can be compared across different matchings or with
the unmatched sample.
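As a rough sketch of the normalization factor described above, the following base R code derives the sample variance of a covariate within each treatment group and takes the square root of the mean of those variances. The helper normalization_factor is hypothetical and not part of quickmatch, and the sketch assumes that "mean" refers to the unweighted mean over treatment groups.

```r
# Hypothetical helper (not part of quickmatch): normalization factor for one
# covariate, derived from the whole sample and ignoring any matching
normalization_factor <- function(treatments, covariate) {
  # Sample variance of the covariate within each treatment group
  group_vars <- tapply(covariate, treatments, var)
  # Square root of the (assumed unweighted) mean of the group variances
  sqrt(mean(group_vars))
}

# Toy data: three treated and three control units
toy_treatments <- factor(c("T", "T", "T", "C", "C", "C"))
toy_covariate  <- c(1, 2, 3, 2, 4, 6)

toy_factor <- normalization_factor(toy_treatments, toy_covariate)
```

Because the factor depends only on the treatment assignments and the covariate, it stays the same whichever matching is evaluated, which is what makes normalized differences comparable across matchings.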
covariate_balance focuses on mean differences, but higher moments and
interactions can be investigated by adding corresponding columns to the
covariate matrix (see examples below).
Value
Returns the mean difference between treatment groups in the matched sample for each covariate.
When all_differences = TRUE, the function returns a matrix for each
covariate with the mean difference for each possible pair of treatment
conditions. Rows in the matrices indicate minuends in the differences and
columns indicate subtrahends. For example, when differences are normalized,
the matrix:
  | a | b | c |
a | 0.0 | 0.3 | 0.5 |
b | -0.3 | 0.0 | 0.2 |
c | -0.5 | -0.2 | 0.0 |
reports that the mean difference for the corresponding covariate between treatments "a" and "b" is 30% of a sample standard deviation of the covariate. The maximum difference (in absolute value) is also reported in a separate vector. For example, the maximum difference for the covariate in the example above is 0.5.
When all_differences = FALSE, only the maximum differences are
reported.
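To make the layout of such a difference matrix concrete, the sketch below rebuilds the example matrix by hand in base R and reads it the way the text describes. The object diff_matrix is constructed manually for illustration. It is not output from covariate_balance, and the exact structure of the returned object is not assumed here.

```r
# Example matrix from the text: rows are minuends, columns are subtrahends
conditions <- c("a", "b", "c")
diff_matrix <- matrix(c( 0.0,  0.3, 0.5,
                        -0.3,  0.0, 0.2,
                        -0.5, -0.2, 0.0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(conditions, conditions))

# Difference "a" minus "b": 30% of a sample standard deviation
a_minus_b <- diff_matrix["a", "b"]

# Swapping minuend and subtrahend flips the sign
b_minus_a <- diff_matrix["b", "a"]

# Maximum difference in absolute value, as reported for each covariate
max_diff <- max(abs(diff_matrix))
```

Since each entry below the diagonal is the negative of its mirror entry above the diagonal, the upper triangle alone carries all of the information in the matrix.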
Examples
# Construct example data
my_data <- data.frame(y = rnorm(100),
                      x1 = runif(100),
                      x2 = runif(100),
                      treatment = factor(sample(rep(c("T1", "T2", "C"), c(25, 25, 50)))))
# Make distances
my_distances <- distances(my_data, dist_variables = c("x1", "x2"))
# Balance in unmatched sample (maximum for each covariate)
covariate_balance(my_data$treatment, my_data[c("x1", "x2")])
# Make matching
my_matching <- quickmatch(my_distances, my_data$treatment)
# Balance in matched sample (maximum for each covariate)
covariate_balance(my_data$treatment, my_data[c("x1", "x2")], my_matching)
# Balance in matched sample for ATT
covariate_balance(my_data$treatment,
                  my_data[c("x1", "x2")],
                  my_matching,
                  target = c("T1", "T2"))
# Balance on second-order moments and interactions
balance_cov <- data.frame(x1 = my_data$x1,
                          x2 = my_data$x2,
                          x1sq = my_data$x1^2,
                          x2sq = my_data$x2^2,
                          x1x2 = my_data$x1 * my_data$x2)
covariate_balance(my_data$treatment, balance_cov, my_matching)
# Report all differences (not only maximum for each covariate)
covariate_balance(my_data$treatment,
                  my_data[c("x1", "x2")],
                  my_matching,
                  all_differences = TRUE)