bmatch {designmatch} | R Documentation |
Optimal bipartite matching in observational studies
Description
Function for optimal bipartite matching in observational studies that directly balances the observed covariates. bmatch allows the user to enforce different forms of covariate balance in the matched samples, such as moment balance (e.g., of means, variances and correlations), distributional balance (e.g., fine balance, near-fine balance, strength-k balancing) and exact matching. In observational studies where an instrumental variable is available, bmatch can be used to handle weak instruments and strengthen them by means of the far options (see Yang et al. 2014 for an example). bmatch can also be used in discontinuity designs by matching units in a neighborhood of the discontinuity (see Keele et al. 2015 for details). In any of these settings, bmatch either minimizes the total sum of covariate distances between matched units, maximizes the total number of matched units, or optimizes a combination of the two, subject to matching, covariate balancing, and representativeness constraints (see the examples below).
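The trade-off between total distance and number of matched units can be illustrated on a toy problem. The sketch below is base R only and does not call bmatch. It assumes the combined objective has the form (sum of distances) minus weight times (number of pairs), which is one common way to write such a trade-off and is not taken from the package source. The helper pair_score and the toy matrix are hypothetical.

```r
# Toy distance matrix: 2 treated units (rows) by 3 controls (columns).
dist_mat <- matrix(c(1, 5, 6,
                     7, 3, 8), nrow = 2, byrow = TRUE)

# Hypothetical helper: score a set of pairs, where each row of `pairs`
# is (treated index, control index). Lower scores are better.
pair_score <- function(pairs, dist_mat, weight) {
  sum(dist_mat[pairs]) - weight * nrow(pairs)
}

one  <- matrix(c(1L, 1L), ncol = 2)      # treated 1 with control 1
both <- rbind(c(1L, 1L), c(2L, 2L))      # also treated 2 with control 2

# Small weight: the second pair costs more distance than it gains.
pair_score(one,  dist_mat, weight = 2)
pair_score(both, dist_mat, weight = 2)

# Large weight: matching more units is favored.
pair_score(one,  dist_mat, weight = 5)
pair_score(both, dist_mat, weight = 5)
```

Under this assumed objective, a larger weight shifts the preferred solution from one close pair to two pairs.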
Usage
bmatch(t_ind, dist_mat = NULL, subset_weight = NULL, n_controls = 1,
total_groups = NULL, mom = NULL, ks = NULL, exact = NULL,
near_exact = NULL, fine = NULL, near_fine = NULL, near = NULL,
far = NULL, solver = NULL)
Arguments
t_ind |
treatment indicator: a vector of zeros and ones indicating treatment (1 = treated; 0 = control). Please note that the data needs to be sorted in decreasing order according to this treatment indicator. |
dist_mat |
distance matrix: a matrix of positive distances between treated units (rows) and controls (columns). If |
subset_weight |
subset matching weight: a scalar that regulates the trade-off between the total sum of distances between matched pairs and the total number of matched pairs. The larger If If |
n_controls |
number of controls: a scalar defining the number of controls to be matched with a fixed rate to each treated unit. The default is |
total_groups |
total number of matched pairs: a scalar specifying the number of matched pairs to be obtained. If |
mom |
moment balance parameters: a list with three arguments,
|
ks |
Kolmogorov-Smirnov balance parameters: a list with three objects,
|
exact |
Exact matching parameters: a list with one argument,
where |
near_exact |
Near-exact matching parameters: a list with two objects,
|
fine |
Fine balance parameters: a list with one argument,
where |
near_fine |
Near-fine balance parameters: a list with two objects,
|
near |
Near matching parameters: a list with three objects,
|
far |
Far matching parameters: a list with three objects,
|
solver |
Optimization solver parameters: a list with four objects,
|
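The two input conventions described above (data sorted in decreasing order of the treatment indicator, and a distance matrix with treated units in rows and controls in columns) can be checked before matching. The following is a minimal base R sketch on made-up data. The column names and the absolute-difference distance are assumptions for illustration only, not the package's distmat.

```r
# Toy data; the column names are hypothetical.
dat <- data.frame(
  treatment = c(0, 1, 0, 1, 0),
  age       = c(30, 25, 41, 38, 27)
)

# Sort so that all treated units (1) come before all controls (0).
dat <- dat[order(dat$treatment, decreasing = TRUE), ]
t_ind <- dat$treatment

# Check the ordering: the indicator must never increase along the rows.
stopifnot(all(diff(t_ind) <= 0))

# Treated-by-control matrix of absolute age differences,
# used here only to show the expected shape.
treated  <- dat$age[t_ind == 1]
controls <- dat$age[t_ind == 0]
dist_mat <- abs(outer(treated, controls, "-"))
dim(dist_mat)  # rows = number of treated units, columns = number of controls
```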
Value
A list containing the optimal solution, with the following objects:
obj_total |
value of the objective function at the optimum; |
obj_dist_mat |
value of the total sum of distances term of the objective function at the optimum; |
t_id |
indexes of the matched treated units at the optimum; |
c_id |
indexes of the matched controls at the optimum; |
group_id |
matched pairs or groups at the optimum; |
time |
time elapsed to find the optimal solution. |
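As a sketch of how the returned indexes might be used, the base R code below builds a matched data set from t_id and c_id. It assumes, as the examples on this page suggest, that both are row indexes into the sorted data and that, in pair matching, the k-th element of t_id is matched to the k-th element of c_id. The data and index values are made up, and bmatch is not called.

```r
# Toy data, already sorted with treated units first (values are made up).
dat <- data.frame(
  treatment = c(1, 1, 0, 0, 0),
  age       = c(25, 38, 30, 41, 27)
)

# Hypothetical matching result: row indexes of treated units and their controls.
t_id <- c(1, 2)
c_id <- c(5, 4)

# One row per matched pair, with the within-pair age difference.
pairs <- data.frame(
  t_id  = t_id,
  c_id  = c_id,
  age_t = dat$age[t_id],
  age_c = dat$age[c_id]
)
pairs$abs_diff <- abs(pairs$age_t - pairs$age_c)

# Matched sample: treated rows followed by their matched control rows.
matched <- dat[c(t_id, c_id), ]
```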
Author(s)
Jose R. Zubizarreta <zubizarreta@hcp.med.harvard.edu>, Cinar Kilcioglu <ckilcioglu16@gsb.columbia.edu>.
References
Keele, L., Titiunik, R., and Zubizarreta, J. R. (2015), "Enhancing a Geographic Regression Discontinuity Design Through Matching to Estimate the Effect of Ballot Initiatives on Voter Turnout," Journal of the Royal Statistical Society: Series A, 178, 223-239.
Rosenbaum, P. R. (2010), Design of Observational Studies, Springer.
Rosenbaum, P. R. (2012), "Optimal Matching of an Optimally Chosen Subset in Observational Studies," Journal of Computational and Graphical Statistics, 21, 57-71.
Yang, D., Small, D., Silber, J. H., and Rosenbaum, P. R. (2012), "Optimal Matching With Minimal Deviation From Fine Balance in a Study of Obesity and Surgical Outcomes," Biometrics, 68, 628-36.
Yang, F., Zubizarreta, J. R., Small, D. S., Lorch, S. A., and Rosenbaum, P. R. (2014), "Dissonant Conclusions When Testing the Validity of an Instrumental Variable," The American Statistician, 68, 253-263.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H., and Rosenbaum, P. R. (2011), "Matching for Several Sparse Nominal Variables in a Case-Control Study of Readmission Following Surgery," The American Statistician, 65, 229-238.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 1360-1371.
Zubizarreta, J. R., Paredes, R. D., and Rosenbaum, P. R. (2014), "Matching for Balance, Pairing for Heterogeneity in an Observational Study of the Effectiveness of For-profit and Not-for-profit High Schools in Chile," Annals of Applied Statistics, 8, 204-231.
See Also
sensitivitymv, sensitivitymw.
Examples
## Uncomment the following examples
## Load, sort, and attach data
#data(lalonde)
#lalonde = lalonde[order(lalonde$treatment, decreasing = TRUE), ]
#attach(lalonde)
#################################
## Example 1: cardinality matching
#################################
## Cardinality matching finds the largest matched sample of pairs that meets balance
## requirements. Here the balance requirements are mean balance, fine balance and
## exact matching for different covariates. The solver used is glpk with the
## approximate option.
## Treatment indicator; note that the data needs to be sorted in decreasing order
## according to this treatment indicator
#t_ind = treatment
#t_ind
## Distance matrix
#dist_mat = NULL
## Subset matching weight
#subset_weight = 1
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
#mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
#mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
#mom = list(covs = mom_covs, tols = mom_tols)
## Fine balance
#fine_covs = cbind(black, hispanic, married, nodegree)
#fine = list(covs = fine_covs)
## Exact matching
#exact_covs = cbind(black)
#exact = list(covs = exact_covs)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
#mom = mom, fine = fine, exact = exact, solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Time
#out$time/60
## Matched group identifier (who is matched to whom)
#out$group_id
## Assess mean balance
#meantab(mom_covs, t_ind, t_id, c_id)
## Assess fine balance (note here we are getting an approximate solution)
#for (i in 1:ncol(fine_covs)) {
# print(finetab(fine_covs[, i], t_id, c_id))
#}
## Assess exact matching balance
#table(exact_covs[t_id]==exact_covs[c_id])
##################################
## Example 2: minimum distance matching
##################################
## The goal here is to minimize the total of distances between matched pairs. In
## this example there are no covariate balance requirements. Again, the solver
## used is glpk with the approximate option
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = NULL
## Total pairs to be matched
#total_groups = sum(t_ind)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, total_groups = total_groups,
#solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Total of distances between matched pairs
#out$obj_total
## Assess mean balance
#meantab(X_mat, t_ind, t_id, c_id)
##################################
## Example 3: optimal subset matching
##################################
## Optimal subset matching pursues two competing goals at
## the same time: to minimize the total sum of covariate distances
## while matching as many observations as possible. The trade-off
## between these two goals is regulated by the parameter subset_weight
## (see Rosenbaum 2012 and Zubizarreta et al. 2014 for a discussion).
## Here the balance requirements are mean balance, near-fine balance
## and near-exact matching for different covariates.
## Again, the solver used is glpk with the approximate option.
## Treatment indicator
#t_ind = treatment
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat = distmat(t_ind, X_mat)
## Subset matching weight
#subset_weight = median(dist_mat)
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
#mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
#mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
#mom = list(covs = mom_covs, tols = mom_tols)
## Near-fine balance
#near_fine_covs = cbind(married, nodegree)
#near_fine_devs = rep(5, 2)
#near_fine = list(covs = near_fine_covs, devs = near_fine_devs)
## Near-exact matching
#near_exact_covs = cbind(black, hispanic)
#near_exact_devs = rep(5, 2)
#near_exact = list(covs = near_exact_covs, devs = near_exact_devs)
## Solver options
#t_max = 60*5
#solver = "glpk"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate,
#round_cplex = 0, trace_cplex = 0)
## Match
#out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,
#mom = mom, near_fine = near_fine, near_exact = near_exact, solver = solver)
## Indices of the treated units and matched controls
#t_id = out$t_id
#c_id = out$c_id
## Time
#out$time/60
## Matched group identifier (who is matched to whom)
#out$group_id
## Assess mean balance (note here we are getting an approximate solution)
#meantab(X_mat, t_ind, t_id, c_id)
## Assess fine balance
#for (i in 1:ncol(near_fine_covs)) {
# print(finetab(near_fine_covs[, i], t_id, c_id))
#}
## Assess exact matching balance
#for (i in 1:ncol(near_exact_covs)) {
# print(table(near_exact_covs[t_id, i]==near_exact_covs[c_id, i]))
#}
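The balance checks in the examples above rely on the package's meantab and finetab. To make it concrete what such checks measure, here is a rough base R sketch on made-up data. The helpers std_diff and fine_balanced are hypothetical. Standardizing by the pooled standard deviation of the full sample is an assumed convention and may differ from what the package computes.

```r
# Hypothetical helper: difference in matched-sample means, in units of the
# pooled standard deviation of the full (pre-matching) sample.
std_diff <- function(x, t_ind, t_id, c_id) {
  pooled_sd <- sqrt((var(x[t_ind == 1]) + var(x[t_ind == 0])) / 2)
  (mean(x[t_id]) - mean(x[c_id])) / pooled_sd
}

# Hypothetical helper: TRUE when the matched treated and control groups
# have identical counts in every category of g (fine balance).
fine_balanced <- function(g, t_id, c_id) {
  lv <- sort(unique(g))
  t_counts <- table(factor(g[t_id], levels = lv))
  c_counts <- table(factor(g[c_id], levels = lv))
  all(t_counts == c_counts)
}

# Toy data, sorted with treated units first (values are made up).
t_ind <- c(1, 1, 0, 0, 0)
x     <- c(1, 3, 1, 3, 10)
g     <- c("a", "b", "a", "b", "b")
t_id  <- 1:2
c_id  <- 3:4

std_diff(x, t_ind, t_id, c_id)   # zero here: matched means coincide
fine_balanced(g, t_id, c_id)     # TRUE here: one "a" and one "b" in each group
```

A moment balance tolerance of .05, as in the examples, would correspond to requiring the absolute value of such a standardized difference to be at most .05 under this convention.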