nmatch {designmatch} | R Documentation |
Optimal nonbipartite matching in randomized experiments and observational studies
Description
Function for optimal nonbipartite matching in randomized experiments and observational studies that directly balances the observed covariates. nmatch allows the user to enforce different forms of covariate balance in the matched samples, such as moment balance (e.g., of means, variances, and correlations), distributional balance (e.g., fine balance, near-fine balance, strength-k balancing), and exact matching. Among other uses, nmatch can be applied in the design of randomized experiments for matching before randomization (Greevy et al. 2004, Zou and Zubizarreta 2016), and in observational studies for matching with doses and strengthening an instrumental variable (Baiocchi et al. 2010, Lu et al. 2011).
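To make the idea of nonbipartite matching concrete, the sketch below pairs four hypothetical units by brute force, choosing the pairing with the smallest total within-pair distance. It only illustrates the objective on toy data; it is not how nmatch solves the problem, and all names and values in it are assumptions introduced for illustration.

```r
# Illustrative brute-force search on toy data; NOT the method used by nmatch.
# Four hypothetical units described by a single covariate.
x <- c(1.0, 1.2, 3.0, 3.1)
dist_mat <- as.matrix(dist(x))

# With four units there are exactly three ways to form two pairs.
# Each candidate is a two-column matrix with one row per pair.
pairings <- list(
  rbind(c(1, 2), c(3, 4)),
  rbind(c(1, 3), c(2, 4)),
  rbind(c(1, 4), c(2, 3))
)

# Total within-pair distance of each candidate pairing
# (a two-column matrix indexes dist_mat element by element).
totals <- sapply(pairings, function(p) sum(dist_mat[p]))

# The pairing with the smallest total distance.
best <- pairings[[which.min(totals)]]
best
```

Unlike bipartite matching, any unit may be paired with any other unit here; there are no predefined treated and control groups.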
Usage
nmatch(dist_mat, subset_weight = NULL, total_pairs = NULL, mom = NULL,
exact = NULL, near_exact = NULL, fine = NULL, near_fine = NULL,
near = NULL, far = NULL, solver = NULL)
Arguments
dist_mat |
distance matrix: a matrix of positive distances between units. |
subset_weight |
subset matching weight: a scalar that regulates the trade-off between the total sum of distances between matched pairs and the total number of matched pairs. The larger |
total_pairs |
total number of matched pairs: a scalar specifying the number of matched pairs to be obtained. If |
mom |
moment balance parameters: a list with three arguments,
|
exact |
Exact matching parameters: a list with one argument,
where |
near_exact |
Near-exact matching parameters: a list with two arguments,
|
fine |
Fine balance parameters: a list with one argument,
where |
near_fine |
Near-fine balance parameters: a list with two arguments,
|
near |
Near matching parameters: a list with three arguments,
|
far |
Far matching parameters: a list with three arguments,
|
solver |
Optimization solver parameters: a list with four objects,
|
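As a rough sketch of how two of these inputs might be assembled, the code below builds a distance matrix with base R's dist() and a moment balance list of the form used in the Examples section (covs and tols). The data are simulated and purely illustrative, and the sanity checks are suggestions rather than requirements stated by the package.

```r
set.seed(1)  # simulated data, purely illustrative
n <- 20
covs <- cbind(age = rnorm(n, 40, 10), education = rnorm(n, 12, 2))

# Euclidean distances between all pairs of units.
dist_mat <- as.matrix(dist(covs))

# Suggested sanity checks: square, symmetric, zero diagonal, non-negative.
stopifnot(nrow(dist_mat) == n, ncol(dist_mat) == n)
stopifnot(isSymmetric(dist_mat), all(diag(dist_mat) == 0), all(dist_mat >= 0))

# Moment balance list in the form used in the Examples section:
# tolerances set to 0.1 standard deviations of each covariate.
mom_tols <- apply(covs, 2, sd) * 0.1
mom <- list(covs = covs, tols = mom_tols)
str(mom)
```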
Value
A list containing the optimal solution, with the following objects:
obj_total |
value of the objective function at the optimum; |
obj_dist_mat |
value of the total sum of distances term of the objective function at the optimum; |
id_1 |
indexes of the matched units in group 1 at the optimum; |
id_2 |
indexes of the matched units in group 2 at the optimum; |
group_id |
matched pairs at the optimum; |
time |
time elapsed to find the optimal solution. |
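The sketch below shows one way the index vectors might be used after matching. It assumes, without confirmation from this page, that the k-th entries of id_1 and id_2 form the k-th pair. The indices are written by hand on simulated data and stand in for out$id_1 and out$id_2; they are not output from nmatch.

```r
set.seed(2)  # simulated data, purely illustrative
n <- 8
covs <- cbind(age = rnorm(n, 40, 10), education = rnorm(n, 12, 2))
dist_mat <- as.matrix(dist(covs))

# Hypothetical pairing, written by hand (assumption: entry k of id_1 is
# paired with entry k of id_2).
id_1 <- c(1, 3, 5)
id_2 <- c(2, 4, 6)

# Distance within each pair, and the total over pairs.
pair_dist <- dist_mat[cbind(id_1, id_2)]
total_dist <- sum(pair_dist)

# One row per matched pair.
pairs <- data.frame(pair = seq_along(id_1), id_1, id_2, dist = pair_dist)
pairs
```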
Author(s)
Jose R. Zubizarreta <zubizarreta@hcp.med.harvard.edu>, Cinar Kilcioglu <ckilcioglu16@gsb.columbia.edu>.
References
Baiocchi, M., Small, D., Lorch, S. and Rosenbaum, P. R. (2010), "Building a Stronger Instrument in an Observational Study of Perinatal Care for Premature Infants," Journal of the American Statistical Association, 105, 1285-1296.
Greevy, R., Lu, B., Silber, J. H., and Rosenbaum, P. R. (2004), "Optimal Multivariate Matching Before Randomization," Biostatistics, 5, 263-275.
Lu, B., Greevy, R., Xu, X., and Beck, C. (2011), "Optimal Nonbipartite Matching and its Statistical Applications," The American Statistician, 65, 21-30.
Rosenbaum, P. R. (2010), Design of Observational Studies, Springer.
Rosenbaum, P. R. (2012), "Optimal Matching of an Optimally Chosen Subset in Observational Studies," Journal of Computational and Graphical Statistics, 21, 57-71.
Yang, F., Zubizarreta, J. R., Small, D. S., Lorch, S. A., and Rosenbaum, P. R. (2014), "Dissonant Conclusions When Testing the Validity of an Instrumental Variable," The American Statistician, 68, 253-263.
Zou, J., and Zubizarreta, J. R. (2016), "Covariate Balanced Restricted Randomization: Optimal Designs, Exact Tests, and Asymptotic Results," working paper.
Zubizarreta, J. R., Reinke, C. E., Kelz, R. R., Silber, J. H., and Rosenbaum, P. R. (2011), "Matching for Several Sparse Nominal Variables in a Case-Control Study of Readmission Following Surgery," The American Statistician, 65, 229-238.
Zubizarreta, J. R. (2012), "Using Mixed Integer Programming for Matching in an Observational Study of Kidney Failure after Surgery," Journal of the American Statistical Association, 107, 1360-1371.
Examples
## Uncomment the following example
## Load and attach data
#data(lalonde)
#attach(lalonde)
#################################
## Example: optimal subset matching
#################################
## Optimal subset matching pursues two competing goals at
## the same time: to minimize the total of distances while
## matching as many observations as possible. The trade-off
## between these two is regulated by the parameter subset_weight
## (see Rosenbaum 2012 and Zubizarreta et al. 2013 for a discussion).
## Here the balance requirement is mean balance for age and
## education. We require 50 pairs to be matched.
## The solver used is HiGHS with the approximate option.
## Matrix of covariates
#X_mat = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
## Distance matrix
#dist_mat_covs = round(dist(X_mat, diag = TRUE, upper = TRUE), 1)
#dist_mat = as.matrix(dist_mat_covs)
## Subset matching weight
#subset_weight = 1
## Total pairs to be matched
#total_pairs = 50
## Moment balance: constrain differences in means to be at most .1 standard deviations apart
#mom_covs = cbind(age, education)
#mom_tols = apply(mom_covs, 2, sd)*.1
#mom = list(covs = mom_covs, tols = mom_tols)
## Solver options
#t_max = 60*5
#solver = "highs"
#approximate = 1
#solver = list(name = solver, t_max = t_max, approximate = approximate, round_cplex = 0,
#trace_cplex = 0)
## Match
#out = nmatch(dist_mat = dist_mat, subset_weight = subset_weight, total_pairs = total_pairs,
#mom = mom, solver = solver)
## Indices of the matched units in groups 1 and 2
#id_1 = out$id_1
#id_2 = out$id_2
## Assess mean balance
#a = apply(mom_covs[id_1, ], 2, mean)
#b = apply(mom_covs[id_2, ], 2, mean)
#tab = round(cbind(a, b, a-b, mom_tols), 2)
#colnames(tab) = c("Mean 1", "Mean 2", "Diffs", "Tols")
#tab
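## Assess fine balance (note here we are getting an approximate solution).
## The covariates in fine_covs are an illustrative choice; fine balance
## was not imposed in the call to nmatch above.
#fine_covs = cbind(black, hispanic, married, nodegree)
#for (i in seq_len(ncol(fine_covs))) {
# print(finetab(fine_covs[, i], id_1, id_2))
#}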
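The mean balance table in the example can also be turned into an explicit check against the tolerances. The helper below, check_mean_balance, is hypothetical and not part of designmatch. It is a minimal sketch on toy data, assuming that balance is judged by the absolute difference in group means for each covariate.

```r
# Hypothetical helper (not part of designmatch): compares the difference in
# covariate means between two sets of matched units against tolerances.
check_mean_balance <- function(covs, id_1, id_2, tols) {
  diffs <- colMeans(covs[id_1, , drop = FALSE]) -
    colMeans(covs[id_2, , drop = FALSE])
  data.frame(diff = diffs, tol = tols, within_tol = abs(diffs) <= tols)
}

# Toy data and hand-written indices, purely illustrative.
covs <- cbind(age = c(30, 31, 50, 52), education = c(12, 12, 16, 15))
tols <- c(age = 2, education = 0.25)
res <- check_mean_balance(covs, id_1 = c(1, 3), id_2 = c(2, 4), tols = tols)
res
```

In this toy case the age difference falls within its tolerance while the education difference does not, so the within_tol column flags which constraints would be violated.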