R: Optimal Matching with Refined Covariate Balance

rcbalance {rcbalance}

R Documentation

Optimal Matching with Refined Covariate Balance

Description

This function computes an optimal match with refined covariate balance.

Usage

rcbalance(distance.structure, near.exact = NULL, fb.list = NULL, 
treated.info = NULL, control.info = NULL, exclude.treated = FALSE, target.group = NULL,
 k = 1, penalty = 3, tol = 1e-5, solver = 'rlemon')

Arguments

`distance.structure`	a list of vectors that encodes the covariate distances between treated and control units. The list has one element per treated unit. Each vector corresponds to a treated unit and has one entry per control unit to which that treated unit can be matched. It is assumed that there are `nc` control units in total, numbered from 1 to `nc`. The names of each vector give the indices (in the vector `1:nc`) of the control units to which the treated unit in question can be matched, and the elements of each vector are the covariate distances between the treated unit and the corresponding controls. For a dense matching problem (in which each treated unit can be matched to any control), every vector in the list has length `nc` and names 1 through `nc`. Alternatively, the same information can be passed as a `matrix` or `InfinitySparseMatrix` with rows corresponding to treated units and columns corresponding to controls; entries given as `Inf` correspond to pairs that cannot be matched.
`near.exact`	an optional character vector specifying names of covariates for near-exact matching. This argument takes precedence over any refined covariate balance constraints, so the match will produce the best refined covariate balance subject to matching exactly on this variable wherever possible. If multiple covariates are named, near-exact matching will be done on their interaction.
`fb.list`	an optional list of character vectors specifying covariates to be used for refined balance. Each element of the list corresponds to a level of refined covariate balance, and the levels are assumed to be in decreasing order of priority. Each character vector should contain one or more names of categorical covariates on which the user would like to enforce near fine balance. If multiple covariates are specified, an interaction is created between the categories of the covariates and near fine balance is enforced on the interaction. IMPORTANT: covariates or interactions coming later in the list must be nested within covariates coming earlier in the list; if this is not the case the function will stop with an error. An easy way to ensure that this occurs is to include in each character vector all the variables named in earlier list elements. If the `fb.list` argument is specified, the `treated.info` and `control.info` arguments must also be specified.
`treated.info`	an optional data frame containing covariate information for the treated units in the problem. The row count of this data frame must be equal to the length of the `distance.structure` argument, and it is assumed that row `i` contains covariate information for the treated unit described by element `i` of `distance.structure`. In addition, the column count and column names must be identical to those of the `control.info` argument, and the column names must include all of the covariate names mentioned in the `near.exact` and `fb.list` arguments.
`control.info`	an optional data frame containing covariate information for the control units in the problem. The row count of this data frame must be no smaller than the maximum control index in the `distance.structure` argument, and it is assumed that row `i` contains the covariate information for the control indexed by `i` in `distance.structure`. In addition, the column count and column names must be identical to those of the `treated.info` argument.
`exclude.treated`	if `TRUE`, then when there is no feasible match using all treated units, a minimal number of treated units will be dropped so that a match can be formed. The excluded treated units will be selected optimally so that the cost of the matching is reduced as much as possible. NOTE: `exclude.treated = TRUE` cannot be combined with the `target.group` argument or with values of `k` larger than 1.
`target.group`	an optional data frame of observations with the desired covariate distribution for the selected control group, if it differs from the covariate distribution of the treated units. This argument will be ignored unless `fb.list`, `treated.info` and `control.info` are also specified, and it must have the same dimensions as `treated.info`.
`k`	a positive integer. The number of control units to which each treated unit will be matched.
`penalty`	a value greater than 1. This is a tuning parameter that helps ensure the different levels of refined covariate balance are prioritized correctly. Setting the penalty higher tends to improve the guarantee of match optimality up to a point, but penalties above a certain level cause integer overflows and throw errors. Changing this parameter from its default value is usually not recommended.
`tol`	edge cost tolerance. This is the smallest tolerated difference between matching costs; cost differences smaller than this will be considered zero. Match distances will be scaled by inverse tolerance, so when matching with large edge costs or penalties the tolerance may need to be increased.
`solver`	the name of the package used to solve the network flow optimization problem underlying the match, one of 'rlemon' (which uses the Lemon Optimization Library) and 'rrelaxiv' (which uses the RELAX-IV algorithm).
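
As an illustration of two of the argument descriptions above, the following sketch builds the list form of `distance.structure` from a small matrix and checks the nesting requirement on `fb.list`. It uses base R only. The data are made up, and the helpers `matrix_to_dist_struct` and `is_nested` are hypothetical names introduced here; they are not part of rcbalance and are not necessarily how the package performs these steps internally.

```r
# Toy distance matrix (values made up): 2 treated units (rows), nc = 4 controls.
# Inf marks treated-control pairs that may not be matched.
d <- matrix(c(0.5, Inf, 1.2, 0.3,
              Inf, 0.8, 0.4, Inf),
            nrow = 2, byrow = TRUE)

# Hypothetical helper (not part of rcbalance): convert the matrix form into
# the list-of-named-vectors form described for distance.structure.
matrix_to_dist_struct <- function(d) {
  lapply(seq_len(nrow(d)), function(i) {
    ok <- which(is.finite(d[i, ]))  # controls this treated unit may be matched to
    setNames(d[i, ok], ok)          # names are control indices in 1:nc
  })
}

dist.struct <- matrix_to_dist_struct(d)
names(dist.struct[[1]])  # "1" "3" "4"
names(dist.struct[[2]])  # "2" "3"

# Toy covariates (made up) to illustrate the nesting requirement on fb.list.
info <- data.frame(ct = c(0, 0, 1, 1),
                   bw = c(0, 1, 0, 1),
                   pt = c(0, 1, 1, 0))

# Hypothetical check: a finer level is nested in a coarser one when every
# category of the finer interaction lies inside exactly one coarser category.
is_nested <- function(data, coarse, fine) {
  coarse.cat <- interaction(data[coarse], drop = TRUE)
  fine.cat <- interaction(data[fine], drop = TRUE)
  all(tapply(as.integer(coarse.cat), fine.cat,
             function(x) length(unique(x)) == 1))
}

nested.ok <- is_nested(info, "ct", c("ct", "bw"))  # TRUE: (ct x bw) refines ct
nested.bad <- is_nested(info, "ct", "pt")          # FALSE: pt cuts across ct
```

Including every earlier variable in each later element, as in `list('ct', c('ct', 'bw'))`, makes the nesting hold by construction.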

Details

To use the option `solver = 'rrelaxiv'`, the user must install the rrelaxiv package manually; it is not hosted on CRAN because it carries an academic license.
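
One possible way to handle the optional solver, sketched here as an assumption rather than a recommendation from the package, is to test whether rrelaxiv is installed and fall back to the default otherwise. The variable name `solver` is chosen for illustration, and `my.dist.struct` in the commented call refers to the object built in the Examples section.

```r
# Use RELAX-IV only when the optional rrelaxiv package is installed;
# otherwise fall back to the default 'rlemon' solver.
solver <- if (requireNamespace("rrelaxiv", quietly = TRUE)) "rrelaxiv" else "rlemon"

# The chosen name would then be passed on, for example:
# rcbalance(my.dist.struct, solver = solver)
```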

Value

A list with the following components:

`matches`	an `nt` by `k` matrix containing the matched sets produced by the algorithm (where `nt` is the number of treated units). The rownames of this matrix are the numbers of the treated units (indexed by their position in `distance.structure`), and the elements of each row contain the indices of the control units to which this treated unit has been matched.
`fb.tables`	a list of matrices, equal in length to the `fb.list` argument. Each matrix is a contingency table giving the counts among treated units and matched controls for each level of the categorical variable specified by the corresponding element of `fb.list`.
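
The indexing convention of `matches` can be used to assemble a matched data set. The sketch below assumes only the layout described above; the `matches` matrix and the covariate data frames are made-up stand-ins rather than output from an actual call, and the column names `treat` and `set` are chosen for illustration.

```r
# Made-up stand-in for the `matches` component (nt = 3 treated units, k = 1).
matches <- matrix(c(2, 5, 1), ncol = 1,
                  dimnames = list(c("1", "2", "3"), NULL))

# Toy covariate data; rows follow the treated.info / control.info conventions.
treated.info <- data.frame(age = c(40, 55, 63))
control.info <- data.frame(age = c(61, 42, 30, 70, 54))

# Rownames give positions of treated units; entries give rows of control.info.
treated.idx <- as.numeric(rownames(matches))
control.idx <- as.vector(matches)  # column-major: all first matches, then second, ...

matched <- rbind(
  cbind(treated.info[treated.idx, , drop = FALSE],
        treat = 1, set = treated.idx),
  cbind(control.info[control.idx, , drop = FALSE],
        treat = 0, set = rep(treated.idx, times = ncol(matches)))
)
rownames(matched) <- NULL
```

Reading the treated indices from the rownames, rather than assuming `1:nt`, keeps the lookup correct when `exclude.treated = TRUE` has dropped some treated units.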

Author(s)

Samuel D. Pimentel

References

Pimentel, S.D., Kelz, R.R., Silber, J.H., and Rosenbaum, P.R. (2015). Large, sparse optimal matching with refined covariate balance in an observational study of the health outcomes produced by new surgeons. Journal of the American Statistical Association 110(510), 515-527.

Examples

## Not run: 
library(rcbalance)
library(optmatch)
data(nuclearplants)

# require exact match on variables ne and pt, use rank-based Mahalanobis distance
my.dist.struct <- build.dist.struct(z = nuclearplants$pr,
	X = nuclearplants[c('date','t1','t2','cap','bw','cum.n')],
	exact = paste(nuclearplants$ne, nuclearplants$pt, sep = '.'))

#match with refined covariate balance, first on ct then on (ct x bw)
rcbalance(my.dist.struct, fb.list = list('ct',c('ct','bw')),
 	treated.info = nuclearplants[which(nuclearplants$pr ==1),],
 	control.info = nuclearplants[which(nuclearplants$pr == 0),])

# repeat the match using the match_on tool from optmatch and regular Mahalanobis distance
exact.mask <- exactMatch(pr ~ ne + pt, data = nuclearplants)
my.dist.matrix <- match_on(pr ~ date + t1 + t2 + cap + bw + cum.n,
	within = exact.mask, data = nuclearplants)
match.matrix <- 
	rcbalance(my.dist.matrix*100, fb.list = list('ct',c('ct','bw')), 
	treated.info = nuclearplants[which(nuclearplants$pr ==1),],
	control.info = nuclearplants[which(nuclearplants$pr == 0),])

## End(Not run)

[Package rcbalance version 1.8.8 Index]