dist_bal_match {MultiObjMatch} | R Documentation |
Optimal tradeoffs among distance, exclusion and marginal imbalance
Description
Explores tradeoffs among three important objective functions in an optimal matching problem: the sum of covariate distances within matched pairs, the number of treated units included in the match, and the marginal imbalance on pre-specified covariates (in total variation distance).
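As a rough illustration of the third objective, the sketch below computes a total variation distance between the category distributions of one variable in two groups. It uses the standard definition (half the sum of absolute differences in category proportions); the helper name tv_distance and the toy vectors are hypothetical, and the package's internal scaling of this quantity may differ.

```r
# Total variation distance between the category distributions of a
# variable in two groups (standard definition: half the L1 distance
# between the two vectors of category proportions).
# tv_distance is a hypothetical helper, not part of MultiObjMatch.
tv_distance <- function(x_treated, x_control) {
  # Use a shared set of levels so both tables align category by category.
  lv <- sort(unique(c(as.character(x_treated), as.character(x_control))))
  p <- table(factor(as.character(x_treated), levels = lv)) / length(x_treated)
  q <- table(factor(as.character(x_control), levels = lv)) / length(x_control)
  0.5 * sum(abs(p - q))
}

# Toy data (hypothetical, for illustration only).
treated <- c("a", "a", "b", "c")
control <- c("a", "b", "b", "b")
tv <- tv_distance(treated, control)
```

A value of 0 means the two groups have identical category proportions; a value of 1 means they share no categories.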
Usage
dist_bal_match(
data,
treat_col,
marg_bal_col,
exclusion_penalty = c(),
balance_penalty = c(),
dist_matrix = NULL,
dist_col = NULL,
exact_col = NULL,
propensity_col = NULL,
pscore_name = NULL,
ignore_col = NULL,
max_unmatched = 0.25,
caliper_option = NULL,
tol = 0.01,
max_iter = 1,
rho_max_factor = 10,
max_pareto_search_iter = 5
)
Arguments
data |
data frame that contains columns indicating treatment, outcome and covariates. |
treat_col |
character, the name of the column indicating treatment assignment. |
marg_bal_col |
character, the name of the column holding the variable on which to evaluate marginal balance. |
exclusion_penalty |
(optional) numeric vector of values of exclusion penalty. Default is c(), which would trigger the auto grid search. |
balance_penalty |
(optional) numeric vector of values of marginal balance penalty. Default is c(), which would trigger the auto grid search. |
dist_matrix |
(optional) a matrix that specifies the pair-wise distances between any two objects. |
dist_col |
(optional) character vector of variable names used for calculating within-pair distance. |
exact_col |
(optional) character vector, variable names that we want exact matching on; NULL by default. |
propensity_col |
(optional) character vector, variable names on which to fit a propensity score (to supply a caliper). |
pscore_name |
(optional) character, giving the variable name for the fitted propensity score. |
ignore_col |
(optional) character vector of variable names that should be ignored when constructing the internal matching. NULL by default. |
max_unmatched |
(optional) numeric, the maximum proportion of unmatched units that can be accepted; default is 0.25. |
caliper_option |
(optional) numeric, the propensity score caliper value in standard deviations of the estimated propensity scores; default is NULL, which is no caliper. |
tol |
(optional) numeric, tolerance of close match distance; default is 1e-2. |
max_iter |
(optional) integer, maximum number of iterations to use in searching for penalty combinations that improve the matching; default is 1, where the algorithm searches for one round. |
rho_max_factor |
(optional) numeric, the scaling factor used in proposal for penalties; default is 10. |
max_pareto_search_iter |
(optional) numeric, the number of attempts made to find a tol that yields Pareto optimal solutions; default is 5. |
Details
Matched designs generated by this function are Pareto optimal for
the three objective functions. The degree of relative emphasis among the
three objectives in any specific solution is controlled by the penalties,
denoted by the Greek letter rho. Larger values of exclusion_penalty correspond to
increased emphasis on retaining treated units (all else being equal), while
larger values of balance_penalty correspond to increased emphasis on marginal
balance. Additional details:
Users may either specify their own distance matrix via the dist_matrix argument or ask the function to create a robust Mahalanobis distance matrix internally on a set of covariates specified by the dist_col argument; if neither argument is specified an error will result. User-specified distance matrices should have row count equal to the number of treated units and column count equal to the number of controls.
If the caliper_option argument is specified, a propensity score caliper will be imposed, forbidding matches between units more than a fixed distance apart on the propensity score. The caliper will be based either on a user-fit propensity score, identified in the input data frame by the argument pscore_name, or on an internally fit propensity score based on logistic regression against the variables named in propensity_col. If caliper_option is non-NULL and neither of the other arguments is specified an error will result.
tol controls the precision at which the objective functions are evaluated. When matching problems are especially large or complex it may be necessary to increase tol in order to prevent integer overflows in the underlying network flow solver; generally this will be suggested in appropriate warning messages.
While by default tradeoffs are only assessed at penalty combinations provided by the user, the user may ask for the algorithm to search over additional penalty values in order to identify additional Pareto optimal solutions. rho_max_factor is a multiplier applied to initial penalties to discover new solutions, and setting it larger leads to wider exploration; similarly, max_iter controls how long the exploration routine runs, with larger values leading to more exploration.
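The shape requirement for a user-supplied distance matrix and the idea behind a propensity score caliper can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the distance is a plain (not robust) Mahalanobis distance, the caliper is measured on the probability scale, and Inf is used here to mark forbidden pairs. None of these choices is taken from the package's internals.

```r
set.seed(1)
n <- 40
# Hypothetical toy data: 10 treated units, 30 controls, two covariates.
toy <- data.frame(
  treat = rep(c(1, 0), times = c(10, 30)),
  x1 = rnorm(n),
  x2 = rnorm(n)
)
is_treated <- toy$treat == 1
X <- as.matrix(toy[, c("x1", "x2")])
S <- cov(X)

# Plain Mahalanobis distances; rows index treated units, columns index
# controls. mahalanobis() returns squared distances, hence the sqrt().
dist_mat <- t(apply(X[is_treated, , drop = FALSE], 1, function(xt) {
  sqrt(mahalanobis(X[!is_treated, , drop = FALSE], center = xt, cov = S))
}))

# Propensity score from a logistic regression on the covariates.
ps_fit <- glm(treat ~ x1 + x2, data = toy, family = binomial())
ps <- fitted(ps_fit)

# Caliper expressed in standard deviations of the estimated scores
# (0.25 is an arbitrary illustrative value).
caliper_sd <- 0.25
width <- caliper_sd * sd(ps)

# Absolute propensity score gap for every treated-control pair.
ps_gap <- abs(outer(ps[is_treated], ps[!is_treated], "-"))

# Mark pairs outside the caliper as forbidden in this sketch.
dist_cal <- dist_mat
dist_cal[ps_gap > width] <- Inf
```

Here dist_mat has 10 rows and 30 columns, matching the treated-by-control layout described above, and dist_cal differs from it only in the pairs whose propensity scores are more than the caliper width apart.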
Value
a named list whose elements are:
"rhoList": list of penalty combinations for each match
"matchList": list of matches indexed by number
"treatmentCol": character of treatment variable
"covs": character vector of names of the variables used for calculating within-pair distance
"exactCovs": character vector of names of variables for which an exact or close match is required
"idMapping": numeric vector of row indices for each observation in the sorted data frame for internal use
"stats": data frame of important statistics (total variation distance) for variable on which marginal balance is measured
"b.var": character, name of variable on which marginal balance is measured
"dataTable": data frame sorted by treatment value
"t": a treatment vector
"df": the original dataframe input by the user
"pair_cost1": list of pair-wise distance sum using the first distance measure
"pair_cost2": list of pair-wise distance sum using the second distance measure (left NULL since only one distance measure is used here).
"version": (for internal use) the version of the matching function called; "Basic" indicates the matching comes from dist_bal_match and "Advanced" from two_dist_match.
"fPair": a vector of values for the first objective function; it corresponds to the pair-wise distance sum according to the first distance measure.
"fExclude": a vector of values for the second objective function; it corresponds to the number of treated units left unmatched.
"fMarginal": a vector of values for the third objective function; it corresponds to the marginal imbalance (total variation distance) on the specified variable(s).
See Also
Other main matching function:
two_dist_match()
Examples
data("lalonde", package = "cobalt")
# Covariates used to fit the propensity score
ps_cols <- c("age", "educ", "married", "nodegree", "race")
treat_val <- "treat"
response_val <- "re78"
# Covariates used for the within-pair distance
pair_dist_val <- c("age", "married", "educ", "nodegree", "race")
# Variable on which marginal balance is evaluated
my_bal_val <- c("race")
# Exclusion penalties (r1s) and balance penalties (r2s) to try
r1s <- c(0.01, 1, 2, 4, 4.4, 5.2, 5.4, 5.6, 5.8, 6)
r2s <- c(0.001)
match_result <- dist_bal_match(
  data = lalonde, treat_col = treat_val,
  marg_bal_col = my_bal_val, exclusion_penalty = r1s, balance_penalty = r2s,
  dist_col = pair_dist_val,
  propensity_col = ps_cols, max_iter = 0
)
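To make the notion of Pareto optimality over the three objectives concrete, the following base R sketch flags which candidate solutions are not dominated by any other, treating all three objectives as quantities to minimize. The helper pareto_front and the numbers are hypothetical; the sketch assumes the objective values (such as those described under fPair, fExclude and fMarginal) have already been extracted as plain numeric vectors of equal length.

```r
# TRUE for each solution that no other solution dominates, where all
# three objectives are minimized. Solution j dominates solution i when
# j is no worse on every objective and strictly better on at least one.
# pareto_front is a hypothetical helper, not part of MultiObjMatch.
pareto_front <- function(f_pair, f_exclude, f_marginal) {
  obj <- cbind(f_pair, f_exclude, f_marginal)
  n <- nrow(obj)
  vapply(seq_len(n), function(i) {
    dominated <- vapply(seq_len(n), function(j) {
      all(obj[j, ] <= obj[i, ]) && any(obj[j, ] < obj[i, ])
    }, logical(1))
    !any(dominated)
  }, logical(1))
}

# Hypothetical objective values for four candidate matches.
f_pair     <- c(10, 12, 15, 16)
f_exclude  <- c( 5,  3,  3,  0)
f_marginal <- c( 2,  2,  3,  4)
keep <- pareto_front(f_pair, f_exclude, f_marginal)
```

In this toy input the third candidate is dominated by the second (it has a larger distance sum and larger imbalance with the same number of exclusions), so it is the only one dropped.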