matchit {MatchIt}    R Documentation
Matching for Causal Inference
Description
matchit() is the main function of MatchIt and performs pairing, subset selection, and subclassification with the aim of creating treatment and control groups balanced on included covariates. MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods.
This page documents the overall use of matchit(), but for specifics of how matchit() works with individual matching methods, see the individual pages linked in the Details section below.
Usage
matchit(
  formula,
  data = NULL,
  method = "nearest",
  distance = "glm",
  link = "logit",
  distance.options = list(),
  estimand = "ATT",
  exact = NULL,
  mahvars = NULL,
  antiexact = NULL,
  discard = "none",
  reestimate = FALSE,
  s.weights = NULL,
  replace = FALSE,
  m.order = NULL,
  caliper = NULL,
  std.caliper = TRUE,
  ratio = 1,
  verbose = FALSE,
  include.obj = FALSE,
  ...
)
## S3 method for class 'matchit'
print(x, ...)
Arguments
formula |
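a two-sided formula object with the treatment on the left-hand side and the covariates on the right-hand side (e.g., treat ~ age + educ), used in creating the distance measure used in the matching. |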
data |
a data frame containing the variables named in formula. |
method |
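the matching method to be used. The allowed methods are "nearest" for nearest neighbor matching, "optimal" for optimal pair matching, "full" for optimal full matching, "genetic" for genetic matching, "cem" for coarsened exact matching, "exact" for exact matching, "cardinality" for cardinality and template matching, and "subclass" for subclassification. When set to NULL, no matching will occur, but propensity score estimation and common support restrictions will still take place if requested. See the individual methods pages linked in the Details section. The default is "nearest". |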
distance |
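the distance measure to be used. Can be either the name of a method of estimating propensity scores (e.g., "glm"), the name of a method of computing a distance matrix from the covariates (e.g., "mahalanobis"), or a numeric vector of already-computed distance measures. See distance for allowable options. The default is "glm" for propensity scores estimated with logistic regression. See the individual methods pages for information on whether and how the distance measure is used. |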
link |
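when distance is specified as a method of estimating propensity scores, an additional argument controlling the link function used in estimating the distance measure (e.g., "logit" or "probit"). Allowable options depend on the specific distance value specified; see distance. The default is "logit", which, together with distance = "glm", makes the default distance measure a logistic regression propensity score. |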
distance.options |
a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to distance. |
estimand |
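a string containing the name of the target estimand desired. Can be one of "ATT", "ATC", or "ATE". The default is "ATT". Not all methods allow all estimands; see the individual methods pages and the Details section. |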
exact |
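for methods that allow it, for which variables exact matching should take place. Can be specified as a string containing the names of variables in data to be used or as a one-sided formula with the desired variables on the right-hand side (e.g., ~ married + race). See the individual methods pages for information on whether and how this argument is used. |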
mahvars |
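for methods that allow it, on which variables Mahalanobis distance matching should take place when distance corresponds to a propensity score (e.g., to match on covariates within a propensity score caliper or after discarding units outside common support). Can be specified as a string containing the names of variables in data to be used or as a one-sided formula with the desired variables on the right-hand side (e.g., ~ age + educ + re74 + re75). See the individual methods pages for information on whether and how this argument is used. |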
antiexact |
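for methods that allow it, for which variables anti-exact matching should take place. Anti-exact matching ensures paired individuals do not have the same value of the anti-exact matching variable(s). Can be specified as a string containing the names of variables in data to be used or as a one-sided formula with the desired variables on the right-hand side. See the individual methods pages for information on whether and how this argument is used. |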
discard |
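a string containing a method for discarding units outside a region of common support. When a propensity score is estimated or supplied to distance as a vector, the options are "none", "treated", "control", or "both". The default is "none" for no common support restriction. See Details. |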
reestimate |
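if discard is not "none", whether to re-estimate the propensity score in the remaining sample prior to matching. The default is FALSE. See Details. |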
s.weights |
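an optional numeric vector of sampling weights to be incorporated into propensity score models and balance statistics. Can also be specified as a string containing the name of a variable in data to be used or as a one-sided formula with the variable on the right-hand side. Sampling weights are not incorporated into the matching weights; see Details. |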
replace |
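for methods that allow it, whether matching should be done with replacement (TRUE), so that a control unit can be matched to several treated units, or without replacement (FALSE), so that each control unit can be matched to at most one treated unit. See the individual methods pages for information on whether and how this argument is used. The default is FALSE for matching without replacement. |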
m.order |
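for methods that allow it, the order in which the matching takes place. Allowable options depend on the matching method. The default of NULL lets matchit() choose an order; see the individual methods pages for the allowable options and how the default is determined. |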
caliper |
for methods that allow it, the width(s) of the caliper(s) to
use in matching. Should be a numeric vector with each value named according
to the variable to which the caliper applies. To apply to the distance
measure, the value should be unnamed. See the individual methods pages for
information on whether and how this argument is used. The default is NULL for no caliper. |
std.caliper |
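logical; when a caliper is specified, whether the caliper is in standard deviation units (TRUE) or raw units (FALSE). Can either be of length 1, applying to all calipers, or of length equal to the length of caliper. The default is TRUE. |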
ratio |
for methods that allow it, how many control units should be matched to each treated unit in k:1 matching. Should be a single integer value. See the individual methods pages for information on whether and how this argument is used. The default is 1 for 1:1 matching. |
verbose |
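logical; whether information about the matching process should be printed to the console. The default is FALSE. |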
include.obj |
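logical; whether to include in the output any objects created during the matching process (e.g., by functions from other packages that matchit() calls). What is included depends on the matching method. The default is FALSE. |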
... |
additional arguments passed to the functions used in the matching process. See the individual methods pages for information on what additional arguments are allowed for each method. Ignored for print(). |
x |
a matchit object. |
Details
Details for the various matching methods can be found at the following help pages:
- method_nearest for nearest neighbor matching
- method_optimal for optimal pair matching
- method_full for optimal full matching
- method_genetic for genetic matching
- method_cem for coarsened exact matching
- method_exact for exact matching
- method_cardinality for cardinality and template matching
- method_subclass for subclassification
These pages contain information on what each method does, which of the arguments above are allowed with it and how they are interpreted, and what additional arguments can be supplied to further tune the method. Note that the default method, with no arguments supplied other than formula and data, is 1:1 nearest neighbor matching without replacement on a propensity score estimated using a logistic regression of the treatment on the covariates. This is not the same default offered by other matching programs, such as those in Matching, teffects in Stata, or PROC PSMATCH in SAS, so care should be taken if trying to replicate the results of those programs.
When method = NULL, no matching will occur, but any requested propensity score estimation and common support restriction will still take place. This can be a simple way to estimate the propensity score for use in future matching specifications without having to re-estimate it each time. The matchit() output with no matching can be supplied to summary() to examine balance prior to matching on any of the included covariates and on the propensity score if specified. All arguments other than distance, discard, and reestimate will be ignored.
See distance for details on the several ways to specify the distance, link, and distance.options arguments to estimate propensity scores and create distance measures.
When the treatment variable is not a 0/1 variable, it will be coerced to one and returned as such in the matchit() output (see section Value, below). The following rules are used: 1) if 0 is one of the values, it will be considered the control and the other value the treated; 2) otherwise, if the variable is a factor, levels(treat)[1] will be considered control and the other level the treated; 3) otherwise, sort(unique(treat))[1] will be considered control and the other value the treated. It is safest to ensure the treatment variable is a 0/1 variable.
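A minimal sketch of these three coercion rules in R follows. coerce_treat() is a hypothetical helper written only to illustrate the documented rules; it is not a MatchIt function, and it assumes a treatment with exactly two distinct, non-missing values.

```r
# Hypothetical helper (not part of MatchIt): applies the three documented
# rules for recoding a two-valued treatment into 0 (control) / 1 (treated).
coerce_treat <- function(treat) {
  if (0 %in% treat) {
    control <- 0                        # rule 1: the value 0 is the control
  } else if (is.factor(treat)) {
    control <- levels(treat)[1]         # rule 2: first factor level is the control
  } else {
    control <- sort(unique(treat))[1]   # rule 3: first sorted value is the control
  }
  as.integer(treat != control)          # the remaining value is the treated
}

coerce_treat(c(2, 1, 2))                                     # 1 0 1
coerce_treat(factor(c("b", "a", "b"), levels = c("b", "a"))) # 0 1 0
```

Note that rule 2 follows the factor's level order, not alphabetical order, which is why "b" is the control in the second call.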
The discard option implements a common support restriction. It can only be used when a distance measure is an estimated propensity score or supplied as a vector, and it is ignored for some matching methods. When specified as "treated", treated units whose distance measure is outside the range of distance measures of the control units will be discarded. When specified as "control", control units whose distance measure is outside the range of distance measures of the treated units will be discarded. When specified as "both", treated and control units whose distance measure is outside the intersection of the range of distance measures of the treated units and the range of distance measures of the control units will be discarded. When reestimate = TRUE and distance corresponds to a propensity score-estimating function, the propensity scores are re-estimated in the remaining units prior to being used for matching or calipers.
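The three discard rules can be illustrated with the sketch below. discard_units() is a hypothetical helper, not part of MatchIt; it assumes a numeric propensity score vector ps and a 0/1 treatment vector, treats units lying exactly on a range endpoint as inside the range, and does not re-estimate the propensity score as reestimate = TRUE would.

```r
# Hypothetical helper (not part of MatchIt): returns TRUE for units that
# would be discarded under each documented common support rule.
discard_units <- function(ps, treat, discard = c("none", "treated", "control", "both")) {
  discard <- match.arg(discard)
  rng_t <- range(ps[treat == 1])              # range among treated units
  rng_c <- range(ps[treat == 0])              # range among control units
  outside_c <- ps < rng_c[1] | ps > rng_c[2]  # outside the control range
  outside_t <- ps < rng_t[1] | ps > rng_t[2]  # outside the treated range
  switch(discard,
    none    = rep(FALSE, length(ps)),
    treated = treat == 1 & outside_c,
    control = treat == 0 & outside_t,
    # A unit always lies inside its own group's range, so being outside the
    # intersection reduces to being outside the other group's range.
    both    = (treat == 1 & outside_c) | (treat == 0 & outside_t)
  )
}

ps    <- c(0.2, 0.5, 0.9, 0.1, 0.4, 0.6)
treat <- c(1, 1, 1, 0, 0, 0)
discard_units(ps, treat, "both")  # FALSE FALSE TRUE TRUE FALSE FALSE
```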
Caution should be used when interpreting effects estimated with various values of estimand. Setting estimand = "ATT" does not necessarily mean the average treatment effect in the treated is being estimated; it just means that for matching methods, treated units will be untouched and given weights of 1 and control units will be matched to them (and the opposite for estimand = "ATC"). If a caliper is supplied or treated units are removed for common support or some other reason (e.g., lacking matches when using exact matching), the actual estimand targeted is not the ATT but the treatment effect in the matched sample. The argument to estimand simply determines which units are matched to which and, for stratification-based methods (exact matching, CEM, full matching, and subclassification), the formula used to compute the stratification weights.
How Matching Weights Are Computed
Matching weights are computed in one of two ways depending on whether matching was done with replacement or not.
For matching without replacement (except for cardinality matching), each unit is assigned to a subclass, which represents the pair it is part of (in the case of k:1 matching) or the stratum it belongs to (in the case of exact matching, coarsened exact matching, full matching, or subclassification). The formula for computing the weights depends on the argument supplied to estimand. A new "stratum propensity score" (sp) is computed as the proportion of units in each stratum that are in the treated group, and all units in that stratum are assigned that stratum propensity score. This is distinct from the propensity score used for matching, if any. Weights are then computed using the standard formulas for inverse probability weights with the stratum propensity score inserted:
- for the ATT, weights are 1 for the treated units and sp/(1-sp) for the control units;
- for the ATC, weights are (1-sp)/sp for the treated units and 1 for the control units;
- for the ATE, weights are 1/sp for the treated units and 1/(1-sp) for the control units.
For cardinality matching, all matched units receive a weight of 1.
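The stratum propensity score formulas can be sketched as follows. stratum_weights() is a hypothetical helper, not MatchIt's internal code; it assumes a 0/1 treatment vector and a subclass vector with NA for unmatched units, and it returns the weights before any within-group rescaling.

```r
# Hypothetical helper (not part of MatchIt): stratum-propensity-score weights
# for matching without replacement. Unmatched units (NA subclass) get weight 0.
stratum_weights <- function(treat, subclass, estimand = c("ATT", "ATC", "ATE")) {
  estimand <- match.arg(estimand)
  w <- numeric(length(treat))
  m <- !is.na(subclass)
  sp <- ave(treat[m], subclass[m])  # proportion treated within each stratum
  w[m] <- switch(estimand,
    ATT = ifelse(treat[m] == 1, 1, sp / (1 - sp)),
    ATC = ifelse(treat[m] == 1, (1 - sp) / sp, 1),
    ATE = ifelse(treat[m] == 1, 1 / sp, 1 / (1 - sp))
  )
  w
}

treat    <- c(1, 0, 0, 1, 0, 0)
subclass <- c(1, 1, 1, 2, 2, NA)    # stratum 1: sp = 1/3; stratum 2: sp = 1/2
stratum_weights(treat, subclass, "ATT")  # 1 0.5 0.5 1 1 0
stratum_weights(treat, subclass, "ATE")  # 3 1.5 1.5 2 2 0
```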
For matching with replacement, units are not assigned to unique strata. For the ATT, each treated unit gets a weight of 1. Each control unit is weighted as the sum of the inverse of the number of control units matched to the same treated unit across its matches. For example, if a control unit was matched to a treated unit that had two other control units matched to it, and that same control was matched to a treated unit that had one other control unit matched to it, the control unit in question would get a weight of 1/3 + 1/2 = 5/6. For the ATC, the same is true with the treated and control labels switched. The weights are computed using the match.matrix component of the matchit() output object.
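The with-replacement rule for control weights can be sketched as below, reproducing the 1/3 + 1/2 = 5/6 example. replacement_control_weights() is a hypothetical helper, not part of MatchIt; it assumes a character matrix shaped like match.matrix (one row per treated unit, matched control names in the cells, NA in empty slots) and returns unnormalized ATT weights for controls only.

```r
# Hypothetical helper (not part of MatchIt): ATT weights for control units
# after matching with replacement, computed from a match-matrix-like object.
replacement_control_weights <- function(mm) {
  filled <- !is.na(mm)
  n_matched <- rowSums(filled)                  # controls matched to each treated unit
  share <- rep(1 / n_matched, times = ncol(mm)) # 1/n for every cell, column-major like mm
  # Sum each control's shares across all treated units it was matched to
  w <- tapply(share[filled], mm[filled], sum)
  setNames(as.vector(w), names(w))
}

mm <- matrix(c("c1", "c2", "c3",
               "c1", "c4", NA),
             nrow = 2, byrow = TRUE, dimnames = list(c("t1", "t2"), NULL))
replacement_control_weights(mm)
# c1 = 1/3 + 1/2 = 5/6, c2 = 1/3, c3 = 1/3, c4 = 1/2
```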
In each treatment group, weights are divided by the mean of the nonzero weights in that treatment group, which makes the weights in that group sum to the number of units with nonzero weights (i.e., the matched units) in that group. If sampling weights are included through the s.weights argument, they will be included in the matchit() output object but not incorporated into the matching weights. match.data(), which extracts the matched set from a matchit object, combines the matching weights and sampling weights.
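The rescaling step can be sketched in R as follows. normalize_weights() is a hypothetical helper, not part of MatchIt; it assumes a numeric weight vector and a 0/1 treatment vector, and it does not show how match.data() combines these weights with sampling weights.

```r
# Hypothetical helper (not part of MatchIt): divides each group's weights by
# the mean of that group's nonzero weights; zero weights stay zero.
normalize_weights <- function(w, treat) {
  for (g in unique(treat)) {
    in_g <- treat == g
    nz <- in_g & w > 0
    if (any(nz)) w[in_g] <- w[in_g] / mean(w[nz])
  }
  w
}

treat <- c(1, 1, 1, 0, 0, 0)
w <- normalize_weights(c(2, 2, 0, 0.5, 0.25, 0), treat)
tapply(w, treat, sum)  # each group sums to its number of matched units (2 and 2)
```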
Value
When method is something other than "subclass", a matchit object with the following components:
match.matrix |
a matrix containing the matches. The rownames correspond to the treated units and the values in each row are the names (or indices) of the control units matched to each treated unit. When treated units are matched to different numbers of control units (e.g., with exact matching or matching with a caliper), empty spaces will be filled with NA. |
subclass |
a factor containing matching pair/stratum membership for each unit. Unmatched units will have a value of NA. |
weights |
a numeric vector of estimated matching weights. Unmatched and discarded units will have a weight of zero. |
model |
the fit object of the model used to estimate propensity scores when distance is specified as a method of estimating propensity scores. |
X |
a data frame of covariates mentioned in formula, exact, mahvars, caliper, and antiexact. |
call |
the matchit() call. |
info |
information on the matching method and distance measures used. |
estimand |
the argument supplied to estimand. |
formula |
the formula supplied. |
treat |
a vector of treatment status converted to zeros (0) and ones (1) if not already in that format. |
distance |
a vector of distance values (i.e., propensity scores) when distance is a method of estimating propensity scores or a numeric vector. |
discarded |
a logical vector denoting whether each observation was discarded (TRUE) or not (FALSE) by the argument to discard. |
s.weights |
the vector of sampling weights supplied to the s.weights argument, if any. |
exact |
a one-sided formula containing the variables, if any, supplied to exact. |
mahvars |
a one-sided formula containing the variables, if any, supplied to mahvars. |
obj |
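when include.obj = TRUE, an object containing the intermediate results of the matching procedure. See the individual methods pages for what this component will contain. |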
When method = "subclass", a matchit.subclass object with the same components as above except that match.matrix is excluded and one additional component, q.cut, is included, containing a vector of the distance measure cutpoints used to define the subclasses. See method_subclass for details.
Author(s)
Daniel Ho (dho@law.stanford.edu); Kosuke Imai (imai@harvard.edu); Gary King (king@harvard.edu); Elizabeth Stuart (estuart@jhsph.edu)
Version 4.0.0 update by Noah Greifer (noah.greifer@gmail.com)
References
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. doi:10.1093/pan/mpl013
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric Preprocessing for Parametric Causal Inference. Journal of Statistical Software, 42(8). doi:10.18637/jss.v042.i08
See Also
summary.matchit()
for balance assessment after matching, plot.matchit()
for plots of covariate balance and propensity score overlap after matching.
vignette("MatchIt")
for an introduction to matching with
MatchIt; vignette("matching-methods")
for descriptions of the
variety of matching methods and options available;
vignette("assessing-balance")
for information on assessing the
quality of a matching specification; vignette("estimating-effects")
for instructions on how to estimate treatment effects after matching; and
vignette("sampling-weights")
for a guide to using MatchIt with
sampling weights.
Examples
data("lalonde")
# Default: 1:1 NN PS matching w/o replacement
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
married + re74 + re75, data = lalonde)
m.out1
summary(m.out1)
# 1:1 NN Mahalanobis distance matching w/ replacement and
# exact matching on married and race
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
married + re74 + re75, data = lalonde,
distance = "mahalanobis", replace = TRUE,
exact = ~ married + race)
m.out2
summary(m.out2, un = TRUE)
# 2:1 NN Mahalanobis distance matching within caliper defined
# by a probit regression PS
m.out3 <- matchit(treat ~ age + educ + race + nodegree +
married + re74 + re75, data = lalonde,
distance = "glm", link = "probit",
mahvars = ~ age + educ + re74 + re75,
caliper = .1, ratio = 2)
m.out3
summary(m.out3, un = TRUE)
# Optimal full PS matching for the ATE within calipers on
# PS, age, and educ
m.out4 <- matchit(treat ~ age + educ + race + nodegree +
married + re74 + re75, data = lalonde,
method = "full", estimand = "ATE",
caliper = c(.1, age = 2, educ = 1),
std.caliper = c(TRUE, FALSE, FALSE))
m.out4
summary(m.out4, un = TRUE)
# Subclassification on a logistic PS with 10 subclasses after
# discarding controls outside common support of PS
s.out1 <- matchit(treat ~ age + educ + race + nodegree +
married + re74 + re75, data = lalonde,
method = "subclass", distance = "glm",
discard = "control", subclass = 10)
s.out1
summary(s.out1, un = TRUE)