lmw {lmw} | R Documentation |
Compute linear regression-implied weights
Description
Computes the weights implied by a linear outcome regression model. When used to form a weighted difference in outcome means, these weights reproduce the covariate-adjusted treatment effect estimate from the supplied regression model.
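The idea of "implied weights" can be illustrated without the package. The following is a minimal base-R sketch on simulated data (all variable names are made up for illustration) for a main-effects regression: the row of (X'X)^{-1}X' belonging to the treatment coefficient, with its sign flipped for control units, gives weights whose weighted difference in outcome means equals the regression coefficient. It is not necessarily how lmw() computes its weights internally.

```r
# Simulated data; every name here is illustrative, not part of lmw
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
a  <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
y  <- 1 + 2 * a + x1 - x2 + rnorm(n)

# Ordinary outcome regression with treatment and covariate main effects
fit <- lm(y ~ a + x1 + x2)

# Row of (X'X)^{-1} X' that produces the treatment coefficient
X   <- model.matrix(fit)
H   <- solve(crossprod(X), t(X))
h_a <- H[which(colnames(X) == "a"), ]

# Implied weights: flip the sign for controls so each group's weights sum to 1
w <- ifelse(a == 1, h_a, -h_a)
sum(w[a == 1])
sum(w[a == 0])

# Weighted difference in outcome means vs. the regression coefficient
wdiff <- sum(w[a == 1] * y[a == 1]) - sum(w[a == 0] * y[a == 0])
all.equal(wdiff, unname(coef(fit)["a"]))
```

Because the weights do not involve y, they can be inspected (e.g., for balance or extreme values) before the outcome is ever examined.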
Usage
lmw(
formula,
data = NULL,
estimand = "ATE",
method = "URI",
treat = NULL,
base.weights = NULL,
s.weights = NULL,
dr.method = "WLS",
obj = NULL,
fixef = NULL,
target = NULL,
target.weights = NULL,
contrast = NULL,
focal = NULL
)
Arguments
formula |
a one-sided formula with the treatment and covariates on the right-hand side corresponding to the outcome regression model to be fit. The outcome variable is not involved in computing the weights and does not need to be specified. See Details for how this formula is interpreted in light of other options. |
data |
a data frame containing the variables named in |
estimand |
the estimand of interest, which determines how covariates
are centered. Should be one of |
method |
the method used to estimate the weights; either |
treat |
the name of the treatment variable in |
base.weights |
a vector of base weights. See Details. If omitted and
|
s.weights |
a vector of sampling weights. See Details. If omitted and
|
dr.method |
the method used to incorporate the |
obj |
a |
fixef |
optional; a string or one-sided formula containing the name of
the fixed effects variable in |
target |
a list or data frame containing the target values for each
covariate included in |
target.weights |
a vector of sampling weights to be applied to
|
contrast |
for multi-category treatments with |
focal |
the level of the treatment variable to be considered "focal"
(i.e., the "treated" level when |
Details
formula is interpreted differently depending on whether method is "URI" or
"MRI". When method = "URI", the formula is taken literally as the right-hand
side of the outcome model formula. The only difference is that the covariates
will be centered based on the argument to estimand (see below). When
method = "MRI", all references to the treatment are removed (i.e., covariate
interactions with treatment become covariate main effects if not already
present), and the new formula is taken as the right-hand side of the model
formula fit within each treatment group. This is equivalent to allowing all
covariates to have both main effects and interactions with treatment after
centering the covariates based on the argument to estimand. Allowing the
treatment to interact with all covariates with method = "URI" is equivalent
to specifying method = "MRI", and, for binary treatments, the returned
weights will be the same when fixef = NULL.
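The stated equivalence between a fully interacted URI model and MRI for a binary treatment can be checked directly. The sketch below assumes the lmw package is installed and uses its lalonde dataset with an arbitrary subset of covariates; the object names are illustrative.

```r
library(lmw)
data("lalonde", package = "lmw")

# URI with the treatment interacting with every covariate
uri.int <- lmw(~ treat * (age + education + re74), data = lalonde,
               estimand = "ATT", method = "URI", treat = "treat")

# MRI with the same covariates; treatment references are dropped internally
mri <- lmw(~ treat + age + education + re74, data = lalonde,
           estimand = "ATT", method = "MRI", treat = "treat")

# Per the description above, these should agree for a binary treatment
all.equal(uri.int$weights, mri$weights)
```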
When any treatment-by-covariate interactions are present in formula or when
method = "MRI", covariates are centered at specific values to ensure the
resulting weights correspond to the desired estimand as supplied to the
estimand argument. For the ATE, the covariates are centered at their means in
the full sample. For the ATT and ATC, the covariates are centered at their
means in the treatment or control group (i.e., the focal group),
respectively. For the CATE, the covariates are centered according to the
argument supplied to target (see below). Note that when
covariate-by-covariate interactions are present, they will be centered after
computing the interaction rather than the interaction being computed on the
centered covariates unless estimand = "CATE", in which case the covariates
will be centered at the values specified in target prior to involvement in
interactions.
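Why centering determines the estimand can be seen in a small base-R sketch on simulated data (names are illustrative, and this is a general property of linear models rather than a description of lmw internals). With a treatment-by-covariate interaction, the treatment coefficient is the effect at the point where the centered covariate equals zero, so centering at the treated-group mean yields the ATT.

```r
# Simulated data with a heterogeneous treatment effect
set.seed(2024)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(x))
y <- 1 + a + x + 0.5 * a * x + rnorm(n)

# Center the covariate at its mean among the treated (the ATT focal group)
xc  <- x - mean(x[a == 1])
fit <- lm(y ~ a * xc)

# ATT by fitting separate models in each group and averaging
# predicted differences over the treated units
fit0 <- lm(y ~ x, subset = a == 0)
fit1 <- lm(y ~ x, subset = a == 1)
nd   <- data.frame(x = x[a == 1])
att  <- mean(predict(fit1, nd) - predict(fit0, nd))

# The coefficient on treatment in the centered model matches the ATT
all.equal(unname(coef(fit)["a"]), att)
```

Centering at the full-sample mean or the control-group mean instead would give the analogous ATE or ATC estimate.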
Estimating a CATE
When estimand = "CATE", target can be supplied either as a single target
profile (i.e., a list or a data frame with one row) or as a target dataset,
potentially with its own sampling weights, which are supplied to
target.weights. The variables included in target must correspond to all the
named covariates in formula; for example, if
formula = ~ X1 + log(X1) + X2 + X1:X2, values in target must be given for X1
and X2, but not log(X1) or X1:X2. To choose a target profile value for a
factor corresponding to a proportion (e.g., a target value of .5 for a
variable like sex indicating a target population with a 50-50 sex split), the
factor variable must be split into a numeric variable beforehand, e.g., using
model.matrix() or cobalt::splitfactor(). target values cannot be given to
variables specified using $, [[]], or [] (e.g., data$X1), so an error will be
thrown if they are used in formula. When a target dataset is supplied,
covariates will be centered at their means in the
(target.weights-weighted) target dataset.
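Splitting a factor so that a proportion can be used as a target value might look like the following sketch. It assumes the lmw package and its lalonde dataset; the helper object names and the target proportions of 0.25 are purely illustrative, and model.matrix() is used here rather than cobalt::splitfactor().

```r
library(lmw)
data("lalonde", package = "lmw")

# Split the factor into numeric 0/1 indicator columns (reference level dropped)
race_ind  <- model.matrix(~ race, data = lalonde)[, -1, drop = FALSE]
d         <- cbind(lalonde, race_ind)
ind_names <- colnames(race_ind)

# One-sided formula using the indicators in place of the factor
f <- reformulate(c("treat", "age", "education", ind_names))

# Target profile: the indicator targets are proportions, not 0/1 values
# (0.25 for each non-reference level is an arbitrary illustrative choice)
target.prof <- c(list(age = 25, education = 11),
                 setNames(as.list(rep(0.25, length(ind_names))), ind_names))

lmw.cate <- lmw(f, data = d, estimand = "CATE", method = "MRI",
                treat = "treat", target = target.prof)
summary(lmw.cate)
```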
Base weights and sampling weights
Base weights (base.weights) and sampling weights (s.weights) are similar in
that they both involve combining weights with an outcome regression model.
However, they differ in a few ways. Sampling weights are primarily used to
adjust the target population; when the outcome model is fit, it is fit using
weighted least squares, and when target balance is assessed, it is assessed
using the sampling weighted population as the target population. Centering of
covariates in the outcome model is done using the sampling weighted covariate
means. Base weights are primarily used to offer a second level of balancing
beyond the implied regression weights; they can be incorporated into the
effect estimate either using weighted least squares or using the augmented
inverse probability weighting (AIPW) estimator. Base weights do not change
the target population, so when target balance is assessed, it is assessed
using the unweighted population as the target population.
Some forms of weights both change the target population and provide an extra
layer of balancing, like propensity score weights that target estimands
other than the ATT, ATC, or ATE (e.g., overlap weights), or matching weights
where the target population is defined by the matching (e.g., matching with
a caliper, cardinality matching, or coarsened exact matching). Because these
weights change the target population, they should be supplied to s.weights
to ensure covariates are appropriately centered. When there are no
treatment-by-covariate interactions and method = "URI", whether weights are
supplied to base.weights or s.weights will not matter for the estimation of
the weights but will affect the target population in balance assessment.
When both base.weights and s.weights are supplied, e.g., when the base
weights are the result of a propensity score model fit with sampling weights,
it is assumed the base weights do not incorporate the sampling weights; that
is, it is assumed that to estimate a treatment effect without regression
adjustment, the base weights and the sampling weights would have to be
multiplied together. This is true, for example, for the weights in a matchit
or weightit object (see below), but not for the weights in the output of
MatchIt::match.data() (unless called with include.s.weights = FALSE) or for
weights resulting from CBPS::CBPS().
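As an illustration of supplying base weights, the sketch below estimates ATT propensity score weights with glm() and passes them to base.weights. It assumes the lmw package and its lalonde dataset; the propensity score model and object names are illustrative choices, not a recommended specification.

```r
library(lmw)
data("lalonde", package = "lmw")

# Propensity scores from a logistic regression (illustrative specification)
ps_fit <- glm(treat ~ age + education + race + married + nodegree +
                re74 + re75, data = lalonde, family = binomial)
ps <- fitted(ps_fit)

# ATT weights: 1 for treated units, odds of treatment for control units.
# These keep the treated group as the target population, so they are
# supplied as base weights rather than sampling weights.
att.w <- ifelse(lalonde$treat == 1, 1, ps / (1 - ps))

lmw.bw <- lmw(~ treat + age + education + race + married +
                nodegree + re74 + re75, data = lalonde,
              estimand = "ATT", method = "MRI", treat = "treat",
              base.weights = att.w, dr.method = "WLS")
summary(lmw.bw)
```

Weights that redefine the target population, such as overlap weights, would instead go to s.weights, as described above.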
Regression after using MatchIt or WeightIt
Regression weights can be computed in a matched or weighted sample by
supplying a matchit or weightit object (from MatchIt or WeightIt,
respectively) to the obj argument of lmw(). The estimand, focal group (if
any), base weights, and sampling weights (if any) will be taken from the
supplied object and used in the calculation of the implied regression
weights, unless these have been supplied separately to lmw(). The weights
component of the supplied object containing the matching or balancing weights
will be passed to base.weights and the s.weights component will be passed to
s.weights. Arguments supplied to lmw() will take precedence over the
corresponding components in the obj object.
Multi-category treatments
There are a few differences when the treatment has multiple (i.e., more than
2) categories. If estimand is "ATT" or "ATC", an argument should be supplied
to focal identifying which group is the treated or control (i.e., "focal")
group, respectively.
The key difference, though, is when method = "URI", because in this case the
contrast between each pair of treatment groups has its own weights and its
own implied target population. Because lmw() only produces one set of
weights, an argument must be supplied to contrast identifying which groups
are to be used as the contrast for computing the weights. In addition, to
compute the treatment effect corresponding to the chosen contrast as a
weighted difference in outcome means, the difference must be taken between
the weighted mean of the non-reference group and the weighted mean of all
other groups combined, rather than simply the weighted mean of the reference
group.
The implication of this is that contrast statistics computed in the weighted
sample involve all units, even those not in the contrasted groups, whereas
statistics computed in the unweighted sample only involve units in the
contrasted groups. See summary.lmw() for more information on assessing
balance using the regression weights for multi-category treatments. Given
these complications, it is generally best to use method = "MRI" with
multi-category treatments.
Fixed effects
A fixed effects variable can be supplied to the fixef argument. This is
equivalent to adding the fixed effects variable as a predictor that does not
interact with the treatment or any other covariate. The difference is that
computation is much faster when the fixed effect has many levels because
demeaning is used rather than including the fixed effect variable as a
collection of dummy variables. When using URI, the weights will be the same
regardless of whether the fixed effect variable is included as a covariate or
supplied to fixef; when using MRI, results will differ because the fixed
effect variable does not interact with treatment. The fixed effects variable
will not appear in the summary.lmw() output (but can be added using the
addlvariables argument) or in the model output of lmw_est() or
summary.lmw_est(). Because it does not interact with the treatment, the
distribution of the fixed effect variable may not correspond to the target
population, so caution should be used if it is expected the treatment effect
varies across levels of this variable (in which case it should be included as
a predictor). Currently only one fixed effect variable is allowed.
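The equivalence between demeaning and dummy variables that motivates fixef can be sketched in base R on simulated data (all names are illustrative; the internal implementation in lmw() may differ). Subtracting group means from the outcome and predictors gives the same treatment coefficient as including one dummy per group level.

```r
# Simulated data with a 10-level grouping variable
set.seed(42)
n <- 300
g <- factor(sample(letters[1:10], n, replace = TRUE))
x <- rnorm(n)
a <- rbinom(n, 1, 0.5)
y <- 1 + 2 * a + x + as.integer(g) / 5 + rnorm(n)

# Approach 1: include the grouping variable as dummy variables
fit_dummy <- lm(y ~ a + x + g)

# Approach 2: subtract group means from every variable, then regress
# without an intercept (the "within" transformation)
demean     <- function(v) v - ave(v, g)
fit_within <- lm(demean(y) ~ 0 + demean(a) + demean(x))

# Both approaches give the same treatment coefficient
all.equal(unname(coef(fit_dummy)["a"]), unname(coef(fit_within)[1]))
```

The demeaned fit estimates only two coefficients regardless of the number of groups, which is where the speed advantage for many-level fixed effects comes from.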
Value
An lmw object, which contains the following components:
treat |
the treatment variable, given as a factor. |
weights |
the computed implied regression weights. |
covs |
a data frame containing the covariates included in the model formula. |
estimand |
the requested estimand. |
method |
the method used to estimate the weights
( |
base.weights |
the weights supplied to
|
s.weights |
the weights supplied to
|
dr.method |
when |
call |
the
original call to |
fixef |
the fixed effects variable if
supplied to |
formula |
the model formula. |
target |
the supplied target profile or dataset when |
contrast |
the contrasted treatment groups. |
focal |
the focal
treatment level when |
References
Chattopadhyay, A., & Zubizarreta, J. R. (2023). On the implied weights of linear regression for causal inference. Biometrika, 110(3), 615–629. doi:10.1093/biomet/asac058
See Also
summary.lmw() for summarizing balance and representativeness; plot.lmw() for
plotting features of the weights; lmw_est() for estimating treatment effects
from lmw objects; influence.lmw() for influence measures; lm() for fitting
standard regression models.
Examples
data("lalonde")

# URI regression for ATT
lmw.out1 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "URI",
                treat = "treat")
lmw.out1
summary(lmw.out1)

# MRI regression for ATT
lmw.out2 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "MRI",
                treat = "treat")
lmw.out2
summary(lmw.out2)

# MRI regression for ATT after propensity score matching
m.out <- MatchIt::matchit(treat ~ age + education + race +
                            married + nodegree + re74 + re75,
                          data = lalonde, method = "nearest",
                          estimand = "ATT")
lmw.out3 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                method = "MRI", treat = "treat", obj = m.out)
lmw.out3
summary(lmw.out3)

# MRI regression for CATE with given target profile
target.prof <- list(age = 25, education = 11, race = "black",
                    married = 0, nodegree = 1, re74 = 0,
                    re75 = 0)
lmw.out4 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "CATE", method = "MRI",
                treat = "treat", target = target.prof)
lmw.out4
summary(lmw.out4)

# MRI regression for CATE with given target dataset (in
# this case, will give the same as with estimand = "ATT")
target.data <- subset(lalonde, treat == 1)
lmw.out4 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "CATE", method = "MRI",
                treat = "treat", target = target.data)
lmw.out4
summary(lmw.out4)

# URI regression with fixed effects for 'race'
lmw.out5 <- lmw(~ treat + age + education + married +
                  nodegree + re74 + re75, data = lalonde,
                method = "URI", treat = "treat",
                fixef = ~race)
lmw.out5

# Produces the same weights as when included as a covariate
all.equal(lmw.out1$weights, lmw.out5$weights)

# MRI for a multi-category treatment, ATT with 1 as the focal
# group
lmw.out6 <- lmw(~ treat_multi + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "MRI",
                treat = "treat_multi", focal = "1")
lmw.out6
summary(lmw.out6)

# URI for a multi-category treatment; need to specify
# contrast because only two groups can be compared at
# a time
lmw.out7 <- lmw(~ treat_multi + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATE", method = "URI",
                treat = "treat_multi", contrast = c("2", "3"))
lmw.out7
summary(lmw.out7)