R: Compute linear regression-implied weights

lmw {lmw}

R Documentation

Compute linear regression-implied weights

Description

Computes the weights implied by a linear outcome regression model that would estimate a weighted difference in outcome means equal to the covariate-adjusted treatment effect resulting from the supplied regression model.

Usage

lmw(
  formula,
  data = NULL,
  estimand = "ATE",
  method = "URI",
  treat = NULL,
  base.weights = NULL,
  s.weights = NULL,
  dr.method = "WLS",
  obj = NULL,
  fixef = NULL,
  target = NULL,
  target.weights = NULL,
  contrast = NULL,
  focal = NULL
)

Arguments

`formula`	a one-sided formula with the treatment and covariates on the right-hand side corresponding to the outcome regression model to be fit. The outcome variable is not involved in computing the weights and does not need to be specified. See Details for how this formula is interpreted in light of other options.
`data`	a data frame containing the variables named in `formula` and `treat`.
`estimand`	the estimand of interest, which determines how covariates are centered. Should be one of `"ATE"` for the average treatment effect, `"ATT"` for the average treatment effect in the treated, `"ATC"` for the average treatment effect in the control, or `"CATE"` for the conditional average treatment effect. When `estimand = "CATE"`, an argument to `target` must be supplied. This argument also affects what `summary.lmw()` considers to be the target population. Default is `"ATE"` unless `obj` is specified, in which case it takes its value from the supplied object.
`method`	the method used to estimate the weights; either `"URI"` (the default) for uni-regression imputation weights, where a single model is fit to the whole dataset, or `"MRI"` for multi-regression imputation, where the model is fit separately in the treatment groups. This affects the interpretation of `formula`. See Details.
`treat`	the name of the treatment variable in `data`. If unspecified, the first variable present in `formula` will be taken as the treatment variable with a message. See Details.
`base.weights`	a vector of base weights. See Details. If omitted and `obj` is specified, the weights from the supplied object will be used. Can be supplied as a numeric vector, a string containing the name of the variable in `data` containing the base weights, or the unquoted name of the variable in `data` containing the base weights.
`s.weights`	a vector of sampling weights. See Details. If omitted and `obj` is specified, the sampling weights from the supplied object will be used. Can be supplied as a numeric vector, a string containing the name of the variable in `data` containing the sampling weights, or the unquoted name of the variable in `data` containing the sampling weights.
`dr.method`	the method used to incorporate the `base.weights` into a doubly-robust estimator. Can be one of `"WLS"` for weighted least squares (`"IPWRA"` is an allowable alias) or `"AIPW"` for augmented inverse probability weighting. Ignored when `base.weights` is `NULL`.
`obj`	a `matchit` or `weightit` object corresponding to the matched or weighted sample in which the implied outcome regression would take place. See Details.
`fixef`	optional; a string or one-sided formula containing the name of the fixed effects variable in `data`. See Details. Cannot be used with `dr.method = "AIPW"`.
`target`	a list or data frame containing the target values for each covariate included in `formula`. Ignored with a warning when `estimand` is not `"CATE"`. See Details.
`target.weights`	a vector of sampling weights to be applied to `target` when supplied as a data frame. Ignored with a warning when `estimand` is not `"CATE"`. See Details.
`contrast`	for multi-category treatments with `method = "URI"`, a vector containing the names or indices of the two treatment levels to be contrasted (since in this case the weights depend on the specific contrast). See Details.
`focal`	the level of the treatment variable to be considered "focal" (i.e., the "treated" level when `estimand = "ATT"` or the control level when `estimand = "ATC"`). Ignored when `estimand` is `"ATE"` or `"CATE"`. Otherwise, if unspecified, the second value of `contrast` will be considered focal when `estimand = "ATT"` and the first value of `contrast` will be considered focal when `estimand = "ATC"`. For binary treatments, this generally does not need to be supplied. See Details.

Details

formula is interpreted differently depending on whether method is "URI" or "MRI". When method = "URI", the formula is taken literally as the right-hand side of the outcome model formula. The only difference is that the covariates will be centered based on the argument to estimand (see below). When method = "MRI", all references to the treatment are removed (i.e., covariate interactions with treatment become covariate main effects if not already present), and the new formula is taken as the right-hand side of the model formula fit within each treatment group. This is equivalent to allowing all covariates to have both main effects and interactions with treatment after centering the covariates based on the argument to estimand. Allowing the treatment to interact with all covariates with method = "URI" is equivalent to specifying method = "MRI", and, for binary treatments, the returned weights will be the same when fixef = NULL.

When any treatment-by-covariate interactions are present in formula or when method = "MRI", covariates are centered at specific values to ensure the resulting weights correspond to the desired estimand as supplied to the estimand argument. For the ATE, the covariates are centered at their means in the full sample. For the ATT and ATC, the covariates are centered at their means in the treatment or control group (i.e., the focal group), respectively. For the CATE, the covariates are centered according to the argument supplied to target (see below). Note that when covariate-by-covariate interactions are present, they will be centered after computing the interaction rather than the interaction being computed on the centered covariates unless estimand = "CATE", in which case the covariates will be centered at the values specified in target prior to involvement in interactions.

Estimating a CATE

When estimand = "CATE", target can be supplied either as a single target profile (i.e., a list or a data frame with one row) or as a target dataset, potentially with its own sampling weights, which are supplied to target.weights. The variables included in target must correspond to all the named covariates in formula; for example, if formula = ~ X1 + log(X1) + X2 + X1:X2, values in target must be given for X1 and X2, but not log(X1) or X1:X2. To choose a target profile value for a factor corresponding to a proportion (e.g., a target value of .5 for a variable like sex indicating a target population with a 50-50 sex split), the factor variable must be split into a numeric variable beforehand, e.g., using model.matrix() or cobalt::splitfactor(). target values cannot be given to variables specified using $, ⁠[[]]⁠, or ⁠[]⁠ (e.g., data$X1), so an error will be thrown if they are used in formula. When a target dataset is supplied, covariates will be centered at their means in the (target.weights-weighted) target dataset.

Base weights and sampling weights

Base weights (base.weights) and sampling weights (s.weights) are similar in that they both involve combining weights with an outcome regression model. However, they differ in a few ways. Sampling weights are primarily used to adjust the target population; when the outcome model is fit, it is fit using weighted least squares, and when target balance is assessed, it is assessed using the sampling weighted population as the target population. Centering of covariates in the outcome model is done using the sampling weighted covariate means. Base weights are primarily used to offer a second level of balancing beyond the implied regression weights; they can be incorporated into the effect estimate either using weighted least squares or using the augmented inverse probability weighting (AIPW) estimator. Base weights do not change the target population, so when target balance is assessed, it is assessed using the unweighted population as the target population.

Some forms of weights both change the target population and provide an extra layer of balancing, like propensity score weights that target estimands other than the ATT, ATC, or ATE (e.g., overlap weights), or matching weights where the target population is defined by the matching (e.g., matching with a caliper, cardinality matching, or coarsened exact matching). Because these weights change the target population, they should be supplied to s.weights to ensure covariates are appropriately centered. When there are no treatment-by-covariate interactions and method = "URI", whether weights are supplied to base.weights or s.weights will not matter for the estimation of the weights but will affect the target population in balance assessment.

When both base.weights and s.weights are supplied, e.g., when the base weights are the result of a propensity score model fit with sampling weights, it is assumed the base weights do not incorporate the sampling weights; that is, it is assumed that to estimate a treatment effect without regression adjustment, the base weights and the sampling weights would have to be multiplied together. This is true, for example, for the weights in a matchit or weightit object (see below) but not for weights in the output of MatchIt::match.data() unless called with include.s.weights = FALSE or weights resulting from CBPS::CBPS().

Regression after using MatchIt or WeightIt

Regression weights can be computed in a matched or weighted sample by supplying a matchit or weightit object (from MatchIt or WeightIt, respectively) to the obj argument of lmw(). The estimand, focal group (if any), base weights, and sampling weights (if any) will be taken from the supplied object and used in the calculation of the implied regression weights, unless these have been supplied separately to lmw(). The weights component of the supplied object containing the matching or balancing weights will be passed to base.weights and the s.weights component will be passed to s.weights. Arguments supplied to lmw() will take precedence over the corresponding components in the obj object.

Multi-category treatments

There are a few differences when the treatment has multiple (i.e., more than 2) categories. If estimand is "ATT" or "ATC", an argument should be supplied to focal identifying which group is the treated or control (i.e., "focal") group, respectively.

The key difference, though, is when method = "URI", because in this case the contrast between each pair of treatment groups has its own weights and its own implied target population. Because lmw() only produces one set of weights, an argument must be supplied to contrast identifying which groups are to be used as the contrast for computing the weights. In addition, to compute the treatment effect corresponding to the chosen contrast as a weighted difference in outcome means, the difference must be taken between the weighted mean of the non-reference group and the weighted mean of all other groups combined, rather than simply the weighted mean of the reference group.

The implication of this is that contrast statistics computed in the weighted sample involve all units, even those not in the contrasted groups, whereas statistics computed in the unweighted sample only involve units in the contrasted groups. See summary.lmw() for more information on assessing balance using the regression weights for multi-category treatments. Given these complications, it is generally best to use method = "MRI" with multi-category treatments.

Fixed effects

A fixed effects variable can be supplied to the fixef argument. This is equivalent to adding the fixed effects variable as a predictor that does not interact with the treatment or any other covariate. The difference is that computation is much faster when the fixed effect has many levels because demeaning is used rather than including the fixed effect variable as a collection of dummy variables. When using URI, the weights will be the same regardless of whether the fixed effect variable is included as a covariate or supplied to fixef; when using MRI, results will differ because the fixed effect variable does not interact with treatment. The fixed effects variable will not appear in the summary.lmw() output (but can be added using addlvariables argument) or in the model output of lmw_est() or summary.lmw_est(). Because it does not interact with the treatment, the distribution of the fixed effect variable may not correspond to the target population, so caution should be used if it is expected the treatment effect varies across levels of this variable (in which case it should be included as a predictor). Currently only one fixed effect variable is allowed.

Value

An lmw object, which contains the following components:

`treat`	the treatment variable, given as a factor.
`weights`	the computed implied regression weights.
`covs`	a data frame containing the covariates included the model formula.
`estimand`	the requested estimand.
`method`	the method used to estimate the weights (`"URI"` or `"MRI"`).
`base.weights`	the weights supplied to `base.weights`.
`s.weights`	the weights supplied to `s.weights`.
`dr.method`	when `base.weights` are supplied, the method for computing the doubly-robust weights.
`call`	the original call to `lmw()`.
`fixef`	the fixed effects variable if supplied to `fixef`.
`formula`	the model formula.
`target`	the supplied target profile or dataset when `estimand = "CATE"`, after some initial processing. The `"target.weights"` attribute contains the `target.weights` if supplied.
`contrast`	the contrasted treatment groups.
`focal`	the focal treatment level when `estimand` is `"ATT"` or `"ATC"`.

References

Chattopadhyay, A., & Zubizarreta, J. R. (2023). On the implied weights of linear regression for causal inference. Biometrika, 110(3), 615–629. doi:10.1093/biomet/asac058

Examples

data("lalonde")

# URI regression for ATT
lmw.out1 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "URI",
                treat = "treat")
lmw.out1
summary(lmw.out1)

# MRI regression for ATT
lmw.out2 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "MRI",
                treat = "treat")
lmw.out2
summary(lmw.out2)


# MRI regression for ATT after propensity score matching
m.out <- MatchIt::matchit(treat ~ age + education + race +
                            married + nodegree + re74 + re75,
                          data = lalonde, method = "nearest",
                          estimand = "ATT")
lmw.out3 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                method = "MRI", treat = "treat", obj = m.out)
lmw.out3
summary(lmw.out3)

# MRI regression for CATE with given target profile
target.prof <- list(age = 25, education = 11, race = "black",
                    married = 0, nodegree = 1, re74 = 0,
                    re75 = 0)
lmw.out4 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "CATE", method = "MRI",
                treat = "treat", target = target.prof)
lmw.out4
summary(lmw.out4)

# MRI regression for CATE with given target dataset (in
# this case, will give the same as with estimand = "ATT")
target.data <- subset(lalonde, treat == 1)
lmw.out4 <- lmw(~ treat + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "CATE", method = "MRI",
                treat = "treat", target = target.data)
lmw.out4
summary(lmw.out4)

# URI regression with fixed effects for 'race'
lmw.out5 <- lmw(~ treat + age + education + married +
                  nodegree + re74 + re75, data = lalonde,
                method = "URI", treat = "treat",
                fixef = ~race)
lmw.out5

# Produces the same weights as when included as a covariate
all.equal(lmw.out1$weights, lmw.out5$weights)

# MRI for a multi-category treatment, ATT with 1 as the focal
# group
lmw.out6 <- lmw(~ treat_multi + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATT", method = "MRI",
                treat = "treat_multi", focal = "1")
lmw.out6
summary(lmw.out6)

# URI for a multi-category treatment; need to specify
# contrast because only two groups can be compared at
# a time
lmw.out7 <- lmw(~ treat_multi + age + education + race + married +
                  nodegree + re74 + re75, data = lalonde,
                estimand = "ATE", method = "URI",
                treat = "treat_multi", contrast = c("2", "3"))
lmw.out7
summary(lmw.out7)

[Package lmw version 0.0.2 Index]