R: Estimate average causal effects by propensity score weighting

PSweight {PSweight}

R Documentation

Estimate average causal effects by propensity score weighting

Description

The function PSweight is used to estimate the average potential outcomes corresponding to each treatment group among the target population. The function currently implements the following types of weights: the inverse probability of treatment weights (IPW: target population is the combined population), average treatment effect among the treated weights (treated: target population is the population receiving a specified treatment), overlap weights (overlap: target population is the overlap population at clinical equipoise), matching weights (matching: target population is population obtained under 1:1 matching), entropy weights (entropy: target population is the population weighted by the entropy function). Augmented propensity score weighting estimators are also allowed, with propensity scores and outcome model estimates either estimated within the function, or supplied by external routines.

Usage

PSweight(
  ps.formula = NULL,
  ps.estimate = NULL,
  trtgrp = NULL,
  zname = NULL,
  yname,
  data,
  weight = "overlap",
  delta = 0,
  augmentation = FALSE,
  bootstrap = FALSE,
  R = 50,
  out.formula = NULL,
  out.estimate = NULL,
  family = "gaussian",
  ps.method = "glm",
  ps.control = list(),
  out.method = "glm",
  out.control = list()
)

Arguments

`ps.formula`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. Additional details of model specification are given under "Details". This argument is optional if `ps.estimate` is not `NULL`.
`ps.estimate`	an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level, if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order of the treatment levels. A vector of propensity score estimates is also allowed in `ps.estimate`, in which case a binary treatment is implied and the input is regarded as the propensity to receive the last category of treatment by alphabatic order, unless otherwise stated by `trtgrp`.
`trtgrp`	an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if `weight = "treated"`. This option can also be used to specify the treatment (in a two-treatment setting) when a vector argument is supplied for `ps.estimate`. Default value is the last group in the alphebatic order.
`zname`	an optional character specifying the name of the treatment variable in `data`.
`yname`	an optional character specifying the name of the outcome variable in `data`.
`data`	an optional data frame containing the variables in the propensity score model and outcome model (if augmented estimator is used). If not found in data, the variables are taken from `environment(formula)`.
`weight`	a character or vector of characters including the types of weights to be used. `"IPW"` specifies the inverse probability of treatment weights for estimating the average treatment effect among the combined population. `"treated"` specifies the weights for estimating the average treatment effect among the treated. `"overlap"` specifies the (generalized) overlap weights for estimating the average treatment effect among the overlap population, or population at clinical equipoise. `"matching"` specifies the matching weights for estimating the average treatment effect among the matched population (ATM). `"entropy"` specifies the entropy weights for the average treatment effect of entropy weighted population (ATEN). Default is `"overlap"`.
`delta`	trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming.
`augmentation`	logical. Indicate whether augmented weighting estimators should be used. Default is `FALSE`.
`bootstrap`	logical. Indaicate whether bootstrap is used to estimate the standard error of the point estimates. Default is `FALSE`.
`R`	an optional integer indicating number of bootstrap replicates. Default is `R = 50`.
`out.formula`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the outcome model to be fitted. Additional details of model specification are given under "Details". This argument is optional if `out.estimate` is not `NULL`.
`out.estimate`	an optional matrix or data frame containing estimated potential outcomes for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column name of this matrix should match the name of treatment level, if column name is missing or there is a mismatch, the column names would be assigned according to alphabatic order of the treatment levels, with a similar mechanism as in `ps.estimate`.
`family`	a description of the error distribution and link function to be used in the outcome model. Only required if `out.formula` is provided. Supported distributional families include `"gaussian" (link = identity)`, `"binomial" (link = logit)` and `"poisson" (link = log)`. See `family` in `glm` for more details. Default is `"gaussian"`.
`ps.method`	a character to specify the method for estimating propensity scores. `"glm"` is default, and `"gbm"` and `"SuperLearner"` are also allowed.
`ps.control`	a list to specify additional options when `method` is set to `"gbm"` or `"SuperLearner"`.
`out.method`	a character to specify the method for estimating the outcome regression model. `"glm"` is default, and `"gbm"` and `"SuperLearner"` are also allowed.
`out.control`	a list to specify additional options when `out.method` is set to `"gbm"` or `"SuperLearner"`.

Details

A typical form for ps.formula is treatment ~ terms where treatment is the treatment variable (identical to the variable name used to specify zname) and terms is a series of terms which specifies a linear predictor for treatment. Similarly, a typical form for out.formula is outcome ~ terms where outcome is the outcome variable (identical to the variable name used to specify yname) and terms is a series of terms which specifies a linear predictor for outcome. Both ps.formula and out.formula by default specify generalized linear models when ps.estimate and/or out.estimate is NULL. The argument ps.method and out.method allow user to choose model other than glm to fit the propensity score and outcome regression models for augmentation. Additional argument in the gbm() function can be supplied through the ps.control and out.control argument. Please refer to the user manual of the gbm package for all the allowed arguments. "SuperLearner" is also allowed in the ps.method and out.method arguments. Currently, the SuperLearner method only supports binary treatment with the default method set to "SL.glm". The estimation approach is default to "method.NNLS" for both propensity and outcome regression models. Prediction algorithm and other tuning parameters can also be passed throughps.control and out.control. Please refer to the user manual of the SuperLearner package for all the allowed specifications.

When comparing two treatments, ps.estimate can either be a vector or a two-column matrix of estimated propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and the treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by trtgrp. When comparing multiple (J>=3) treatments, ps.estimate needs to be specified as an N by J matrix, where N indicates the number of observations, and J indicates the total number of treatments. This matrix specifies the estimated generalized propensity scores to receive each of the J treatments. In general, ps.estimate should have column names that indicate the level of the treatment variable, which should match the levels given in Z. If column names are empty or there is a mismatch, the column names will be created following the alphebatic order of values in Z, and the rightmost coulmn of ps.estimate is assumed to be the treatment group, when estimating ATT. trtgrp can also be used to specify the treatment group for estimating ATT. The same mechanism applies to out.estimate, except that the input for out.estimate must be an N by J matrix, where each row corresponds to the estimated potential outcomes (corresponding to each treatment) for each observation.

The argument zname and/or yname is required when ps.estimate and/or out.estimate is not NULL.

Current version of PSweight allows for five types of propensity score weights used to estimate ATE (IPW), ATT (treated) and ATO (overlap), ATM (matching) and ATEN (entropy). These weights are members of larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018). Specific definitions of these weights are provided in Li, Morgan, and Zaslavsky (2018), Li and Greene (2013), Zhou, Matsouaka and Thomas (2020). When there is a practical violation of the positivity assumption, delta defines the symmetric propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta defines the multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019). Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in Li and Li (2019).

If augmentation = TRUE, an augmented weighting estimator will be implemented. For binary treatments, the augmented weighting estimator is presented in Mao, Li and Greene (2018). For multiple treatments, the augmented weighting estimator is mentioned in Li and Li (2019), and additional details will appear in our ongoing work (Zhou et al. 2020+). When weight = "IPW", the augmented estimator is also referred to as a doubly-robust (DR) estimator.

When bootstrap = TRUE, the variance will be calculated by nonparametric bootstrap, with R bootstrap replications. The default of R is 50. Otherwise, the variance will be calculated using the sandwich variance formula obtained in the M-estimation framework.

Value

PSweight returns a PSweight object containing a list of the following values: estimated propensity scores, average potential outcomes corresponding to each treatment, variance-covariance matrix of the point estimates, the label for each treatment group, and estimates in each bootstrap replicate if bootstrap = TRUE. A summary of PSweight can be obtained with summary.PSweight.

trtgrp: a character indicating the treatment group.
propensity: a data frame of estimated propensity scores.
muhat: average potential outcomes by treatment groups, with reference to specific target populations.
covmu: variance-covariance matrix of muhat.
muboot: an optional list of point estimates in each bootstrap replicate bootstrap = TRUE.
group: a table of treatment group labels corresponding to the output point estimates muhat.

References

Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.

Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.

Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.

Mao, H., Li, L., Greene, T. (2019). Propensity score weighting analysis and treatment effect discovery. Statistical Methods in Medical Research, 28(8), 2439-2454.

Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.

Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.

Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.

Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research, 29(12), 3721-3756.

Examples

data("psdata")
# the propensity and outcome models
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
out.formula<-Y~cov1+cov2+cov3+cov4+cov5+cov6

# without augmentation
ato1<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,weight = 'overlap')
summary(ato1)

# augmented weighting estimator, takes longer time to calculate sandwich variance
# ato2<-PSweight(ps.formula = ps.formula,yname = 'Y',data = psdata,
#              augmentation = TRUE,out.formula = out.formula,family = 'gaussian',weight = 'overlap')
# summary(ato2)

[Package PSweight version 1.2.0 Index]