| SumStat {PSweight} | R Documentation |
Calculate summary statistics for propensity score weighting
Description
SumStat is used to generate distributional plots of the estimated propensity scores and balance
diagnostics after propensity score weighting.
Usage
SumStat(
ps.formula = NULL,
ps.estimate = NULL,
trtgrp = NULL,
Z = NULL,
covM = NULL,
zname = NULL,
xname = NULL,
data = NULL,
weight = "overlap",
delta = 0,
method = "glm",
ps.control = list()
)
Arguments
ps.formula |
an object of class |
ps.estimate |
an optional matrix or data frame containing estimated (generalized) propensity scores for each observation. Typically, this is an N by J matrix, where N is the number of observations and J is the total number of treatment levels. Preferably, the column names of this matrix should match the names of the treatment levels; if column names are missing or there is a mismatch, the column names are assigned according to the alphabetical order of the treatment levels. A vector of propensity score estimates is also allowed in |
trtgrp |
an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if |
Z |
an optional vector specifying the values of treatment, only necessary when the covariate matrix |
covM |
an optional covariate matrix or data frame including covariates, their interactions and higher-order terms. When the covariate matrix |
zname |
an optional character specifying the name of the treatment variable in |
xname |
an optional vector of characters including the names of covariates in |
data |
an optional data frame containing the variables in the propensity score model. If not found in data, the variables are taken from |
weight |
a character or vector of characters including the types of weights to be used. |
delta |
trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming. |
method |
a character to specify the method for estimating propensity scores. |
ps.control |
a list to specify additional options when |
Details
A typical form for ps.formula is treatment ~ terms where treatment is the treatment
variable (identical to the variable name used to specify zname) and terms is a series of terms
which specifies a linear predictor for treatment. ps.formula specifies logistic or multinomial logistic
models for estimating the propensity scores, when ps.estimate is NULL.
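As an illustration of the treatment ~ terms form, the following base R sketch fits a logistic propensity model on simulated data. The data set and the variable names (dat, x1, x2, trt) are hypothetical, and the glm() call only approximates the kind of model described above; it is not taken from the SumStat source.

```r
# Simulated data; all variable names here are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
dat$trt <- rbinom(n, 1, plogis(-0.3 + 0.8 * dat$x1 - 0.5 * dat$x2))

# treatment ~ terms: the left-hand side is the treatment variable,
# the right-hand side defines the linear predictor.
ps.formula <- trt ~ x1 + x2

# Logistic regression for a binary treatment.
fit <- glm(ps.formula, family = binomial(link = "logit"), data = dat)

# Estimated probability of receiving trt == 1 for each observation.
e.h <- fitted(fit)
```

A vector such as e.h is the kind of object that could then be supplied as ps.estimate for a binary treatment.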
When comparing two treatments, ps.estimate can either be a vector or a two-column matrix of estimated
propensity scores. If a vector is supplied, it is assumed to be the propensity scores to receive the treatment, and
the treatment group corresponds to the last group in alphabetical order, unless otherwise specified by trtgrp.
When comparing multiple (J>=3) treatments, ps.estimate needs to be specified as an N by J matrix,
where N indicates the number of observations, and J indicates the total number of treatments.
This matrix specifies the estimated generalized propensity scores to receive each of the J treatments.
In general, ps.estimate should have column names that indicate the level of the treatment variable,
which should match the levels given in Z.
If column names are empty or there is a mismatch, the column names will be created following
the alphabetical order of treatment levels. The rightmost column of ps.estimate is then assumed
to be the treatment group when estimating ATT ("treated"). trtgrp can also be used to specify the treatment
group for estimating ATT.
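The layout described above can be sketched in base R as follows. The treatment labels and the simulated scores are hypothetical; the point is only to show an N by J matrix whose rows sum to one and whose column names match the levels of Z.

```r
# Hypothetical three-level treatment; labels are made up for illustration.
set.seed(2)
n <- 6
Z <- factor(sample(c("a", "b", "c"), n, replace = TRUE),
            levels = c("a", "b", "c"))

# Hypothetical generalized propensity scores: softmax of random predictors,
# so that every row is a probability vector over the J = 3 treatments.
lp  <- matrix(rnorm(n * 3), nrow = n)
e.h <- exp(lp) / rowSums(exp(lp))

# Name the columns after the treatment levels so they match Z.
colnames(e.h) <- levels(Z)

# Sanity checks one might run before passing e.h as ps.estimate.
stopifnot(all(abs(rowSums(e.h) - 1) < 1e-8))
stopifnot(identical(colnames(e.h), levels(Z)))

# Under the default described above, the rightmost column would be
# treated as the ATT target group unless trtgrp says otherwise.
default.trtgrp <- colnames(e.h)[ncol(e.h)]
```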
To generate balance statistics, one can directly specify Z and covM to indicate the treatment levels and
covariate matrix. Alternatively, one can supply data, zname, and xname to indicate the
same information. When both are specified, the function will prioritize inputs from Z and covM.
When ps.estimate is not NULL, argument zname.
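For the Z and covM route, a covariate matrix with interactions and higher-order terms could be assembled with model.matrix(), as in the sketch below. The data frame and column names are hypothetical, and dropping the intercept column is an assumption about what a covariate matrix should contain rather than something stated in this page.

```r
# Hypothetical data; names are for illustration only.
dat <- data.frame(
  trt    = c(0, 1, 0, 1, 1),
  age    = c(30, 45, 52, 38, 61),
  female = c(1, 0, 0, 1, 1)
)

# Treatment vector.
Z <- dat$trt

# Covariates, one interaction, and one squared term.
# [, -1] drops the "(Intercept)" column (an assumption, see above).
covM <- model.matrix(~ age + female + age:female + I(age^2), data = dat)[, -1]

colnames(covM)
```

The alternative route keeps everything in one data frame and passes its name through data, with zname and xname naming the treatment and covariate columns.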
The current version of PSweight supports five types of propensity score weights, used to estimate the ATE ("IPW"), ATT ("treated"),
ATO ("overlap"), ATM ("matching"), and ATEN ("entropy"). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
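For a binary treatment, the balancing weights of Li, Morgan, and Zaslavsky (2018) take the form h(x)/e(x) for treated units and h(x)/(1 - e(x)) for controls, where h is a tilting function. The sketch below writes out the five tilting functions in base R. The helper balancing_weights() is hypothetical, returns unnormalized weights, and is not the package's internal code.

```r
# Hypothetical helper: unnormalized balancing weights for a binary treatment.
# e: propensity of receiving treatment (z == 1); z: 0/1 treatment indicator.
balancing_weights <- function(e, z,
                              type = c("IPW", "treated", "overlap",
                                       "matching", "entropy")) {
  type <- match.arg(type)
  # Tilting function h(x) that defines the target population.
  h <- switch(type,
    IPW      = rep(1, length(e)),                        # ATE
    treated  = e,                                        # ATT
    overlap  = e * (1 - e),                              # ATO
    matching = pmin(e, 1 - e),                           # ATM
    entropy  = -(e * log(e) + (1 - e) * log(1 - e))      # ATEN
  )
  # Treated units get h / e, controls get h / (1 - e).
  ifelse(z == 1, h / e, h / (1 - e))
}

e <- c(0.2, 0.5, 0.8)
z <- c(1, 0, 1)
w.overlap <- balancing_weights(e, z, "overlap")   # 0.8, 0.5, 0.2
w.ipw     <- balancing_weights(e, z, "IPW")       # 5, 2, 1.25
```

Under the overlap weights each unit is weighted by its probability of being assigned to the opposite group, which is why the weights stay bounded between 0 and 1.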
When there is a practical violation of the positivity assumption, delta defines the symmetric
propensity score trimming rule following Crump et al. (2009). With multiple treatments, delta defines the
multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
a data-driven continuous trimming strategy that requires no explicit trimming rule; see Li, Thomas and Li (2019).
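The two trimming rules can be sketched in base R as below. The helper names keep_binary() and keep_multi() are hypothetical, and whether the boundaries are inclusive in SumStat itself is an assumption here. The sketch also shows why delta should not exceed 1 / number of treatment groups: the smallest of J probabilities that sum to one can never be larger than 1/J, so a larger threshold would trim every observation.

```r
# Symmetric trimming for a binary treatment (in the spirit of Crump et al.):
# keep observations with delta <= e <= 1 - delta.
keep_binary <- function(e, delta) {
  e >= delta & e <= 1 - delta
}

# Multinomial trimming (in the spirit of Yoshida et al.):
# keep observations whose smallest generalized propensity score
# is at least delta.
keep_multi <- function(e.mat, delta) {
  apply(e.mat, 1, min) >= delta
}

e <- c(0.05, 0.50, 0.95)
kb <- keep_binary(e, delta = 0.1)      # FALSE TRUE FALSE

e.mat <- rbind(c(0.05, 0.45, 0.50),
               c(0.30, 0.30, 0.40))
km <- keep_multi(e.mat, delta = 0.1)   # FALSE TRUE

# delta = 0 corresponds to no trimming.
stopifnot(all(keep_binary(e, 0)), all(keep_multi(e.mat, 0)))
```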
Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
"glm" is the default method for propensity score estimation. Logistic regression will be used for binary treatments,
and multinomial logistic regression will be used for treatments with more than two categories. The alternative method option of "gbm" serves as an interface to the gbm() function from the
gbm package. Additional arguments to the gbm() function can be supplied through the ps.control=list() argument in SumStat(). Please refer to the user manual of the gbm package for all the
allowed arguments. Currently, models for binary or multinomial treatment will be automatically chosen based on the number of treatment categories.
"SuperLearner" is also allowed in the method argument to pass the propensity score estimation to the SuperLearner() function in the SuperLearner package.
Currently, the SuperLearner method only supports binary treatment, with the default method set to "SL.glm". The estimation approach defaults to "method.NNLS" in the SumStat() function.
Prediction algorithms and other tuning parameters can also be passed through ps.control=list() to SumStat(). Please refer to the user manual of the SuperLearner package for all the allowed specifications.
Value
SumStat returns a SumStat object including a list of the following values:
treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
and balance statistics. A summary of SumStat can be obtained with summary.SumStat.
trtgrp |
a character indicating the treatment group.
propensity |
a data frame of estimated propensity scores.
ps.fitObjects |
the fitted propensity model details
ps.weights |
a data frame of propensity score weights.
ess |
a table of effective sample sizes. This serves as a conservative measure to characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).
unweighted.sumstat |
A list of tables including covariate means and variances by treatment group and standardized mean differences.
ATE.sumstat |
If "IPW" is included in weight, this is a list of summary statistics using inverse probability weighting.
ATT.sumstat |
If "treated" is included in weight, this is a list of summary statistics using the ATT weights.
ATO.sumstat |
If "overlap" is included in weight, this is a list of summary statistics using the overlap weights.
ATM.sumstat |
If "matching" is included in weight, this is a list of summary statistics using the matching weights.
ATEN.sumstat |
If "entropy" is included in weight, this is a list of summary statistics using the entropy weights.
trim |
If delta > 0, this is a table summarizing the number of observations before and after trimming.
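To make the effective sample size entry more concrete, the sketch below uses the common definition (sum of weights)^2 / (sum of squared weights), computed within each treatment group. That this is exactly the quantity reported in ess is an assumption; the helper ess_fun() and the toy weights are hypothetical.

```r
# Hypothetical helper: effective sample size of a weight vector.
ess_fun <- function(w) sum(w)^2 / sum(w^2)

# Toy weights and treatment indicator, for illustration only.
w <- c(0.8, 0.5, 0.2, 0.6, 0.4)
z <- c(1, 0, 1, 0, 1)

# Effective sample size within each treatment group.
ess.by.group <- tapply(w, z, ess_fun)

# Equal weights lose nothing: the effective size equals the group size.
ess.equal <- ess_fun(rep(1, 4))
```

The more variable the weights within a group, the further its effective sample size falls below the actual group size.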
References
Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
Greenwell, B., Boehmke, B., Cunningham, J., GBM Developers (2020). gbm: Generalized Boosted Regression Models. CRAN: https://cran.r-project.org/web/packages/gbm/index.html
Li, L., Greene, T. (2013). A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
Li, F., Morgan, K. L., Zaslavsky, A. M. (2018). Balancing covariates via propensity score weighting. Journal of the American Statistical Association, 113(521), 390-400.
Li, F., Thomas, L. E., Li, F. (2019). Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
Polley, E., LeDell, E., Kennedy, C., Lendle, S., van der Laan, M. (2019). SuperLearner: Super Learner Prediction. CRAN: https://cran.r-project.org/web/packages/SuperLearner/index.html
Yoshida, K., Solomon, D.H., Haneuse, S., Kim, S.C., Patorno, E., Tedeschi, S.K., Lyu, H., Franklin, J.M., Stürmer, T., Hernández-Díaz, S. and Glynn, R.J. (2019). Multinomial extension of propensity score trimming methods: A simulation study. American Journal of Epidemiology, 188(3), 609-616.
Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments. The Annals of Applied Statistics, 13(4), 2389-2415.
Zhou, Y., Matsouaka, R. A., Thomas, L. (2020). Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research (Online)
Examples
library(PSweight)
data("psdata")
# the propensity model
ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6
# using SumStat to estimate propensity scores
msstat <- SumStat(ps.formula, trtgrp="2", data=psdata,
weight=c("IPW","overlap","treated","entropy","matching"))
#summary(msstat)
# importing user-supplied propensity scores "e.h"
# fit <- nnet::multinom(formula=ps.formula, data=psdata, maxit=500, trace=FALSE)
# e.h <- fit$fitted.values
# varname <- c("cov1","cov2","cov3","cov4","cov5","cov6")
# msstat0 <- SumStat(zname="trt", xname=varname, data=psdata, ps.estimate=e.h,
# trtgrp="2", weight=c("IPW","overlap","treated","entropy","matching"))
# summary(msstat0)