medweight {causalweight} | R Documentation |
Causal mediation analysis based on inverse probability weighting with optional sample selection correction.
Description
Causal mediation analysis (evaluation of natural direct and indirect effects) based on weighting by the inverse of treatment propensity scores as suggested in Huber (2014) and Huber and Solovyeva (2018).
Usage
medweight(
y,
d,
m,
x,
w = NULL,
s = NULL,
z = NULL,
selpop = FALSE,
ATET = FALSE,
trim = 0.05,
logit = FALSE,
boot = 1999,
cluster = NULL
)
Arguments
y |
Dependent variable, must not contain missings. |
d |
Treatment, must be binary (either 1 or 0), must not contain missings. |
m |
Mediator(s), may be a scalar or a vector, must not contain missings. |
x |
Pre-treatment confounders of the treatment, mediator, and/or outcome, must not contain missings. |
w |
Post-treatment confounders of the mediator and the outcome. Default is NULL. |
s |
Optional selection indicator. Must be one if |
z |
Optional instrumental variable(s) for selection |
selpop |
Only to be used if both |
ATET |
If FALSE, the average treatment effect (ATE) and the corresponding direct and indirect effects are estimated. If TRUE, the average treatment effect on the treated (ATET) and the corresponding direct and indirect effects are estimated. Default is FALSE. |
trim |
Trimming rule for discarding observations with extreme propensity scores. In the absence of post-treatment confounders (w=NULL), observations with Pr(D=1|M,X)< |
logit |
If FALSE, probit regression is used for propensity score estimation. If TRUE, logit regression is used. Default is FALSE. |
boot |
Number of bootstrap replications for estimating standard errors. Default is 1999. |
cluster |
A cluster ID for block or cluster bootstrapping when units are clustered rather than iid. Must be numerical. Default is NULL (standard bootstrap without clustering). |
Details
Estimation of causal mechanisms (natural direct and indirect effects) of a binary treatment under a selection-on-observables assumption, which requires that all confounders of the treatment-mediator, treatment-outcome, and mediator-outcome relations are observed. Units are weighted by the inverse of their conditional treatment propensities given the mediator and/or observed confounders, which are estimated by probit or logit regression.
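The weighting idea for the case without post-treatment confounders can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name ipw_mediation is hypothetical, the weights are normalized to sum to one within each expression, and the trimming rule (dropping units with Pr(D=1|M,X) below trim or above 1-trim) is an assumption.

```r
# Hypothetical helper (not part of causalweight): IPW estimates of natural
# direct and indirect effects when there are no post-treatment confounders.
ipw_mediation <- function(y, d, m, x, trim = 0.05, logit = FALSE) {
  link <- if (logit) "logit" else "probit"
  # Treatment propensity scores Pr(D=1|M,X) and Pr(D=1|X)
  p_mx <- fitted(glm(d ~ m + x, family = binomial(link = link)))
  p_x  <- fitted(glm(d ~ x, family = binomial(link = link)))
  # Assumed trimming rule: discard units with extreme Pr(D=1|M,X)
  keep <- p_mx >= trim & p_mx <= 1 - trim
  y <- y[keep]; d <- d[keep]; p_mx <- p_mx[keep]; p_x <- p_x[keep]
  wmean <- function(wt) sum(wt * y) / sum(wt)   # normalized weighted mean
  y1m1 <- wmean(d / p_x)                               # E[Y(1, M(1))]
  y0m0 <- wmean((1 - d) / (1 - p_x))                   # E[Y(0, M(0))]
  y1m0 <- wmean(d * (1 - p_mx) / (p_mx * (1 - p_x)))   # E[Y(1, M(0))]
  y0m1 <- wmean((1 - d) * p_mx / ((1 - p_mx) * p_x))   # E[Y(0, M(1))]
  list(
    effects = c(te = y1m1 - y0m0,
                dir.treat = y1m1 - y0m1, dir.control = y1m0 - y0m0,
                indir.treat = y1m1 - y1m0, indir.control = y0m1 - y0m0),
    ntrimmed = sum(!keep)
  )
}

# Simulated data in which the direct and indirect effects both equal 0.5
set.seed(1)
n <- 20000
x <- rnorm(n)
d <- (0.25 * x + rnorm(n) > 0) * 1
m <- 0.5 * d + 0.25 * x + rnorm(n)
y <- 0.5 * d + m + 0.25 * x + rnorm(n)
est <- ipw_mediation(y, d, m, x, logit = TRUE)
round(est$effects, 3)
```

By construction, the total effect equals dir.treat plus indir.control (and also dir.control plus indir.treat), which is a quick sanity check on any such decomposition.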
The form of weighting depends on whether the observed confounders are exclusively pre-treatment (x), or also contain post-treatment confounders of the mediator and the outcome (w). In the latter case, only partial indirect effects (from d to m to y) can be estimated that exclude any causal paths from d to w to m to y, see the discussion in Huber (2014). Standard errors are obtained by bootstrapping the effects.
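To make the bootstrap step concrete, the following base R sketch computes a bootstrap standard error for a scalar statistic, resampling whole clusters when a cluster ID is supplied. The helper boot_se and its arguments are hypothetical and only mirror the roles of boot and cluster; the package's internal resampling scheme may differ.

```r
# Hypothetical helper (not part of causalweight): bootstrap standard error of
# a scalar statistic, resampling rows (iid) or whole clusters.
boot_se <- function(data, stat, boot = 199, cluster = NULL) {
  n <- nrow(data)
  reps <- replicate(boot, {
    if (is.null(cluster)) {
      # Standard bootstrap: draw n rows with replacement
      idx <- sample.int(n, n, replace = TRUE)
    } else {
      # Cluster bootstrap: draw cluster IDs with replacement, keep all rows
      # of every drawn cluster (a cluster drawn twice appears twice)
      ids <- unique(cluster)
      drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
      idx <- unlist(lapply(drawn, function(g) which(cluster == g)))
    }
    stat(data[idx, , drop = FALSE])
  })
  sd(reps)
}

set.seed(1)
dat <- data.frame(y = rnorm(200), g = rep(1:20, each = 10))
se_iid <- boot_se(dat, function(b) mean(b$y))
se_cl  <- boot_se(dat, function(b) mean(b$y), cluster = dat$g)
c(iid = se_iid, cluster = se_cl)
```

Resampling clusters rather than rows preserves the within-cluster dependence in each bootstrap sample, which is the reason for offering a cluster ID at all.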
In the absence of post-treatment confounders (such that w is NULL), defining s allows correcting for sample selection due to missing outcomes based on the inverse of the conditional selection probability. The latter might either be related to observables, which implies a missing at random assumption, or in addition also to unobservables, if an instrument for sample selection is available. Effects are then estimated for the total population, see Huber and Solovyeva (2018) for further details.
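The selection correction under missing at random can be illustrated in isolation with a simple mean. The sketch below is a simplified assumption-laden example, not the estimator of Huber and Solovyeva (2018): selection depends on a single observed covariate x, and observed outcomes are reweighted by the inverse of the estimated selection probability.

```r
set.seed(1)
n <- 20000
x <- rnorm(n)
y_full <- 1 + x + rnorm(n)              # outcome for everyone (unknown in practice)
s <- rbinom(n, 1, plogis(0.5 + x))      # s = 1 if the outcome is observed
y <- ifelse(s == 1, y_full, NA)         # outcomes are missing when s = 0

# Conditional selection probability Pr(S=1|X), estimated by logit regression
p_s <- fitted(glm(s ~ x, family = binomial(link = "logit")))

obs   <- s == 1
naive <- mean(y[obs])                              # ignores selection
ipw   <- sum(y[obs] / p_s[obs]) / sum(1 / p_s[obs]) # inverse selection weighting
truth <- mean(y_full)                              # available only in simulation
round(c(truth = truth, naive = naive, ipw = ipw), 3)
```

Units that are unlikely to be observed given x receive larger weights, so the weighted sample resembles the total population rather than only the selected one.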
Value
A medweight object contains two components, results and ntrimmed:
results: a 3x5 matrix containing the effect estimates in the first row ("effects"), standard errors in the second row ("se"), and p-values in the third row ("p-value").
The first column provides the total effect, namely the average treatment effect (ATE) if ATET=FALSE or the average treatment effect on the treated (ATET) if ATET=TRUE.
The second and third columns provide the direct effects under treatment and control, respectively ("dir.treat", "dir.control"). See equation (6) if w=NULL (no post-treatment confounders) and equation (13) if w is defined, respectively, in Huber (2014). If w=NULL, the fourth and fifth columns provide the indirect effects under treatment and control, respectively ("indir.treat", "indir.control"), see equation (7) in Huber (2014).
If w is defined, the fourth and fifth columns provide the partial indirect effects under treatment and control, respectively ("par.in.treat", "par.in.control"), see equation (14) in Huber (2014).
ntrimmed: number of discarded (trimmed) observations due to extreme propensity score values.
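The layout of results can be mimicked with a placeholder matrix to show how rows and columns might be accessed. All numbers below are made up purely for illustration, the first column name "ATE" is an assumption, and the normal-approximation p-values and confidence intervals are a common convention rather than a confirmed description of what medweight computes.

```r
# Placeholder matrix with the documented 3x5 layout (values are invented)
res <- matrix(
  c(1.00, 0.50, 0.50, 0.50, 0.50,    # effects
    0.05, 0.04, 0.04, 0.03, 0.03,    # se
    NA,   NA,   NA,   NA,   NA),     # p-value, filled in below
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("effects", "se", "p-value"),
    c("ATE", "dir.treat", "dir.control", "indir.treat", "indir.control")
  )
)

# Assumed convention: two-sided p-values from a normal approximation
res["p-value", ] <- 2 * pnorm(-abs(res["effects", ] / res["se", ]))

# Approximate 95% confidence intervals built from effects and standard errors
ci <- rbind(
  lower = res["effects", ] - qnorm(0.975) * res["se", ],
  upper = res["effects", ] + qnorm(0.975) * res["se", ]
)
round(ci[, c("dir.treat", "indir.control")], 3)
```

With a real medweight object, the same indexing would apply to output$results, using "par.in.treat" and "par.in.control" for the last two columns when w is defined.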
References
Huber, M. (2014): "Identifying causal mechanisms (primarily) based on inverse probability weighting", Journal of Applied Econometrics, 29, 920-943.
Huber, M. and Solovyeva, A. (2018): "Direct and indirect effects under sample selection and outcome attrition", SES working paper 496, University of Fribourg.
Examples
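# A small example with simulated data (10000 observations)
## Not run:
library(causalweight)
set.seed(1)  # fix the seed so the simulated data are reproducible
n <- 10000
x <- rnorm(n)                            # pre-treatment confounder
d <- (0.25*x + rnorm(n) > 0)*1           # binary treatment
w <- 0.2*d + 0.25*x + rnorm(n)           # post-treatment confounder
m <- 0.5*w + 0.5*d + 0.25*x + rnorm(n)   # mediator
y <- 0.5*d + m + w + 0.25*x + rnorm(n)   # outcome
# The true direct and partial indirect effects are all equal to 0.5
# boot=19 keeps the run time short; the default is 1999 replications
output <- medweight(y=y, d=d, m=m, x=x, w=w, trim=0.05, ATET=FALSE, logit=TRUE, boot=19)
round(output$results, 3)   # effects, standard errors, and p-values
output$ntrimmed            # number of trimmed observations
## End(Not run)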