medDML {causalweight} | R Documentation |
Causal mediation analysis with double machine learning
Description
Causal mediation analysis (evaluation of natural direct and indirect effects) for a binary treatment and one or several mediators using double machine learning to control for confounders based on (doubly robust) efficient score functions for potential outcomes.
Usage
medDML(
y,
d,
m,
x,
k = 3,
trim = 0.05,
order = 1,
multmed = TRUE,
fewsplits = FALSE,
normalized = TRUE
)
Arguments
y |
Dependent variable, must not contain missings. |
d |
Treatment, must be binary (either 1 or 0), must not contain missings. |
m |
Mediator, must not contain missings. May be a scalar or a vector of binary, categorical, or continuous variables if |
x |
(Potential) pre-treatment confounders of the treatment, mediator, and/or outcome, must not contain missings. |
k |
Number of folds in k-fold cross-fitting if |
trim |
Trimming rule for discarding observations with extreme conditional treatment or mediator probabilities (or products thereof). Observations with (products of) conditional probabilities that are smaller than |
order |
If set to an integer larger than 1, then polynomials of that order and interactions (using the power series) rather than the original control variables are used in the estimation of any conditional probability or conditional mean outcome. Polynomials/interactions are created using the |
multmed |
If set to |
fewsplits |
If set to |
normalized |
If set to |
Details
Estimation of causal mechanisms (natural direct and indirect effects) of a treatment under selection on observables, assuming that all confounders of the binary treatment and the mediator, the treatment and the outcome, or the mediator and the outcome are observed and not affected by the treatment. Estimation is based on the (doubly robust) efficient score functions for potential outcomes, see Tchetgen Tchetgen and Shpitser (2012) and Farbmacher, Huber, Langen, and Spindler (2019),
as well as on double machine learning with cross-fitting, see Chernozhukov et al (2018). To this end, one part of the data is used for estimating the model parameters of the treatment, mediator, and outcome equations based on post-lasso regression, using the rlasso
and rlassologit
functions (for conditional means and probabilities, respectively) of the hdm
package with default settings. The other part of the data is used for predicting the efficient score functions. The roles of the data parts are swapped and the direct and indirect effects are estimated based on averaging the predicted efficient score functions in the total sample.
Standard errors are based on asymptotic approximations using the estimated variance of the (estimated) efficient score functions.
Value
A medDML object contains two components, results
and ntrimmed
:
results
: a 3X6 matrix containing the effect estimates in the first row ("effects"), standard errors in the second row ("se"), and p-values in the third row ("p-value").
The first column provides the total effect, namely the average treatment effect (ATE). The second and third columns provide the direct effects under treatment and control, respectively ("dir.treat", "dir.control"). The fourth and fifth columns provide the indirect effects under treatment and control, respectively ("indir.treat", "indir.control"). The sixth column provides the estimated mean under non-treatment ("Y(0,M(0))").
ntrimmed
: number of discarded (trimmed) observations due to extreme conditional probabilities.
References
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.
Farbmacher, H., Huber, M., Laffers, L., Langen, H., and Spindler, M. (2019): "Causal mediation analysis with double machine learning", working paper, University of Fribourg.
Tchetgen Tchetgen, E. J., and Shpitser, I. (2012): "Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis", The Annals of Statistics, 40, 1816-1845.
Tibshirani, R. (1996): "Regression shrinkage and selection via the lasso", Journal of the Royal Statistical Society: Series B, 58, 267-288.
Examples
# A little example with simulated data (10000 observations)
## Not run:
n=10000 # sample size
p=100 # number of covariates
s=2 # number of covariates that are confounders
x=matrix(rnorm(n*p),ncol=p) # covariate matrix
beta=c(rep(0.25,s), rep(0,p-s)) # coefficients determining degree of confounding
d=(x%*%beta+rnorm(n)>0)*1 # treatment equation
m=(x%*%beta+0.5*d+rnorm(n)>0)*1 # mediator equation
y=x%*%beta+0.5*d+m+rnorm(n) # outcome equation
# The true direct effects are equal to 0.5, the indirect effects equal to 0.19
output=medDML(y=y,d=d,m=m,x=x)
round(output$results,3)
output$ntrimmed
## End(Not run)