dyntreatDML {causalweight}    R Documentation

Dynamic treatment effect evaluation with double machine learning

Description

Dynamic treatment effect estimation for assessing the average effects of sequences of treatments (consisting of two sequential treatments). Combines estimation based on (doubly robust) efficient score functions with double machine learning to control for confounders in a data-driven way.

Usage

dyntreatDML(
  y2,
  d1,
  d2,
  x0,
  x1,
  s = NULL,
  d1treat = 1,
  d2treat = 1,
  d1control = 0,
  d2control = 0,
  trim = 0.01,
  MLmethod = "lasso",
  fewsplits = FALSE,
  normalized = TRUE
)

Arguments

y2

Dependent variable in the second period (=outcome period), must not contain missing values.

d1

Treatment in the first period, must be discrete, must not contain missing values.

d2

Treatment in the second period, must be discrete, must not contain missing values.

x0

Covariates in the baseline period (prior to the treatment in the first period), must not contain missing values.

x1

Covariates in the first period (prior to the treatment in the second period), must not contain missing values.

s

Indicator function for defining a subpopulation for whom the treatment effect is estimated as a function of the subpopulation's distribution of x0. Default is NULL (estimation of the treatment effect in the total population).

d1treat

Value of the first treatment in the treatment sequence. Default is 1.

d2treat

Value of the second treatment in the treatment sequence. Default is 1.

d1control

Value of the first treatment in the control sequence. Default is 0.

d2control

Value of the second treatment in the control sequence. Default is 0.

trim

Trimming rule for discarding observations with products of treatment propensity scores in the first and second period that are smaller than trim (to avoid too small denominators in weighting by the inverse of the propensity scores). Default is 0.01.
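
As a rough illustration of the trimming rule, the following base R sketch flags observations whose product of first- and second-period propensity scores is smaller than trim. The vectors ps1 and ps2 are hypothetical stand-ins for estimated propensity scores, not objects created by dyntreatDML, and the internal implementation may differ.

```r
# Hypothetical propensity scores for the treatment sequence (illustration only)
ps1 <- c(0.50, 0.05, 0.90, 0.02)   # Pr(d1 = d1treat | x0)
ps2 <- c(0.40, 0.10, 0.80, 0.30)   # Pr(d2 = d2treat | d1 = d1treat, x0, x1)
trim <- 0.01

# Product of the propensity scores, i.e. the denominator of the weight
ps_prod <- ps1 * ps2

# Keep observations whose product is not smaller than trim
keep <- ps_prod >= trim

# Number of discarded observations (the analogue of ntrimmed)
ntrimmed <- sum(!keep)
```

A larger value of trim discards more observations with extreme weights, which may lower the variance but changes the population the estimate refers to.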

MLmethod

Machine learning method for estimating the nuisance parameters based on the SuperLearner package. Must be either "lasso" (default) for lasso estimation, "randomforest" for random forests, "xgboost" for extreme gradient boosting, "svm" for support vector machines, "ensemble" for using an ensemble algorithm based on all previously mentioned machine learners, or "parametric" for linear or logit regression.

fewsplits

If set to TRUE, the same training data are used for estimating a nested model of conditional mean outcomes, namely E[E[y2|d1,d2,x0,x1]|d1,x0]. If fewsplits is FALSE, the training data are split for the sequential estimation of the nested model. Default of fewsplits is FALSE.
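
The sequential estimation of the nested model E[E[y2|d1,d2,x0,x1]|d1,x0] can be sketched in base R as follows. This is only an illustration under stated assumptions: it uses linear regressions via lm() instead of the machine learners of the package, a simulated data frame train, and an even split into two halves to mimic the idea behind fewsplits = FALSE. All object names are hypothetical.

```r
set.seed(1)
n  <- 600
x0 <- rnorm(n)
d1 <- rbinom(n, 1, 0.5)
x1 <- 0.5 * x0 + 0.1 * d1 + rnorm(n)
d2 <- rbinom(n, 1, 0.5)
y2 <- x0 + x1 + d1 + 0.5 * d2 + rnorm(n)
train <- data.frame(y2, d1, d2, x0, x1)

# Split the training data into two halves (analogue of fewsplits = FALSE)
half    <- sample(rep(c(TRUE, FALSE), length.out = n))
train_a <- train[half, ]
train_b <- train[!half, ]

# Step 1 (half A): inner conditional mean E[y2 | d1 = 1, d2 = 1, x0, x1]
inner <- lm(y2 ~ x0 + x1, data = train_a[train_a$d1 == 1 & train_a$d2 == 1, ])

# Step 2 (half B): regress the predicted inner means on x0 among d1 = 1
sub_b <- train_b[train_b$d1 == 1, ]
sub_b$mu_inner <- predict(inner, newdata = sub_b)
nested <- lm(mu_inner ~ x0, data = sub_b)

# Predicted nested means; with the analogue of fewsplits = TRUE,
# both steps would be fitted on the full `train` data instead
mu_nested <- predict(nested, newdata = train)
```

Splitting keeps the two estimation steps on separate observations at the cost of smaller training samples for each step.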

normalized

If set to TRUE, then the inverse probability-based weights are normalized such that they add up to 1 within treatment groups. Default is TRUE.
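
The effect of normalization can be seen in a small base R sketch. The vectors below are made-up values for illustration, and the example shows a plain weighted mean rather than the full doubly robust score used by the function.

```r
# Hypothetical data (illustration only)
y   <- c(2.0, 1.0, 3.0, 0.5)       # outcomes
ind <- c(1, 0, 1, 1)               # 1 if the observation follows the treatment sequence
ps  <- c(0.20, 0.50, 0.40, 0.80)   # product of the propensity scores

# Inverse probability weights for the treatment sequence
w_raw <- ind / ps

# Normalized weights add up to 1 within the group
w_norm <- w_raw / sum(w_raw)

mean_normalized   <- sum(w_norm * y)   # weighted mean with normalized weights
mean_unnormalized <- mean(w_raw * y)   # weighted mean without normalization
```

Without normalization the raw weights need not sum to the sample size in finite samples, so the two means generally differ.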

Details

Estimation of the causal effects of sequences of two treatments under sequential conditional independence, assuming that all confounders of the treatment in either period and the outcome of interest are observed. Estimation is based on the (doubly robust) efficient score functions for potential outcomes, see e.g. Bodory, Huber, and Laffers (2020), in combination with double machine learning with cross-fitting, see Chernozhukov et al. (2018). To this end, one part of the data is used for estimating the model parameters of the treatment and outcome equations based on machine learning. The other part of the data is used for predicting the efficient score functions. The roles of the data parts are swapped (using 3-fold cross-fitting) and the average dynamic treatment effect is estimated by averaging the predicted efficient score functions in the total sample. Standard errors are based on asymptotic approximations using the estimated variance of the (estimated) efficient score functions.
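
The procedure described above can be sketched in base R. This is a simplified illustration and not the package's implementation: it assumes parametric nuisance models (glm() and lm()) instead of machine learners, uses the same training data for both steps of the nested outcome model, omits trimming and weight normalization, and relies on a simulated data set. The helper score_seq() and all object names are hypothetical.

```r
set.seed(123)
n  <- 900
x0 <- rnorm(n)
d1 <- rbinom(n, 1, plogis(0.5 * x0))
x1 <- 0.3 * d1 + rnorm(n)
d2 <- rbinom(n, 1, plogis(0.5 * x0 + 0.5 * x1))
y2 <- x0 + x1 + d1 + 0.5 * d2 + rnorm(n)
dat <- data.frame(y2, d1, d2, x0, x1)

# Doubly robust score of the mean potential outcome under sequence (a1, a2):
# nuisance models are fitted on `train` and evaluated on `test`
score_seq <- function(train, test, a1, a2) {
  train$t1 <- as.numeric(train$d1 == a1)
  train$t2 <- as.numeric(train$d2 == a2)
  tr1  <- train[train$t1 == 1, ]
  tr12 <- tr1[tr1$t2 == 1, ]
  # Propensity scores Pr(d1 = a1 | x0) and Pr(d2 = a2 | d1 = a1, x0, x1)
  ps1_fit <- glm(t1 ~ x0, family = binomial, data = train)
  ps2_fit <- glm(t2 ~ x0 + x1, family = binomial, data = tr1)
  # Inner mean E[y2 | d1 = a1, d2 = a2, x0, x1] and nested mean given (d1 = a1, x0)
  mu_fit <- lm(y2 ~ x0 + x1, data = tr12)
  tr1$mu <- predict(mu_fit, newdata = tr1)
  nu_fit <- lm(mu ~ x0, data = tr1)
  # Predictions in the held-out part of the data
  p1 <- predict(ps1_fit, newdata = test, type = "response")
  p2 <- predict(ps2_fit, newdata = test, type = "response")
  mu <- predict(mu_fit, newdata = test)
  nu <- predict(nu_fit, newdata = test)
  i1 <- as.numeric(test$d1 == a1)
  i2 <- as.numeric(test$d2 == a2)
  i1 * i2 * (test$y2 - mu) / (p1 * p2) + i1 * (mu - nu) / p1 + nu
}

# 3-fold cross-fitting: each fold is predicted by models fitted on the other two
folds <- sample(rep(1:3, length.out = n))
psi_treat <- psi_control <- numeric(n)
for (k in 1:3) {
  test_idx <- folds == k
  psi_treat[test_idx]   <- score_seq(dat[!test_idx, ], dat[test_idx, ], 1, 1)
  psi_control[test_idx] <- score_seq(dat[!test_idx, ], dat[test_idx, ], 0, 0)
}

# Average the score differences and base the standard error on their variance
psi    <- psi_treat - psi_control
effect <- mean(psi)
se     <- sqrt(var(psi) / n)
```

In this simulation the effect of the sequence (1,1) versus (0,0) equals 1 + 0.5 plus the effect of d1 that runs through x1 (0.3), so the estimate should be in the vicinity of 1.8.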

Value

A dyntreatDML object contains ten components, effect, se, pval, ntrimmed, meantreat, meancontrol, psd1treat, psd2treat, psd1control, and psd2control:

effect: estimate of the average effect of the treatment sequence.

se: standard error of the effect estimate.

pval: p-value of the effect estimate.

ntrimmed: number of discarded (trimmed) observations due to low products of propensity scores.

meantreat: estimate of the mean potential outcome under the treatment sequence.

meancontrol: estimate of the mean potential outcome under the control sequence.

psd1treat: propensity score estimates for the first treatment in the treatment sequence.

psd2treat: propensity score estimates for the second treatment in the treatment sequence.

psd1control: propensity score estimates for the first treatment in the control sequence.

psd2control: propensity score estimates for the second treatment in the control sequence.
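
The components effect and se can be combined into a confidence interval based on the normal approximation. The sketch below uses a hypothetical list with made-up values in place of a fitted dyntreatDML object. The two-sided p-value formula is an assumption about how pval relates to effect and se, not a statement about the package's code.

```r
# Hypothetical stand-in for a fitted object (values are made up for illustration)
output <- list(effect = 1.48, se = 0.09)

# 95% confidence interval based on the normal approximation
ci <- output$effect + c(-1, 1) * qnorm(0.975) * output$se

# Two-sided p-value under the same approximation (assumed form)
z    <- output$effect / output$se
pval <- 2 * pnorm(-abs(z))
```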

References

Bodory, H., Huber, M., Laffers, L. (2020): "Evaluating (weighted) dynamic treatment effects by double machine learning", working paper, arXiv preprint arXiv:2012.00370.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

van der Laan, M., Polley, E., Hubbard, A. (2007): "Super Learner", Statistical Applications in Genetics and Molecular Biology, 6.

Examples

# A little example with simulated data (2000 observations)
## Not run: 
library(causalweight)
# sample size
n=2000
# number of covariates at baseline
p0=10
# number of covariates that are confounders at baseline
s0=5
# number of additional covariates in period 1
p1=10
# number of additional covariates that are confounders in period 1
s1=5
# covariate matrix at baseline
x0=matrix(rnorm(n*p0),ncol=p0)
# coefficients determining degree of confounding for baseline covariates
beta0=c(rep(0.25,s0), rep(0,p0-s0))
# equation of first treatment in period 1
d1=(x0%*%beta0+rnorm(n)>0)*1
# covariate matrix for covariates of period 1 (affected by 1st treatment d1)
x1=matrix(rnorm(n*p1),ncol=p1)+matrix(0.1 * d1, nrow = n, ncol = p1)
# coefficients determining degree of confounding for covariates of period 1
beta1=c(rep(0.25,s1), rep(0,p1-s1))
# equation of second treatment in period 2
d2=(x0%*%beta0+x1%*%beta1+0.5*d1+rnorm(n)>0)*1
# outcome equation in period 2
y2=x0%*%beta0+x1%*%beta1+1*d1+0.5*d2+rnorm(n)
# estimate the effect of the sequence (1,1) versus (0,0)
output=dyntreatDML(y2=y2,d1=d1,d2=d2,x0=x0,x1=x1,
       d1treat=1,d2treat=1,d1control=0,d2control=0)
cat("dynamic ATE: ",round(c(output$effect),3),", standard error: ",
    round(c(output$se),3), ", p-value: ",round(c(output$pval),3))
# number of trimmed observations
output$ntrimmed
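# The direct effects of d1 and d2 on y2 sum to 1.5; since d1 also shifts x1 by 0.1,
# the total effect of the treatment sequence is 1.5+5*0.25*0.1=1.625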
## End(Not run)

[Package causalweight version 1.1.0 Index]