treatDML {causalweight}    R Documentation

Binary or multiple discrete treatment effect evaluation with double machine learning

Description

Treatment effect estimation for assessing the average effects of discrete (multiple or binary) treatments. Combines estimation based on (doubly robust) efficient score functions with double machine learning to control for confounders in a data-driven way.

Usage

treatDML(
  y,
  d,
  x,
  s = NULL,
  dtreat = 1,
  dcontrol = 0,
  trim = 0.01,
  MLmethod = "lasso",
  k = 3,
  normalized = TRUE
)

Arguments

y

Dependent (outcome) variable; must not contain missing values.

d

Treatment variable; must be discrete and must not contain missing values.

x

Covariates; must not contain missing values.

s

Indicator function defining a subpopulation for which the treatment effect is estimated, as a function of that subpopulation's distribution of x. Default is NULL (estimation of the average treatment effect in the total population).

dtreat

Value of the treatment in the treatment group. Default is 1.

dcontrol

Value of the treatment in the control group. Default is 0.

trim

Trimming rule for discarding observations whose treatment propensity scores are smaller than trim or larger than 1-trim (to avoid very small denominators when weighting by the inverse of the propensity scores). Default is 0.01.

MLmethod

Machine learning method for estimating the nuisance parameters, based on the SuperLearner package. Must be one of "lasso" (default) for lasso estimation, "randomforest" for random forests, "xgboost" for extreme gradient boosting (XGBoost), "svm" for support vector machines, "ensemble" for an ensemble algorithm combining all of the previously mentioned machine learners, or "parametric" for linear or logit regression.

k

Number of folds in k-fold cross-fitting. Default is 3.

normalized

If set to TRUE, then the inverse probability-based weights are normalized such that they add up to 1 within treatment groups. Default is TRUE.
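To illustrate how the trim and normalized arguments interact, here is a minimal sketch for a binary treatment. It assumes a single propensity score ps = P(D = dtreat | X) per observation and uses 1 - ps for the control group; the helper ipw_weights is hypothetical and is not part of causalweight, whose internal code may differ.

```r
# Hypothetical helper (not part of causalweight): applies the trimming rule
# described for `trim` and builds inverse probability weights for a binary
# treatment, optionally normalized to sum to 1 within each group.
ipw_weights <- function(d, ps, dtreat = 1, trim = 0.01, normalized = TRUE) {
  # discard observations with propensity scores below trim or above 1 - trim
  keep <- ps >= trim & ps <= 1 - trim
  d <- d[keep]
  ps <- ps[keep]
  wt <- (d == dtreat) / ps          # weights for the treatment group
  wc <- (d != dtreat) / (1 - ps)    # weights for the control group
  if (normalized) {
    wt <- wt / sum(wt)              # treatment-group weights now sum to 1
    wc <- wc / sum(wc)              # control-group weights now sum to 1
  }
  list(ntrimmed = sum(!keep), wt = wt, wc = wc)
}

# Illustrative toy input (made-up numbers)
d  <- c(1, 0, 1, 0, 1)
ps <- c(0.50, 0.40, 0.995, 0.20, 0.60)
w  <- ipw_weights(d, ps)
w$ntrimmed  # 1: the score 0.995 exceeds 1 - trim = 0.99
```

Normalizing means the weighted group means are proper weighted averages even when the estimated propensity scores do not exactly balance the sample.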

Details

Estimation of the causal effects of binary or multiple discrete treatments under conditional independence, i.e., assuming that confounders jointly affecting the treatment and the outcome can be controlled for by observed covariates. Estimation is based on the (doubly robust) efficient score functions for potential outcomes in combination with double machine learning with cross-fitting; see Chernozhukov et al. (2018). To this end, one part of the data is used to estimate the model parameters of the treatment and outcome equations by machine learning, and the other part is used to predict the efficient score functions. The roles of the data parts are then rotated across the k folds (k-fold cross-fitting), and the average treatment effect is estimated by averaging the predicted efficient score functions over the total sample. Standard errors are based on asymptotic approximations using the estimated variance of the (estimated) efficient score functions.
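The cross-fitting procedure described above can be sketched in base R for a binary treatment. This is a simplified illustration, not the causalweight implementation: it swaps the SuperLearner-based learners for plain logit (glm.fit) and linear (lm.fit) regressions, uses unnormalized weights, and the function name dml_ate_sketch is hypothetical.

```r
# Hypothetical sketch (not causalweight's code): cross-fitted doubly robust
# ATE for a binary treatment with logit/linear nuisance models.
dml_ate_sketch <- function(y, d, x, k = 3, trim = 0.01) {
  y <- as.numeric(y); d <- as.numeric(d); x <- as.matrix(x)
  n <- length(y)
  fold <- sample(rep(seq_len(k), length.out = n))  # random k-fold split
  ps <- mu1 <- mu0 <- numeric(n)
  for (f in seq_len(k)) {
    tr <- fold != f                                # part used for fitting
    te <- fold == f                                # part used for prediction
    Xtr <- cbind(1, x[tr, , drop = FALSE])
    Xte <- cbind(1, x[te, , drop = FALSE])
    # treatment equation: logit regression of d on x
    b_ps <- glm.fit(Xtr, d[tr], family = binomial())$coefficients
    ps[te] <- plogis(drop(Xte %*% b_ps))
    # outcome equations: linear regression of y on x within each group
    t1 <- tr & d == 1
    t0 <- tr & d == 0
    b1 <- lm.fit(cbind(1, x[t1, , drop = FALSE]), y[t1])$coefficients
    b0 <- lm.fit(cbind(1, x[t0, , drop = FALSE]), y[t0])$coefficients
    mu1[te] <- drop(Xte %*% b1)
    mu0[te] <- drop(Xte %*% b0)
  }
  keep <- ps >= trim & ps <= 1 - trim              # trimming rule
  y <- y[keep]; d <- d[keep]; ps <- ps[keep]
  mu1 <- mu1[keep]; mu0 <- mu0[keep]
  # doubly robust efficient scores for the two potential outcome means
  score1 <- mu1 + d * (y - mu1) / ps
  score0 <- mu0 + (1 - d) * (y - mu0) / (1 - ps)
  psi <- score1 - score0
  effect <- mean(psi)                              # average of the scores
  se <- sd(psi) / sqrt(length(psi))                # asymptotic standard error
  list(effect = effect, se = se,
       pval = 2 * pnorm(-abs(effect / se)), ntrimmed = sum(!keep))
}

# Simulated data in the spirit of the Examples section (true ATE = 0.5)
set.seed(1)
n <- 2000; p <- 5
x <- matrix(rnorm(n * p), ncol = p)
beta <- c(0.25, 0.25, rep(0, p - 2))
d <- (x %*% beta + rnorm(n) > 0) * 1
y <- x %*% beta + 0.5 * d + rnorm(n)
out <- dml_ate_sketch(y, d, x)
out$effect  # should be close to 0.5
```

Because each observation's nuisance predictions come from models fitted on the other folds, overfitting in the learners does not feed directly into the averaged scores, which is the point of cross-fitting.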

Value

A treatDML object contains eight components: effect, se, pval, ntrimmed, meantreat, meancontrol, pstreat, and pscontrol.

effect: Estimate of the average treatment effect.

se: Standard error of the effect estimate.

pval: P-value of the effect estimate.

ntrimmed: Number of observations discarded (trimmed) due to extreme propensity scores.

meantreat: Estimate of the mean potential outcome under treatment.

meancontrol: Estimate of the mean potential outcome under control.

pstreat: Propensity score estimates for treatment in the treatment group.

pscontrol: Propensity score estimates for treatment in the control group.
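Since the standard error relies on an asymptotic (normal) approximation, effect and se can be combined into a test statistic and a confidence interval. The sketch below assumes that pval comes from a two-sided test against the standard normal distribution; the helper normal_inference is hypothetical, and the input numbers are illustrative rather than treatDML output.

```r
# Hypothetical helper: normal-approximation inference from an estimate
# and its standard error (two-sided test of a zero effect).
normal_inference <- function(effect, se, level = 0.95) {
  z <- effect / se                             # test statistic
  pval <- 2 * pnorm(-abs(z))                   # two-sided p-value
  half <- qnorm(1 - (1 - level) / 2) * se      # half-width of the interval
  list(z = z, pval = pval, ci = c(effect - half, effect + half))
}

# Illustrative numbers only
res <- normal_inference(effect = 0.5, se = 0.05)
round(res$ci, 3)  # 0.402 0.598
```

With a fitted object, the same call would read normal_inference(output$effect, output$se).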

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

van der Laan, M., Polley, E., Hubbard, A. (2007): "Super Learner", Statistical Applications in Genetics and Molecular Biology, 6.

Examples

# A small example with simulated data (2000 observations)
## Not run: 
library(causalweight)             # load the package providing treatDML
set.seed(1)                       # make the simulation reproducible
n=2000                            # sample size
p=100                             # number of covariates
s=2                               # number of covariates that are confounders
x=matrix(rnorm(n*p),ncol=p)       # covariate matrix
beta=c(rep(0.25,s), rep(0,p-s))   # coefficients determining degree of confounding
d=(x%*%beta+rnorm(n)>0)*1         # treatment equation
y=x%*%beta+0.5*d+rnorm(n)         # outcome equation
# The true ATE is equal to 0.5
output=treatDML(y,d,x)
cat("ATE: ",round(c(output$effect),3),", standard error: ",
    round(c(output$se),3), ", p-value: ",round(c(output$pval),3))
output$ntrimmed
## End(Not run)

[Package causalweight version 1.1.1 Index]