didDML {causalweight}    R Documentation

Difference-in-Differences in Repeated Cross-Sections for Binary Treatments using Double Machine Learning

Description

This function estimates the average treatment effect on the treated (ATET) in the post-treatment period for a binary treatment using a doubly robust Difference-in-Differences (DiD) approach for repeated cross-sections that is combined with double machine learning. It controls for (possibly time-varying) confounders in a data-driven manner and supports various machine learning methods for estimating nuisance parameters through k-fold cross-fitting.

Usage

didDML(
  y,
  d,
  t,
  x,
  MLmethod = "lasso",
  est = "dr",
  trim = 0.05,
  cluster = NULL,
  k = 3
)

Arguments

y

Outcome variable. Should not contain missing values.

d

Treatment group indicator (binary). Should not contain missing values.

t

Time period indicator (binary). Should be 1 for post-treatment period and 0 for pre-treatment period. Should not contain missing values.

x

Covariates to be controlled for. Should not contain missing values.

MLmethod

Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".

est

Estimation method. Must be one of "dr" (default) for doubly robust, "ipw" for inverse probability weighting (not doubly robust!), or "reg" for regression (not doubly robust!).

trim

Trimming threshold, given as a share rather than a percentage (0.05 corresponds to 5%), for discarding observations whose propensity scores are too small within any subgroup defined by treatment group and time period. Default is 0.05.

cluster

Optional clustering variable for calculating cluster-robust standard errors.

k

Number of folds in k-fold cross-fitting. Default is 3.
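
Since y, d, t, and x should not contain missing values and d and t are binary indicators, it can help to clean the inputs before the call. The following is a minimal sketch in base R under those assumptions. The helper prepare_did_inputs is hypothetical and not part of causalweight, and the toy vectors are made up for illustration.

```r
# Hypothetical helper (not part of causalweight): drops incomplete rows and
# checks that d and t are coded 0/1 before the data are passed to didDML().
prepare_did_inputs <- function(y, d, t, x) {
  x <- as.matrix(x)
  stopifnot(length(y) == length(d), length(y) == length(t),
            length(y) == nrow(x))
  keep <- complete.cases(y, d, t, x)       # rows without any missing value
  y <- y[keep]
  d <- d[keep]
  t <- t[keep]
  x <- x[keep, , drop = FALSE]
  stopifnot(all(d %in% c(0, 1)), all(t %in% c(0, 1)))
  list(y = y, d = d, t = t, x = x, dropped = sum(!keep))
}

# Toy data with one missing outcome and one missing covariate value
y <- c(1.2, NA, 0.7, 2.1, 1.5)
d <- c(1, 0, 1, 0, 1)
t <- c(0, 1, 1, 0, 1)
x <- c(0.3, 0.8, NA, 0.1, 0.6)
clean <- prepare_did_inputs(y, d, t, x)
```

The components of clean could then be passed on as didDML(y = clean$y, d = clean$d, t = clean$t, x = clean$x). Dropping incomplete rows is only one possible way to handle missing values and may not suit every application.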

Details

This function estimates the average treatment effect on the treated (ATET) in the post-treatment period based on Difference-in-Differences in repeated cross-sections, controlling for confounders in a data-adaptive manner using double machine learning. It supports different machine learning methods for estimating the nuisance parameters (conditional mean outcomes and propensity scores) and uses cross-fitting to mitigate overfitting. In addition to the doubly robust estimator (est = "dr"), the function provides inverse probability weighting (est = "ipw") and regression adjustment (est = "reg"), neither of which is doubly robust.
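
For intuition about the target parameter, the sketch below computes the textbook unconditional 2x2 Difference-in-Differences contrast in base R. It is not what didDML does internally: it ignores covariates, machine learning, and cross-fitting, and only illustrates the contrast that the covariate-adjusted estimators refine. The simulated data and all variable names are illustrative assumptions.

```r
set.seed(123)
n <- 2000
t <- rbinom(n, 1, 0.5)                          # period indicator (1 = post-treatment)
d <- rbinom(n, 1, 0.5)                          # treatment group indicator
y <- 1 * d * t + 0.5 * t + 0.3 * d + rnorm(n)   # simulated effect of 1

# Difference of the before-after changes in the two groups
did_means <- (mean(y[d == 1 & t == 1]) - mean(y[d == 1 & t == 0])) -
  (mean(y[d == 0 & t == 1]) - mean(y[d == 0 & t == 0]))

# The same number is the interaction coefficient in a saturated regression
did_reg <- unname(coef(lm(y ~ d * t))["d:t"])
```

Both did_means and did_reg rely on an unconditional common trend, which fails when confounders differ across groups or change over time. That is the situation in which controlling for x, as didDML does, becomes relevant.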

Value

A list with the following components:

ATET: Estimate of the Average Treatment Effect on the Treated (ATET) in the post-treatment period.

se: Standard error of the ATET estimate.

pval: P-value of the ATET estimate.

trimmed: Number of discarded (trimmed) observations.

pscores: Propensity scores (4 columns): under treatment in period 1, under treatment in period 0, under control in period 1, under control in period 0.

outcomepred: Conditional outcome predictions (3 columns): in treatment group in period 0, in control group in period 1, in control group in period 0.
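
The components ATET and se can be combined into a confidence interval. The sketch below assumes a normal approximation, which is an assumption made here and not something stated in this documentation. The helper atet_ci is hypothetical, and the list res contains placeholder numbers that only mimic the structure of the returned list.

```r
# Hypothetical helper: normal-approximation confidence interval from a list
# with components ATET and se (the component names listed above).
atet_ci <- function(res, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)     # two-sided critical value
  c(lower = res$ATET - z * res$se, upper = res$ATET + z * res$se)
}

res <- list(ATET = 1, se = 0.5)       # placeholder values, not real estimates
ci <- atet_ci(res)
```

In practice res would be replaced by the object returned by didDML.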

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

Zimmert, M. (2020): "Efficient difference-in-differences estimation with high-dimensional common trend confounding", arXiv preprint 1809.01643.

Examples

## Not run: 
# Example with simulated data
library(causalweight)
set.seed(1)                                # make the simulation reproducible
n <- 4000                                  # sample size
t <- 1 * (rnorm(n) > 0)                    # time period (1 = post-treatment)
u <- runif(n, 0, 1)                        # time-constant unobservable
x <- 0.25 * t + runif(n, 0, 1)             # time-varying covariate
d <- 1 * (x + u + 2 * rnorm(n) > 0)        # treatment group indicator
y <- d * t + t + x + u + 2 * rnorm(n)      # outcome
# true effect is equal to 1
results <- didDML(y = y, d = d, t = t, x = x)
cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3), "\n")

## End(Not run)

[Package causalweight version 1.1.1 Index]