didcontDML {causalweight}    R Documentation

Continuous Difference-in-Differences using Double Machine Learning for Repeated Cross-Sections

Description

This function estimates the average treatment effect on the treated (ATET) of a continuously distributed treatment in repeated cross-sections. It uses a Difference-in-Differences (DiD) approach with double machine learning to control for time-varying confounders in a data-driven manner. It supports several machine learning methods and uses k-fold cross-fitting.

Usage

didcontDML(
  y,
  d,
  t,
  dtreat,
  dcontrol,
  t0 = 0,
  t1 = 1,
  controls,
  MLmethod = "lasso",
  psmethod = 1,
  trim = 0.1,
  lognorm = FALSE,
  bw = NULL,
  bwfactor = 0.7,
  cluster = NULL,
  k = 3
)

Arguments

y

Outcome variable. Should not contain missing values.

d

Treatment variable in the treatment period of interest. Should be continuous and not contain missing values.

t

Time variable indicating outcome periods. Should not contain missing values.

dtreat

Value of the treatment under treatment (in the treatment period of interest). This value would be 1 for binary treatments.

dcontrol

Value of the treatment under control (in the treatment period of interest). This value would be 0 for binary treatments.

t0

Value indicating the pre-treatment outcome period. Default is 0.

t1

Value indicating the post-treatment outcome period in which the effect is evaluated. Default is 1.

controls

Covariates and/or previous treatment history to be controlled for. Should not contain missing values.

MLmethod

Machine learning method for estimating nuisance parameters using the SuperLearner package. Must be one of "lasso" (default), "randomforest", "xgboost", "svm", "ensemble", or "parametric".

psmethod

Method for computing generalized propensity scores. Set to 1 for estimating conditional treatment densities using the treatment as dependent variable, or 2 for using the treatment kernel weights as dependent variable. Default is 1.

trim

Trimming threshold (in percentage) for discarding observations with too much influence within any subgroup defined by the treatment group and time. Default is 0.1.

lognorm

Logical indicating if log-normal transformation should be applied when estimating conditional treatment densities using the treatment as dependent variable. Default is FALSE.

bw

Bandwidth for kernel density estimation. Default is NULL, implying that the bandwidth is calculated based on the rule-of-thumb.

bwfactor

Factor by which the bandwidth is multiplied. Default is 0.7 (undersmoothing).
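
How bw and bwfactor might interact can be sketched in base R. The snippet below is an illustration, not the package's internal code: it assumes Silverman's rule of thumb (stats::bw.nrd0) and a Gaussian kernel, neither of which is confirmed by this page, and kernel_weights is a hypothetical helper.

```r
# Illustrative sketch only: the rule of thumb and the kernel used inside
# didcontDML may differ from the assumptions made here.
set.seed(1)
d <- rnorm(500, mean = 1)        # simulated continuous treatment

bw_rot <- stats::bw.nrd0(d)      # assumed rule of thumb (Silverman)
bwfactor <- 0.7                  # default factor documented above
bw_used <- bwfactor * bw_rot     # smaller bandwidth, i.e. undersmoothing

# Hypothetical helper: Gaussian kernel weights around a treatment value
kernel_weights <- function(d, value, bw) {
  stats::dnorm((d - value) / bw) / bw
}

w_treat <- kernel_weights(d, value = 1, bw = bw_used)    # around dtreat
w_control <- kernel_weights(d, value = 0, bw = bw_used)  # around dcontrol
```

Observations whose treatment value lies close to dtreat (or dcontrol) receive the largest weights. A factor below 1 narrows the window around each value.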

cluster

Optional clustering variable for calculating standard errors.

k

Number of folds in k-fold cross-fitting. Default is 3.
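
The idea behind k-fold cross-fitting can be sketched as follows. This is a minimal base R illustration under stated assumptions, not the package's implementation: a linear model (lm) stands in for the machine learner, and the data and variable names are made up for the example.

```r
# Illustrative sketch of k-fold cross-fitting; lm() is a stand-in learner.
set.seed(1)
n <- 300
k <- 3
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

# Randomly assign each observation to one of k folds of equal size
fold <- sample(rep(seq_len(k), length.out = n))

yhat <- rep(NA_real_, n)
for (j in seq_len(k)) {
  held_out <- which(fold == j)
  fit <- lm(y ~ x, data = dat[-held_out, ])          # fit on the other folds
  yhat[held_out] <- predict(fit, newdata = dat[held_out, ])  # predict held-out fold
}
```

Each prediction in yhat comes from a model that never saw the corresponding observation. Double machine learning relies on this property when plugging nuisance estimates into the effect estimator.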

Details

This function estimates the Average Treatment Effect on the Treated (ATET) by Difference-in-Differences in repeated cross-sections while controlling for confounders using double machine learning. It supports different machine learning methods for estimating nuisance parameters and performs k-fold cross-fitting, so that nuisance parameters are estimated on observations other than those used to evaluate the effect, which limits overfitting bias. The function handles both binary and continuous outcomes and provides options for trimming and for adjusting the bandwidth in kernel density estimation.
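
To make the DiD logic concrete, the following base R sketch computes a kernel-weighted difference-in-differences without any covariate adjustment. It is an illustration under assumptions, not the estimator implemented by didcontDML: the Gaussian kernel, the bandwidth choice, and the helper local_mean are hypothetical. The data follow the same design as the example at the end of this page, with a larger sample.

```r
# Illustrative sketch: kernel-weighted DiD WITHOUT controlling for x.
set.seed(1)
n <- 20000
t <- rep(c(0, 1), each = n / 2)       # pre (0) and post (1) period
x <- 0.5 * rnorm(n)                   # observed covariate
u <- runif(n, 0, 2)                   # unobserved component
d <- x + u + rnorm(n)                 # continuous treatment
y <- (2 * d + x) * t + u + rnorm(n)   # outcome

bw <- 0.7 * stats::bw.nrd0(d)         # assumed bandwidth choice

# Kernel-weighted mean outcome of units with treatment near `value`
# that are observed in period `period`
local_mean <- function(value, period) {
  w <- stats::dnorm((d - value) / bw) * (t == period)
  sum(w * y) / sum(w)
}

# Change over time near dtreat = 1 minus change over time near dcontrol = 0
naive_did <- (local_mean(1, 1) - local_mean(1, 0)) -
  (local_mean(0, 1) - local_mean(0, 0))
naive_did
```

In this design x affects both the treatment and the period-1 outcome, so the unadjusted contrast need not coincide with the true effect of 2. That is the motivation for controlling for covariates.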

Value

A list with the following components:

ATET: Estimate of the Average Treatment Effect on the Treated.

se: Standard error of the ATET estimate.

trimmed: Number of discarded (trimmed) observations.

pval: P-value.

pscores: Propensity scores (4 columns): under treatment in period t1, under treatment in period t0, under control in period t1, under control in period t0.

outcomes: Conditional outcomes (3 columns): in treatment group in period t0, in control group in period t1, in control group in period t0.
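
The relation between ATET, se and pval can be sketched as below. This assumes a two-sided test based on the normal approximation, which this page does not state explicitly. The numbers are hypothetical placeholders rather than output of didcontDML.

```r
# Hypothetical values standing in for results$ATET and results$se
atet <- 1.95
se <- 0.12

z <- atet / se
pval <- 2 * stats::pnorm(-abs(z))                    # two-sided p-value
ci95 <- atet + c(-1, 1) * stats::qnorm(0.975) * se   # 95% confidence interval
```

The same arithmetic applies to the ATET and se components of the returned list if a confidence interval is needed.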

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

Haddad, M., Huber, M., Medina-Reyes, J., Zhang, L. (2024): "Difference-in-Differences under time-varying continuous treatments based on double machine learning".

Examples

## Not run: 
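# Example with simulated data
library(causalweight)
set.seed(1)                  # for reproducibility
n=2000
t=rep(c(0, 1), each=n/2)     # period indicator: first half pre (0), second half post (1)
x=0.5*rnorm(n)               # observed covariate
u=runif(n,0,2)               # unobserved component affecting treatment and outcome
d=x+u+rnorm(n)               # continuous treatment
y=(2*d+x)*t+u+rnorm(n)       # outcome; the treatment only matters in period 1
# true effect of dtreat=1 versus dcontrol=0 is 2
results=didcontDML(y=y, d=d, t=t, dtreat=1, dcontrol=0, controls=x, MLmethod="lasso")
cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3), "\n")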

## End(Not run)

[Package causalweight version 1.1.1 Index]