didcontDMLpanel {causalweight}

R Documentation

Continuous Difference-in-Differences using Double Machine Learning for Panel Data

Description

This function estimates the average treatment effect on the treated of a continuously distributed treatment in panel data based on a Difference-in-Differences (DiD) approach using double machine learning to control for time-varying confounders in a data-driven manner. It supports several machine learning methods for estimating the nuisance parameters and uses k-fold cross-fitting.

Usage

didcontDMLpanel(
  ydiff,
  d,
  t,
  dtreat,
  dcontrol,
  t1 = 1,
  controls,
  MLmethod = "lasso",
  psmethod = 1,
  trim = 0.1,
  lognorm = FALSE,
  bw = NULL,
  bwfactor = 0.7,
  cluster = NULL,
  k = 3
)

Arguments

`ydiff`	Outcome difference between a post-treatment and a pre-treatment period. Should not contain missing values.
`d`	Treatment variable in the treatment period of interest. Should be continuous and not contain missing values.
`t`	Time variable indicating outcome periods. Should not contain missing values.
`dtreat`	Value of the treatment under treatment (in the treatment period of interest). This value would be 1 for binary treatments.
`dcontrol`	Value of the treatment under control (in the treatment period of interest). This value would be 0 for binary treatments.
`t1`	Value indicating the post-treatment outcome period in which the effect is evaluated, which is the later of the two periods used to generate the outcome difference in `ydiff`. For instance, if the pre-treatment outcome is measured in period 0 and the post-treatment outcome is measured in period 1 to generate `ydiff`, then `t1` is equal to 1. Default is 1.
`controls`	Covariates and/or previous treatment history to be controlled for. Should not contain missing values.
`MLmethod`	Machine learning method for estimating nuisance parameters using the `SuperLearner` package. Must be one of `"lasso"` (default), `"randomforest"`, `"xgboost"`, `"svm"`, `"ensemble"`, or `"parametric"`.
`psmethod`	Method for computing generalized propensity scores. Set to 1 for estimating conditional treatment densities using the treatment as dependent variable, or 2 for using the treatment kernel weights as dependent variable. Default is 1.
`trim`	Trimming threshold (in percentage) for discarding observations with too much influence within any subgroup defined by the treatment group and time. Default is 0.1.
`lognorm`	Logical indicating if log-normal transformation should be applied when estimating conditional treatment densities using the treatment as dependent variable. Default is FALSE.
`bw`	Bandwidth for kernel density estimation. Default is NULL, in which case the bandwidth is chosen by a rule of thumb.
`bwfactor`	Factor by which the bandwidth is multiplied. Default is 0.7 (undersmoothing).
`cluster`	Optional clustering variable for calculating standard errors.
`k`	Number of folds in k-fold cross-fitting. Default is 3.
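
The arguments above expect one value per unit, so data stored in long format (one row per unit and period) has to be reshaped first. The following is a minimal base R sketch of that step, not part of the package. The data frame `panel` and its column names (`id`, `period`, `y`, `dose`, `x`) are hypothetical, and taking the controls from the pre-treatment period is an assumption made for illustration.

```r
# Hypothetical long-format panel: one row per unit and period.
# All object and column names here are illustrative assumptions.
set.seed(1)
n <- 5
panel <- data.frame(
  id     = rep(1:n, each = 2),
  period = rep(0:1, times = n),
  y      = rnorm(2 * n),
  dose   = rep(runif(n), each = 2),  # continuous treatment intensity per unit
  x      = rep(rnorm(n), each = 2)   # covariate
)

# Split into pre- and post-treatment rows and align them by unit id.
pre  <- panel[panel$period == 0, ]
post <- panel[panel$period == 1, ]
pre  <- pre[order(pre$id), ]
post <- post[order(post$id), ]
stopifnot(identical(pre$id, post$id))

ydiff    <- post$y - pre$y   # post-treatment minus pre-treatment outcome
d        <- post$dose        # treatment in the treatment period of interest
t        <- post$period      # outcome period; matches t1 = 1 here
controls <- pre$x            # covariate taken from the pre-treatment period (assumption)

# The documentation requires inputs without missing values.
stopifnot(!anyNA(ydiff), !anyNA(d), !anyNA(t), !anyNA(controls))
```

The resulting vectors all have one entry per unit and can be passed to `ydiff`, `d`, `t`, and `controls`.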

Details

This function estimates the Average Treatment Effect on the Treated (ATET) by Difference-in-Differences in panel data while controlling for confounders using double machine learning. Nuisance parameters can be estimated with any of the machine learning methods listed under `MLmethod`. With k-fold cross-fitting, the nuisance predictions for each observation come from models fitted on the other folds, which limits overfitting bias. The function handles both binary and continuous outcomes and provides options for trimming and for adjusting the bandwidth used in kernel density estimation.
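
The cross-fitting idea mentioned above can be sketched in base R. This is a generic illustration under stated assumptions, not the package's internal code: the simulated data are made up, and `lm` is only a stand-in for whichever learner `MLmethod` selects.

```r
# Generic k-fold cross-fitting sketch (illustrative, not the package internals).
set.seed(42)
n <- 300
k <- 3
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(y = y, x = x)

# Randomly assign each observation to one of k folds of equal size.
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold, fit the nuisance model on the other folds and
# predict only for the held-out observations.
yhat <- rep(NA_real_, n)
for (j in seq_len(k)) {
  held_out <- fold == j
  fit <- lm(y ~ x, data = dat[!held_out, ])   # stand-in for an ML learner
  yhat[held_out] <- predict(fit, newdata = dat[held_out, ])
}
```

Every entry of `yhat` is an out-of-sample prediction, which is the property double machine learning relies on when plugging nuisance estimates into the score function.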

Value

A list with the following components:

ATET: Estimate of the Average Treatment Effect on the Treated.

se: Standard error of the ATET estimate.

trimmed: Number of discarded (trimmed) observations.

pval: P-value of the ATET estimate.

pscores: Generalized propensity scores in two columns: under treatment (`dtreat`) and under control (`dcontrol`).

outcomepred: Conditional outcome predictions.
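
As a sketch of how the returned `ATET` and `se` might be used, the base R code below builds a 95% confidence interval and a two-sided p-value. The list `res` holds hypothetical numbers rather than real output, and the normal approximation is an assumption; the documentation does not state how `pval` is computed internally.

```r
# Hypothetical values standing in for a result list; not real output.
res <- list(ATET = 1.95, se = 0.10)

# z-statistic and two-sided p-value under a normal approximation (assumption).
z <- res$ATET / res$se
pval_normal <- 2 * pnorm(-abs(z))

# 95% confidence interval: estimate plus/minus the normal quantile times se.
ci95 <- res$ATET + c(-1, 1) * qnorm(0.975) * res$se
```

If a `cluster` variable was supplied, `se` already reflects the clustering, so the same calculation applies without modification.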

References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J. (2018): "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21, C1-C68.

Haddad, M., Huber, M., Medina-Reyes, J., Zhang, L. (2024): "Difference-in-Differences under time-varying continuous treatments based on double machine learning".

Examples

## Not run: 
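# Example with simulated data
library(causalweight)
set.seed(1)                       # for reproducibility
n <- 1000
x <- 0.5 * rnorm(n)               # observed covariate
u <- runif(n, 0, 2)               # time-invariant component, not passed as a control
d <- x + u + rnorm(n)             # continuous treatment
y0 <- u + rnorm(n)                # pre-treatment outcome
y1 <- 2 * d + x + u + rnorm(n)    # post-treatment outcome
t <- rep(1, n)                    # all outcome differences refer to period 1
# True effect of d = 1 versus d = 0 is 2
results <- didcontDMLpanel(ydiff = y1 - y0, d = d, t = t, dtreat = 1, dcontrol = 0,
                           controls = x, MLmethod = "lasso")
cat("ATET: ", round(results$ATET, 3), ", Standard error: ", round(results$se, 3), "\n")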

## End(Not run)

[Package causalweight version 1.1.1 Index]