identificationDML {causalweight} | R Documentation |
Testing identification with double machine learning
Description
Testing identification with double machine learning
Usage
identificationDML(
y,
d,
x,
z,
score = "DR",
bootstrap = FALSE,
ztreat = 1,
zcontrol = 0,
seed = 123,
MLmethod = "lasso",
k = 3,
DR_parameters = list(s = NULL, normalized = TRUE, trim = 0.01),
squared_parameters = list(zeta_sigma = min(0.5, 500/dim(y)[1])),
bootstrap_parameters = list(B = 2000, importance = 0.95, alpha = 0.1, share = 0.5)
)
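The default for zeta_sigma in the signature above depends on the sample size through dim(y)[1]. As a small worked illustration in base R, the expression can be evaluated for a few sample sizes. The helper name default_zeta_sigma is hypothetical and not part of causalweight, and whether the function coerces y internally before evaluating the default is not stated on this page.

```r
# Hypothetical helper mirroring the default expression min(0.5, 500/n),
# where n is the number of observations.
default_zeta_sigma <- function(n) min(0.5, 500 / n)

default_zeta_sigma(500)    # 500/500 = 1, capped at 0.5
default_zeta_sigma(1000)   # 500/1000 = 0.5
default_zeta_sigma(20000)  # 500/20000 = 0.025

# dim() returns NULL for a plain vector, so dim(y)[1] only yields the
# row count when y is a matrix (for example an n x 1 matrix).
y_vec <- rnorm(100)
y_mat <- matrix(y_vec, ncol = 1)
is.null(dim(y_vec))  # TRUE
dim(y_mat)[1]        # 100
NROW(y_vec)          # 100, works for vectors and matrices alike
```

With the sample size of 20000 used in the examples, the default therefore evaluates to 0.025 when y is passed as a matrix.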
Arguments
y |
Dependent variable, must not contain missings. |
d |
Treatment variable, must be discrete, must not contain missings. |
x |
Covariates, must not contain missings. |
z |
Instrument, must not contain missings. |
score |
Orthogonal score used for testing identification, either "DR" (the default, see DR_parameters) or "squared" (see squared_parameters). |
bootstrap |
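If set to TRUE, identification is tested using the DR score and sample splitting to detect heterogeneity (see bootstrap_parameters). Default is FALSE. |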
ztreat |
Value of the instrument in the "treatment" group. Default is 1. |
zcontrol |
Value of the instrument in the "control" group. Default is 0. |
seed |
Seed for the random number generator. Default is 123. |
MLmethod |
Machine learning method for estimating the nuisance parameters based on the |
k |
Number of folds in k-fold cross-fitting. Default is 3. |
DR_parameters |
List of input parameters to test identification using the doubly robust score:
s: Indicator function for defining a subpopulation for which the treatment effect is estimated as a function of the subpopulation's distribution of |
squared_parameters |
List of input parameters to test identification using the squared deviation: zeta_sigma: standard deviation of the normally distributed errors used to avoid a degenerate limit distribution. Default is min(0.5, 500/n). |
bootstrap_parameters |
List of input parameters to test identification using the DR score and sample splitting to detect heterogeneity (if |
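The requirements stated for y, d, x and z (no missing values, discrete treatment) can be checked before calling the function. The following is a minimal sketch in base R. check_identification_inputs is a hypothetical helper, not part of causalweight, and treating "discrete" as "few distinct values" (threshold max_levels) is an assumption made for illustration only.

```r
# Hypothetical pre-flight check for the documented input requirements.
check_identification_inputs <- function(y, d, x, z, max_levels = 10) {
  inputs <- list(y = y, d = d, x = x, z = z)
  # No argument may contain missing values.
  has_na <- vapply(inputs, anyNA, logical(1))
  if (any(has_na)) {
    stop("missing values in: ", paste(names(inputs)[has_na], collapse = ", "))
  }
  # All inputs must describe the same number of observations.
  n_obs <- vapply(inputs, NROW, integer(1))
  if (length(unique(n_obs)) != 1) {
    stop("y, d, x and z must have the same number of observations")
  }
  # Assumed operationalisation of "discrete": few distinct values.
  if (length(unique(as.vector(d))) > max_levels) {
    stop("d has more than ", max_levels, " distinct values")
  }
  invisible(TRUE)
}

# Usage on small simulated inputs
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
z <- rbinom(100, 1, 0.5)
d <- rbinom(100, 1, 0.5)
y <- rnorm(100)
check_identification_inputs(y, d, x, z)  # passes silently
```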
Details
Testing the identification of causal effects of a treatment d on an outcome y in observational data using a supposed instrument z and controlling for observed covariates x.
Value
An identificationDML object contains different parameters, at least the two following:
effect: estimate of the target parameter(s).
pval: p-value(s) of the identification test.
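Since the returned object carries pval, a test decision at a chosen level can be read off directly. A minimal sketch follows. reject_identification is a hypothetical helper, and it assumes that the null hypothesis of the test is that identification holds, so that a small p-value counts as evidence against identification (see Huber and Kueck, 2022, for the exact hypotheses). The level 0.1 matches alpha in the examples.

```r
# Hypothetical helper: TRUE for every p-value below the chosen level.
# `res` is any list-like object with a numeric `pval` component.
reject_identification <- function(res, level = 0.1) {
  stopifnot(is.numeric(res$pval), level > 0, level < 1)
  res$pval < level
}

# Intended use (not run here, requires the causalweight package and data):
# out <- identificationDML(y = y, d = d, x = x, z = z)
# reject_identification(out, level = 0.1)
```

Returning one logical per p-value keeps the helper usable when pval holds more than one entry.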
References
Huber, M., & Kueck, J. (2022): Testing the identification of causal effects in observational data. arXiv:2203.15890.
Examples
# Two examples with simulated data
## Not run:
library(causalweight)
library(mvtnorm) # provides rmvnorm()
set.seed(777)
n <- 20000 # sample size
p <- 50 # number of covariates
s <- 5 # sparsity (relevant covariates)
alpha <- 0.1 # level
# control violation of identification
delta <- 2 # effect of unobservable in outcome on index of treatment - either 0 or 2
gamma <- 0 # direct effect of the instrument on outcome - either 0 or 0.1
# DGP - general
xcorr <- 1 # if 1, then non-zero covariance between regressors
if (xcorr == 0) {
sigmax <- diag(1,p)} # covariance matrix at baseline
if (xcorr != 0){
sigmax = matrix(NA,p,p)
for (i in 1:p){
for (j in 1:p){
sigmax[i,j] = 0.5^(abs(i-j))
}
}}
sparse = FALSE # if FALSE, an approximate sparse setting is considered
beta = rep(0,p)
if (sparse == TRUE){
for (j in 1:s){ beta[j] <- 1} }
if (sparse != TRUE){
for (j in 1:p) beta[j] <- (1/j)}
noise_U <- 0.1 # control signal-to-noise
noise_V <- 0.1
noise_W <- 0.25
x <- (rmvnorm(n,rep(0,p),sigmax))
w <- rnorm(n,0,sd=noise_W)
z <- 1*(rnorm(n)>0)
d <- (x%*%beta+z+w+rnorm(n,0,sd=noise_V)>0)*1 # treatment equation
# DGP 1 - effect homogeneity
y <- x%*%beta+d+gamma*z+delta*w+rnorm(n,0,sd=noise_U)
output1 <- identificationDML(y = y, d=d, x=x, z=z, score = "DR", bootstrap = FALSE,
ztreat = 1, zcontrol = 0 , seed = 123, MLmethod ="lasso", k = 3,
DR_parameters = list(s = NULL , normalized = TRUE, trim = 0.01))
output1$pval
output2 <- identificationDML(y=y, d=d, x=x, z=z, score = "squared", bootstrap = FALSE,
ztreat = 1, zcontrol =0 , seed = 123, MLmethod ="lasso", k = 3)
output2$pval
output3 <- identificationDML(y=y, d=d, x=x, z=z, score = "squared", bootstrap = TRUE,
ztreat = 1, zcontrol =0 , seed = 123, MLmethod ="lasso", k = 3,
DR_parameters = list(s = NULL , normalized = TRUE, trim = 0.005),
bootstrap_parameters = list(B = 2000, importance = 0.95, alpha = 0.1, share = 0.5))
output3$pval
# DGP 2 - effect heterogeneity
y = x%*%beta+d+gamma*z*x[,1]+gamma*z*x[,2]+delta*w*x[,1]+delta*w*x[,2]+rnorm(n,0,sd=noise_U)
output1 <- identificationDML(y = y, d=d, x=x, z=z, score = "DR", bootstrap = FALSE,
ztreat = 1, zcontrol = 0 , seed = 123, MLmethod ="lasso", k = 3,
DR_parameters = list(s = NULL , normalized = TRUE, trim = 0.01))
output1$pval
output2 <- identificationDML(y=y, d=d, x=x, z=z, score = "squared", bootstrap = FALSE,
ztreat = 1, zcontrol =0 , seed = 123, MLmethod ="lasso", k = 3)
output2$pval
output3 <- identificationDML(y=y, d=d, x=x, z=z, score = "DR", bootstrap = TRUE,
ztreat = 1, zcontrol =0 , seed = 123, MLmethod ="lasso", k = 3,
DR_parameters = list(s = NULL , normalized = TRUE, trim = 0.005),
bootstrap_parameters = list(B = 2000, importance = 0.95, alpha = 0.1, share = 0.5))
output3$pval
## End(Not run)
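The nested loops that build sigmax and beta in the examples can also be written in vectorized form. The sketch below uses only base R and reproduces the same objects for the settings used above (xcorr = 1, sparse = FALSE). The names with the suffixes _loop and _vec are introduced here for illustration.

```r
p <- 50

# Loop version, as in the examples above.
sigmax_loop <- matrix(NA, p, p)
for (i in 1:p) {
  for (j in 1:p) {
    sigmax_loop[i, j] <- 0.5^(abs(i - j))
  }
}
beta_loop <- rep(0, p)
for (j in 1:p) beta_loop[j] <- 1 / j

# Vectorized version: outer() forms all pairwise differences i - j at once.
sigmax_vec <- 0.5^abs(outer(1:p, 1:p, "-"))
beta_vec <- 1 / (1:p)

all.equal(sigmax_loop, sigmax_vec)  # TRUE
all.equal(beta_loop, beta_vec)      # TRUE
```

Both versions give a covariance matrix whose entries decay geometrically with the distance between covariate indices, and coefficients that decay as 1/j.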