proxy_CATE {GenericML} R Documentation

Conditional Average Treatment Effect

Description

Proxy estimation of the Conditional Average Treatment Effect (CATE), defined by E[Y | D=1, Z] - E[Y | D=0, Z]. Estimation is done on the auxiliary sample, but CATE predictions are made for all observations.
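The estimation scheme described above can be sketched in base R. This is only an illustration under stated assumptions: ordinary least squares (lm) stands in for the machine learner, the simulated data are made up, and the names fit1, fit0, and cate_hat are hypothetical. It is not the package's implementation.

```r
## Simulated data (illustrative only)
set.seed(1)
n <- 200; p <- 3
Z <- matrix(runif(n * p), n, p, dimnames = list(NULL, paste0("z", 1:p)))
D <- rbinom(n, 1, 0.5)
Y <- as.numeric(Z %*% c(1, 2, 3) + 2 * D + rnorm(n))
A_set <- sample(seq_len(n), size = n / 2)   # auxiliary sample

dat <- data.frame(Y = Y, Z)
A1 <- intersect(A_set, which(D == 1))       # treated units in the auxiliary sample
A0 <- intersect(A_set, which(D == 0))       # control units in the auxiliary sample

## Assumption: lm() replaces the machine learner of the actual function
fit1 <- lm(Y ~ ., data = dat[A1, ])         # proxy for E[Y | D=1, Z]
fit0 <- lm(Y ~ ., data = dat[A0, ])         # proxy for E[Y | D=0, Z]

## Fit on the auxiliary sample only, but predict for all observations
cate_hat <- as.numeric(predict(fit1, newdata = dat) - predict(fit0, newdata = dat))
```

The two models are trained on disjoint subsets of the auxiliary sample, and their difference is evaluated for every row of Z, which mirrors the "estimate on the auxiliary sample, predict everywhere" description.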

Usage

proxy_CATE(Z, D, Y, A_set, learner, proxy_BCA = NULL, min_variation = 1e-05)

Arguments

Z

A numeric design matrix that holds the covariates in its columns.

D

A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group.

Y

A numeric vector containing the response variable.

A_set

A numerical vector of the indices of the observations in the auxiliary sample.

learner

A string specifying the machine learner for the estimation. Either 'lasso', 'random_forest', 'tree', or a custom learner specified with mlr3 syntax. For a custom learner, do not indicate in the mlr3 specification whether it is a regression learner or a classification learner. Example: 'mlr3::lrn("ranger", num.trees = 100)' for a random forest learner with 100 trees. Note that the specification is passed as a string and that the classif. and regr. keywords are omitted. See https://mlr3learners.mlr-org.com for a list of mlr3 learners.

proxy_BCA

A vector of proxy estimates of the baseline conditional average, BCA, E[Y | D=0, Z]. If NULL, these will be estimated separately.

min_variation

Minimum variation that the predictions must exhibit. If their variation falls below this threshold, random noise with distribution N(0, var(Y)/20) is added to them. Default is 1e-05.
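The role of min_variation can be illustrated with a small base R sketch. The helper add_noise_if_flat is hypothetical, and measuring "variation" by the sample variance is an assumption made here for illustration; the package may implement the check differently.

```r
## Hypothetical helper: perturb predictions that are (nearly) constant.
## Assumption: "variation" is measured by the sample variance.
add_noise_if_flat <- function(pred, Y, min_variation = 1e-05) {
  if (stats::var(pred) < min_variation) {
    # noise drawn from N(0, var(Y)/20), as stated in the argument description
    pred <- pred + stats::rnorm(length(pred), mean = 0, sd = sqrt(stats::var(Y) / 20))
  }
  pred
}

set.seed(1)
Y      <- rnorm(100)
flat   <- rep(0.5, 100)                  # constant predictions
varied <- seq(0, 1, length.out = 100)    # predictions with ample variation

noisy <- add_noise_if_flat(flat, Y)      # noise is added
same  <- add_noise_if_flat(varied, Y)    # returned unchanged
```

Constant predictions would otherwise carry no information about heterogeneity in downstream regressions, which is presumably why a small perturbation is applied in that case.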

Details

The specifications "lasso", "random_forest", and "tree" in learner correspond to the following mlr3 specifications (we omit the keywords classif. and regr.). "lasso" is a cross-validated Lasso estimator, which corresponds to 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'. "random_forest" is a random forest with 500 trees, which corresponds to 'mlr3::lrn("ranger", num.trees = 500)'. "tree" is a tree learner, which corresponds to 'mlr3::lrn("rpart")'.
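The correspondence above can be written as a simple lookup. The function resolve_learner below is a hypothetical helper for illustration, not part of the package; it only maps strings to strings and does not construct any mlr3 object.

```r
## Hypothetical helper: expand the three shortcuts into the mlr3
## specification strings listed in the Details section.
resolve_learner <- function(learner) {
  shortcuts <- c(
    lasso         = 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)',
    random_forest = 'mlr3::lrn("ranger", num.trees = 500)',
    tree          = 'mlr3::lrn("rpart")'
  )
  # custom specifications are passed through unchanged
  if (learner %in% names(shortcuts)) shortcuts[[learner]] else learner
}

resolve_learner("tree")
resolve_learner("mlr3::lrn('ranger', num.trees = 10)")
```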

Value

An object of class "proxy_CATE", consisting of the following components:

estimates

A numeric vector of CATE estimates, one for each observation.

mlr3_objects

"mlr3" objects used for estimation of E[Y | D=1, Z] (Y1_learner) and E[Y | D=0, Z] (Y0_learner). The latter is only available if proxy_BCA = NULL, since otherwise E[Y | D=0, Z] is not estimated by this function.

References

Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.

Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi: 10.21105/joss.01903.

See Also

proxy_BCA()

Examples

if(require("ranger")){
## generate data
set.seed(1)
n  <- 150                                  # number of observations
p  <- 5                                    # number of covariates
D  <- rbinom(n, 1, 0.5)                    # random treatment assignment
Z  <- matrix(runif(n*p), n, p)             # design matrix
Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n)) # potential outcome without treatment
Y1 <- 2 + Y0                               # potential outcome under treatment
Y  <- ifelse(D == 1, Y1, Y0)               # observed outcome
A_set <- sample(1:n, size = n/2)           # auxiliary set

## CATE predictions via random forest
proxy_CATE(Z, D, Y, A_set, learner = "mlr3::lrn('ranger', num.trees = 10)")
}
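A further usage sketch shows how precomputed baseline estimates might be supplied through the proxy_BCA argument and how the returned components can be inspected. It assumes that proxy_BCA() accepts the same first five arguments and returns an object with an estimates component, which should be checked against its help page. The "tree" learner is used so that only rpart is required, and the code is skipped when GenericML is not installed.

```r
## generate data (same design as above, self-contained)
set.seed(1)
n <- 150
p <- 5
D <- rbinom(n, 1, 0.5)
Z <- matrix(runif(n * p), n, p)
Y <- as.numeric(Z %*% rexp(p) + 2 * D + rnorm(n))
A_set <- sample(seq_len(n), size = n / 2)

res <- NULL
if (requireNamespace("GenericML", quietly = TRUE)) {
  ## Assumption: proxy_BCA() returns a list-like object with $estimates
  bca <- GenericML::proxy_BCA(Z, D, Y, A_set, learner = "tree")

  ## reuse the baseline estimates instead of re-estimating E[Y | D=0, Z]
  res <- GenericML::proxy_CATE(Z, D, Y, A_set, learner = "tree",
                               proxy_BCA = bca$estimates)

  print(head(res$estimates))             # CATE predictions for all observations
  print(res$mlr3_objects$Y1_learner)     # learner used for E[Y | D=1, Z]
}
```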


[Package GenericML version 0.2.2 Index]