proxy_BCA {GenericML}R Documentation

Baseline Conditional Average

Description

Proxy estimation of the Baseline Conditional Average (BCA), defined by E[Y | D=0, Z]. Estimation is done on the auxiliary sample, but BCA predictions are made for all observations.

Usage

proxy_BCA(Z, D, Y, A_set, learner, min_variation = 1e-05)

Arguments

Z

A numeric design matrix that holds the covariates in its columns.

D

A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group.

Y

A numeric vector containing the response variable.

A_set

A numerical vector of the indices of the observations in the auxiliary sample.

learner

A string specifying the machine learner for the estimation. Either 'lasso', 'random_forest', 'tree', or a custom learner specified with mlr3 syntax. In the latter case, do not specify in the mlr3 syntax specification if the learner is a regression learner or classification learner. Example: 'mlr3::lrn("ranger", num.trees = 100)' for a random forest learner with 100 trees. Note that this is a string and the absence of the classif. or regr. keywords. See https://mlr3learners.mlr-org.com for a list of mlr3 learners.

min_variation

Specifies a threshold for the minimum variation of the predictions. If the variation of a BCA prediction falls below this threshold, random noise with distribution N(0, var(Y)/20) is added to it. Default is 1e-05.

Details

The specifications "lasso", "random_forest", and "tree" in learner correspond to the following mlr3 specifications (we omit the keywords classif. and regr.). "lasso" is a cross-validated Lasso estimator, which corresponds to 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'. "random_forest" is a random forest with 500 trees, which corresponds to 'mlr3::lrn("ranger", num.trees = 500)'. "tree" is a tree learner, which corresponds to 'mlr3::lrn("rpart")'.

Value

An object of class "proxy_BCA", consisting of the following components:

estimates

A numeric vector of BCA estimates of each observation.

mlr3_objects

"mlr3" objects used for estimation.

References

Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.

Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi: 10.21105/joss.01903.

See Also

proxy_CATE()

Examples

if(require("ranger")){
## generate data
set.seed(1)
n  <- 150                                  # number of observations
p  <- 5                                    # number of covariates
D  <- rbinom(n, 1, 0.5)                    # random treatment assignment
Z  <- matrix(runif(n*p), n, p)             # design matrix
Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n)) # potential outcome without treatment
Y1 <- 2 + Y0                               # potential outcome under treatment
Y  <- ifelse(D == 1, Y1, Y0)               # observed outcome
A_set <- sample(1:n, size = n/2)           # auxiliary set

## BCA predictions via random forest
proxy_BCA(Z, D, Y, A_set, learner = "mlr3::lrn('ranger', num.trees = 10)")
}


[Package GenericML version 0.2.2 Index]