generate_causal_data {grf} | R Documentation |
Generate causal forest data
Description
The following DGPs are available for benchmarking purposes:
"simple": tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0).
"aw1": equation (27) of https://arxiv.org/pdf/1510.04342.pdf
"aw2": equation (28) of https://arxiv.org/pdf/1510.04342.pdf
"aw3": confounding is from "aw1" and tau is from "aw2"
"aw3reverse": same as "aw3", but with heterogeneous treatment effects (HTEs) anticorrelated with the baseline
"ai1": "Setup 1" from section 6 of https://arxiv.org/pdf/1504.01132.pdf
"ai2": "Setup 2" from section 6 of https://arxiv.org/pdf/1504.01132.pdf
"kunzel": "Simulation 1" from A.1 in https://arxiv.org/pdf/1706.03461.pdf
"nw1": "Setup A" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw2": "Setup B" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw3": "Setup C" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
"nw4": "Setup D" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
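As an illustration of the "simple" entry above, its treatment effect and propensity formulas can be sketched in base R. This is a minimal sketch and not the package's implementation: drawing the covariates from a standard normal distribution, and the variable names used, are assumptions made here for illustration only.

```r
# Sketch of the "simple" DGP formulas:
#   tau = max(X1, 0), e = 0.4 + 0.2 * 1(X1 > 0)
# Assumption: covariates are i.i.d. standard normal (not stated above).
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), n, p)

tau <- pmax(X[, 1], 0)              # treatment effect: positive part of X1
e <- 0.4 + 0.2 * (X[, 1] > 0)       # propensity: higher when X1 > 0
W <- rbinom(n, size = 1, prob = e)  # binary treatment drawn with probability e
```

Units with X1 > 0 have both a larger treatment effect and a higher chance of treatment, which is what makes this DGP confounded.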
Usage
generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)
Arguments
n | The number of observations. |
p | The number of covariates (note: the minimum varies by DGP). |
sigma.m | The standard deviation of the unconditional mean of Y. Default is 1. |
sigma.tau | The standard deviation of the treatment effect. Default is 0.1. |
sigma.noise | The noise scale of Y: the conditional variance V is rescaled so that its mean equals sigma.noise^2. Default is 1. |
dgp | The data-generating process (DGP) to use. Default is "simple". |
Details
Each DGP is parameterized by X: observables, m: conditional mean of Y, tau: treatment effect, e: propensity scores, V: conditional variance of Y.
The data is rescaled before being returned: m = m / sd(m) * sigma.m, tau = tau / sd(tau) * sigma.tau, V = V / mean(V) * sigma.noise^2, W = rbinom(n, 1, e), Y = m + (W - e) * tau + sqrt(V) * rnorm(n).
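The rescaling step can be sketched in base R as follows. This is a hedged illustration rather than the package source: the raw m, tau, e and V below are made-up placeholders, and the noise term is assumed to enter multiplicatively as sqrt(V) * rnorm(n), so that V acts as the conditional variance of Y.

```r
set.seed(42)
n <- 500
sigma.m <- 1
sigma.tau <- 0.1
sigma.noise <- 1

# Hypothetical raw quantities standing in for one DGP's output
x <- rnorm(n)
m <- 2 * x          # raw conditional mean
tau <- pmax(x, 0)   # raw treatment effect
e <- rep(0.5, n)    # propensity scores
V <- rep(3, n)      # raw conditional variance

# Rescale so that sd(m), sd(tau) and mean(V) hit their targets
m <- m / sd(m) * sigma.m
tau <- tau / sd(tau) * sigma.tau
V <- V / mean(V) * sigma.noise^2

# Draw treatment and outcome (noise term is an assumption, see above)
W <- rbinom(n, size = 1, prob = e)
Y <- m + (W - e) * tau + sqrt(V) * rnorm(n)
```

Note that dividing by sd(m) or sd(tau) is undefined when the raw quantity is constant, so a real implementation would need to guard that case.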
Value
A list consisting of: X, Y, W, tau, m, e, dgp.
Examples
# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")
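A possible follow-up, sketched under the assumption that the grf package is installed, is to inspect the returned list. The element names used below are the ones listed under Value; the seed and the printed summaries are illustrative choices.

```r
# Guarded so the snippet is a no-op when grf is not available
if (requireNamespace("grf", quietly = TRUE)) {
  set.seed(123)
  data <- grf::generate_causal_data(100, 5, dgp = "simple")
  print(names(data))   # elements listed under Value
  print(dim(data$X))   # covariate matrix dimensions
  print(mean(data$W))  # share of treated observations
}
```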