hi_est {causaldrf}  R Documentation 
The Hirano and Imbens estimator
Description
This function estimates the GPS function and estimates the ADRF. The GPS score is based on different treatment models. The treatment is linearly related to Xs.
Usage
hi_est(Y,
treat,
treat_formula,
outcome_formula,
data,
grid_val,
treat_mod,
link_function,
...)
Arguments
Y 
is the the name of the outcome variable contained in 
treat 
is the name of the treatment variable contained in

treat_formula 
an object of class "formula" (or one that can be
coerced to that class) that regresses 
outcome_formula 
is the formula used for fitting the outcome surface. gps is one of the independent variables to use in the outcome_formula. ie.
or a variation
of this. Use 
data 
is a dataframe containing 
grid_val 
contains the treatment values to be evaluated. 
treat_mod 
a description of the error distribution to be used in the
model for treatment. Options include: 
link_function 
For 
... 
additional arguments to be passed to the outcome lm() function. 
Details
Hirano (2004) (HI) introduced this imputationtype method that includes a GPS component. The idea is to fit a parametric observable (outcome) model, which includes the estimated GPS as a covariate, to impute missing potential outcomes.
The method requires several steps. First, a model is used to relate
treatment to the recorded covariates. For example,
T_i\textbf{X}_i \sim \mathcal{N}(\textbf{X}_i^T \boldsymbol{\beta}, \sigma^2)
and then estimate the \boldsymbol{\beta}
parameters.
Next, the GPS for each unit is estimated
\hat{R}_i(t) = \frac{1}{\sqrt{2 \pi \hat{\sigma}^2} } e^{\frac{(t  \textbf{X}_i^T \boldsymbol{\hat{\beta}})^2}{2 \hat{\sigma}^2}}
These GPS estimates are used in the outcome or observable model.
The outcome is modeled as a function of T_i
and \hat{R}_i
parametrically. For example,
E[Y_i  T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 \hat{R}_i + \alpha_4 \hat{R}_i^2 + \alpha_5 \hat{R}_i\cdot T_i
After collecting the estimated parameters in the outcome and treatment
models, plugin the treatment values into the model to estimate the
missing potential outcomes of each individual at that treatment level.
For example, if we plug in T_i = t
into the estimated models, then
each unit will have a potential outcome estimated at treatment level
T_i= t
.
\hat{Y}_i(t) = \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R}_i^2(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t
The next step is to aggregate these estimated potential outcomes
to get an average treatment effect at dose level T_i = t
.
The mean outcome at doselevel T_i = t
is given by:
\hat{\mu}(t) = \frac{1}{N}\sum_i^N \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R^2}_i(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t
Different treatment levels are plugged into the previous equation
to estimate the missing potential outcomes. If many t
values
are evaluated, then it is possible to trace out an ADRF.
Value
hi_est
returns an object of class "causaldrf",
a list that contains the following components:
param 
parameter estimates for a hi fit. 
t_mod 
the result of the treatment model fit. 
out_mod 
the result of the outcome model fit. 
call 
the matched call. 
References
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric doseresponse models. Manuscript in preparation.
Hirano, Keisuke, Imbens, Guido W (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incompletedata perspectives.
See Also
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
Examples
## Example from Schafer (2015).
example_data < sim_data
hi_list < hi_est(Y = Y,
treat = T,
treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
outcome_formula = Y ~ T + I(T^2) + gps + I(gps^2) + T * gps,
data = example_data,
grid_val = seq(8, 16, by = 1),
treat_mod = "Normal")
sample_index < sample(1:1000, 100)
plot(example_data$T[sample_index],
example_data$Y[sample_index],
xlab = "T",
ylab = "Y",
main = "hi estimate")
lines(seq(8, 16, by = 1),
hi_list$param,
lty = 2,
lwd = 2,
col = "blue")
legend('bottomright',
"hi estimate",
lty=2,
lwd = 2,
col = "blue",
bty='Y',
cex=1)
rm(example_data, hi_list, sample_index)
## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011)
#Simulate data with continuous confounder and outcome, binomial exposure.
#Marginal causal effect of exposure on outcome: 10.
n < 1000
simdat < data.frame(l = rnorm(n, 10, 5))
a.lin < simdat$l  10
pa < exp(a.lin)/(1 + exp(a.lin))
simdat$a < rbinom(n, 1, prob = pa)
simdat$y < 10*simdat$a + 0.5*simdat$l + rnorm(n, 10, 5)
simdat[1:5,]
temp_hi < hi_est(Y = y,
treat = a,
treat_formula = a ~ l,
outcome_formula = y ~ gps,
data = simdat,
grid_val = c(0, 1),
treat_mod = "Binomial",
link_function = "logit")
temp_hi[[1]] # estimated coefficients