R: The Hirano and Imbens estimator

hi_est {causaldrf}

R Documentation

The Hirano and Imbens estimator

Description

This function estimates the GPS function and estimates the ADRF. The GPS score is based on different treatment models. The treatment is linearly related to Xs.

Usage

hi_est(Y,
       treat,
       treat_formula,
       outcome_formula,
       data,
       grid_val,
       treat_mod,
       link_function,
       ...)

Arguments

`Y`	is the the name of the outcome variable contained in `data`.
`treat`	is the name of the treatment variable contained in `data`.
`treat_formula`	an object of class "formula" (or one that can be coerced to that class) that regresses `treat` on a linear combination of `X`: a symbolic description of the model to be fitted.
`outcome_formula`	is the formula used for fitting the outcome surface. gps is one of the independent variables to use in the outcome_formula. ie. `Y ~ treat+ I(treat^2) + gps + I(gps^2) + treat * gps` or a variation of this. Use `gps` as the name of the variable representing the gps in `outcome_formula`.
`data`	is a dataframe containing `Y`, `treat`, and `X`.
`grid_val`	contains the treatment values to be evaluated.
`treat_mod`	a description of the error distribution to be used in the model for treatment. Options include: `"Normal"` for normal model, `"LogNormal"` for lognormal model, `"Sqrt"` for square-root transformation to a normal treatment, `"Poisson"` for Poisson model, `"NegBinom"` for negative binomial model, `"Gamma"` for gamma model, `"Binomial"` for binomial model.
`link_function`	For `treat_mod = "Gamma"` (fitted using glm) alternatives are "log" or "inverse". For `treat_mod = "Binomial"` (fitted using glm) alternatives are "logit", "probit", "cauchit", "log" and "cloglog".
`...`	additional arguments to be passed to the outcome lm() function.

Details

Hirano (2004) (HI) introduced this imputation-type method that includes a GPS component. The idea is to fit a parametric observable (outcome) model, which includes the estimated GPS as a covariate, to impute missing potential outcomes.

The method requires several steps. First, a model is used to relate treatment to the recorded covariates. For example, T_i|\textbf{X}_i \sim \mathcal{N}(\textbf{X}_i^T \boldsymbol{\beta}, \sigma^2) and then estimate the \boldsymbol{\beta} parameters. Next, the GPS for each unit is estimated

\hat{R}_i(t) = \frac{1}{\sqrt{2 \pi \hat{\sigma}^2} } e^{-\frac{(t - \textbf{X}_i^T \boldsymbol{\hat{\beta}})^2}{2 \hat{\sigma}^2}}

These GPS estimates are used in the outcome or observable model. The outcome is modeled as a function of T_i and \hat{R}_i parametrically. For example,

E[Y_i | T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 \hat{R}_i + \alpha_4 \hat{R}_i^2 + \alpha_5 \hat{R}_i\cdot T_i

After collecting the estimated parameters in the outcome and treatment models, plug-in the treatment values into the model to estimate the missing potential outcomes of each individual at that treatment level. For example, if we plug in T_i = t into the estimated models, then each unit will have a potential outcome estimated at treatment level T_i= t.

\hat{Y}_i(t) = \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R}_i^2(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t

The next step is to aggregate these estimated potential outcomes to get an average treatment effect at dose level T_i = t. The mean outcome at dose-level T_i = t is given by:

\hat{\mu}(t) = \frac{1}{N}\sum_i^N \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R^2}_i(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t

Different treatment levels are plugged into the previous equation to estimate the missing potential outcomes. If many t values are evaluated, then it is possible to trace out an ADRF.

Value

hi_est returns an object of class "causaldrf", a list that contains the following components:

`param`	parameter estimates for a hi fit.
`t_mod`	the result of the treatment model fit.
`out_mod`	the result of the outcome model fit.
`call`	the matched call.

References

Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.

Hirano, Keisuke, Imbens, Guido W (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives.

Examples

## Example from Schafer (2015).

example_data <- sim_data

hi_list <- hi_est(Y = Y,
              treat = T,
              treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
              outcome_formula = Y ~ T + I(T^2) + gps + I(gps^2) + T * gps,
              data = example_data,
              grid_val = seq(8, 16, by = 1),
              treat_mod = "Normal")

sample_index <- sample(1:1000, 100)

plot(example_data$T[sample_index],
      example_data$Y[sample_index],
      xlab = "T",
      ylab = "Y",
      main = "hi estimate")

lines(seq(8, 16, by = 1),
      hi_list$param,
      lty = 2,
      lwd = 2,
      col = "blue")

legend('bottomright',
        "hi estimate",
        lty=2,
        lwd = 2,
        col = "blue",
        bty='Y',
        cex=1)

rm(example_data, hi_list, sample_index)


## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011)
#Simulate data with continuous confounder and outcome, binomial exposure.
#Marginal causal effect of exposure on outcome: 10.
n <- 1000
simdat <- data.frame(l = rnorm(n, 10, 5))
a.lin <- simdat$l - 10
pa <- exp(a.lin)/(1 + exp(a.lin))
simdat$a <- rbinom(n, 1, prob = pa)
simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5)
simdat[1:5,]
temp_hi <- hi_est(Y = y,
                 treat = a,
                 treat_formula = a ~ l,
                 outcome_formula = y ~ gps,
                 data = simdat,
                 grid_val = c(0, 1),
                 treat_mod = "Binomial",
                 link_function = "logit")

temp_hi[[1]]  # estimated coefficients

[Package causaldrf version 0.4.2 Index]