R: Prediction with a residual bias correction estimator

aipwee_est {causaldrf}

R Documentation

Prediction with a residual bias correction estimator

Description

This method combines the regression estimator with a residual bias correction for estimating a parametric ADRF.

Usage

aipwee_est(Y,
           treat,
           covar_formula = ~ 1,
           covar_lin_formula = ~ 1,
           covar_sq_formula = ~ 1,
           data,
           e_treat_1 = NULL,
           e_treat_2 = NULL,
           e_treat_3 = NULL,
           e_treat_4 = NULL,
           degree = 1,
           wt = NULL,
           method = "same",
           spline_df = NULL,
           spline_const = 1,
           spline_linear = 1,
           spline_quad = 1)

Arguments

`Y`	is the name of the outcome variable contained in `data`.
`treat`	is the name of the treatment variable contained in `data`.
`covar_formula`	is the formula describing the covariates needed to estimate the constant term: `~ X.1 + ....`. It can include higher-order terms or interactions, e.g. `~ X.1 + I(X.1^2) + X.1 * X.2 + ....`. Don't forget the tilde before listing the covariates.
`covar_lin_formula`	is the formula describing the covariates needed to estimate the linear term, t: `~ X.1 + ....`. It can include higher-order terms or interactions, e.g. `~ X.1 + I(X.1^2) + X.1 * X.2 + ....`. Don't forget the tilde before listing the covariates.
`covar_sq_formula`	is the formula describing the covariates needed to estimate the quadratic term, t^2: `~ X.1 + ....`. It can include higher-order terms or interactions, e.g. `~ X.1 + I(X.1^2) + X.1 * X.2 + ....`. Don't forget the tilde before listing the covariates.
`data`	is a data frame containing `Y`, `treat`, and the covariates `X`.
`e_treat_1`	a vector of the conditional expectations of `treat`, as estimated by `t_mod`.
`e_treat_2`	a vector of the conditional expectations of `treat^2`, as estimated by `t_mod`.
`e_treat_3`	a vector of the conditional expectations of `treat^3`, as estimated by `t_mod`.
`e_treat_4`	a vector of the conditional expectations of `treat^4`, as estimated by `t_mod`.
`degree`	is 1 for a linear and 2 for a quadratic outcome model.
`wt`	is the weight used in `lsfit` for the outcome regression. The default is `wt = NULL`.
`method`	is "same" if the same set of covariates is used to estimate the constant, linear, and/or quadratic term. If `method = "different"`, then different sets of covariates can be used to estimate the constant, linear, and/or quadratic term. `covar_lin_formula` and `covar_sq_formula` must be specified if `method = "different"`.
`spline_df`	degrees of freedom. The default, spline_df = NULL, corresponds to no knots.
`spline_const`	is the number of spline terms needed to estimate the constant term.
`spline_linear`	is the number of spline terms needed to estimate the linear term.
`spline_quad`	is the number of spline terms needed to estimate the quadratic term.
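
The `e_treat_1` to `e_treat_4` arguments are conditional moments of the treatment. As a rough illustration of what such vectors look like, the sketch below computes them by hand under a normal linear treatment model, using the standard raw moments of a normal distribution. The simulated data and the column names are hypothetical, and whether `t_mod` computes its output in exactly this way is an assumption, not something this page confirms.

```r
# Hypothetical data: two covariates and a treatment that is normal given them
set.seed(42)
n <- 500
dat <- data.frame(B.1 = rnorm(n), B.2 = rnorm(n))
dat$T <- 1 + 0.5 * dat$B.1 - 0.3 * dat$B.2 + rnorm(n)

# Normal linear treatment model: T | X ~ N(mu(X), sigma^2)
fit <- lm(T ~ B.1 + B.2, data = dat)
mu <- fitted(fit)                 # estimated conditional mean of T
sigma2 <- summary(fit)$sigma^2    # estimated residual variance

# Raw moments of a normal distribution with mean mu and variance sigma2
e_treat_1 <- mu
e_treat_2 <- mu^2 + sigma2
e_treat_3 <- mu^3 + 3 * mu * sigma2
e_treat_4 <- mu^4 + 6 * mu^2 * sigma2 + 3 * sigma2^2
```

Each vector has one entry per row of the data, which is the shape `aipwee_est` expects for these arguments.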

Details

This estimator bears a strong resemblance to general regression estimators in the survey literature, which belong to a more general class of calibration estimators (Deville and Sarndal, 1992). It is doubly robust: it is consistent if either the outcome model (Y-model) or the treatment model (T-model) is correctly specified (Scharfstein, Rotnitzky and Robins, 1999). The estimator is the sum of two terms, a regression prediction and a residual bias correction. If the Y-model is correct, the first term is unbiased for \xi and the second term has mean zero even if the T-model is wrong. If the Y-model is incorrect, the first term is biased, but the second term consistently estimates (minus one times) that bias, provided the T-model is correct.

This function implements a doubly robust estimator that fits an outcome regression model and adds a bias correction term. For details see Schafer and Galagate (2015).
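
The double robustness property can be seen in a small simulation. The sketch below deliberately uses a simpler setting than `aipwee_est`: estimating a population mean when some outcomes are missing, where a response model plays the role of the T-model. It is not the formula used by this function, and all names in it are hypothetical. The Y-model is intentionally wrong (intercept only), while the response model is correct.

```r
set.seed(1)
n <- 20000
x <- rnorm(n)
y_full <- 1 + 2 * x + rnorm(n)        # true population mean of Y is 1
r <- rbinom(n, 1, plogis(x))          # response indicator depends on x
y <- ifelse(r == 1, y_full, NA)       # outcomes are seen only when r == 1

# Misspecified Y-model: intercept only, fitted to the observed outcomes
m_hat <- rep(mean(y, na.rm = TRUE), n)

# Correctly specified response model (the analogue of the T-model)
pi_hat <- fitted(glm(r ~ x, family = binomial))

# Regression estimator alone: biased because the Y-model is wrong
reg_est <- mean(m_hat)

# Residual bias correction: inverse-probability-weighted residuals
resid_obs <- ifelse(r == 1, y - m_hat, 0)
dr_est <- reg_est + mean(resid_obs / pi_hat)

c(regression = reg_est, doubly_robust = dr_est)
```

With the wrong Y-model the regression estimate lands well above the true mean of 1, and the weighted-residual term pulls the combined estimate back toward it.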

Value

aipwee_est returns an object of class "causaldrf_lsfit", a list that contains the following components:

`param`	parameter estimates for an aipwee fit.
`t_mod`	the result of the treatment model fit.
`out_mod`	the result of the outcome model fit.
`call`	the matched call.
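
The example on this page passes `param[1]` and `param[2]` to `abline` as intercept and slope, so for `degree = 1` the `param` component can be read as the coefficients of a straight-line ADRF. The helper below is a hypothetical convenience, not part of causaldrf, that evaluates such a polynomial on a grid of treatment values. It assumes that with `degree = 2` a third element holds the coefficient of t^2, which this page does not state explicitly.

```r
# Hypothetical helper: evaluate a polynomial ADRF at treatment values t.
# Assumes param is ordered as (constant, linear, quadratic, ...).
adrf_curve <- function(param, t) {
  powers <- seq_along(param) - 1          # 0, 1, 2, ...
  design <- outer(t, powers, `^`)         # columns: 1, t, t^2, ...
  drop(design %*% param)
}

# Made-up coefficients standing in for aipwee_list$param
t_grid <- c(0, 1, 2)
adrf_curve(c(1, 2), t_grid)       # linear ADRF: 1 + 2 * t
adrf_curve(c(1, 2, 3), t_grid)    # quadratic ADRF: 1 + 2 * t + 3 * t^2
```

For a quadratic fit, a vector produced this way could be drawn with `lines(t_grid, adrf_curve(param, t_grid))` instead of `abline`.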

References

Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.

Schafer, J.L., Kang, J. (2008). Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychological Methods, 13(4), 279.

Robins, J.M., Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429), 122–129.

Scharfstein, D.O., Rotnitzky, A., Robins, J.M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94(448), 1096–1120.

Deville, J.-C., Sarndal, C.-E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376–380.

Examples

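## Example from Schafer and Galagate (2015).
library(causaldrf)

example_data <- sim_data

# Fit the treatment model to obtain the conditional moments of T
t_mod_list <- t_mod(treat = T,
                    treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
                    data = example_data,
                    treat_mod = "Normal")

# Attach the estimated conditional expectations to the original data
cond_exp_data <- t_mod_list$T_data
full_data <- cbind(example_data, cond_exp_data)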

# Fit the doubly robust estimator with a linear outcome model (degree = 1)
aipwee_list <- aipwee_est(Y = Y,
                         treat = T,
                         covar_formula = ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
                         covar_lin_formula = ~ 1,
                         covar_sq_formula = ~ 1,
                         data = example_data,
                         e_treat_1 = full_data$est_treat,
                         e_treat_2 = full_data$est_treat_sq,
                         e_treat_3 = full_data$est_treat_cube,
                         e_treat_4 = full_data$est_treat_quartic,
                         degree = 1,
                         wt = NULL,
                         method = "same",
                         spline_df = NULL,
                         spline_const = 1,
                         spline_linear = 1,
                         spline_quad = 1)

# Plot a random subset of 100 observations to keep the scatterplot readable
sample_index <- sample(seq_len(nrow(example_data)), 100)

plot(example_data$T[sample_index],
      example_data$Y[sample_index],
      xlab = "T",
      ylab = "Y",
      main = "aipwee estimate")

abline(aipwee_list$param[1],
        aipwee_list$param[2],
        lty = 2,
        lwd = 2,
        col = "blue")

legend("bottomright",
       "aipwee estimate",
       lty = 2,
       lwd = 2,
       col = "blue",
       bty = "o",
       cex = 1)

rm(example_data, t_mod_list, cond_exp_data, full_data, aipwee_list, sample_index)

[Package causaldrf version 0.4.2 Index]