| ATE_internal {CausalMetaR} | R Documentation | 
Estimating the Average Treatment Effect (ATE) in an internal target population using multi-source data
Description
Doubly-robust and efficient estimator for the ATE in each internal target population using multi-source data.
Usage
ATE_internal(
  X,
  Y,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)
Arguments
X | 
 Data frame (or matrix) containing the covariate data in the multi-source data. It should have n rows and p columns.  | 
Y | 
 Vector of length n containing the outcome.  | 
S | 
 Vector of length n containing the source indicator.  | 
A | 
 Vector of length n containing the binary treatment (A = 1 or A = 0).  | 
cross_fitting | 
 Logical specifying whether sample splitting and cross fitting should be used.  | 
replications | 
 Integer specifying the number of sample splitting and cross fitting replications to perform, if cross_fitting = TRUE.  | 
source_model | 
 Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: "  | 
source_model_args | 
 List specifying the arguments for the source model (in glmnet or nnet).  | 
treatment_model_type | 
 Character string specifying how the treatment model is estimated. Options include "  | 
treatment_model_args | 
 List specifying the arguments for the treatment model (in SuperLearner).  | 
outcome_model_args | 
 List specifying the arguments for the outcome model (in SuperLearner).  | 
show_progress | 
 Logical specifying whether to print a progress bar for the cross-fit replicates completed, if cross_fitting = TRUE.  | 
Details
Data structure:
The multi-source dataset consists of the outcome Y, source S, treatment A, and covariates X (n \times p) in the internal populations. The data sources can be trials, observational studies, or a combination of both.
Estimation of nuisance parameters:
The following models are fit:
Propensity score model:
\eta_a(X)=P(A=a|X). We perform the decomposition P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X) and estimate P(A=1|X, S=s) (i.e., the treatment model) and q_s(X)=P(S=s|X) (i.e., the source model).
Outcome model:
\mu_a(X)=E(Y|X, A=a)
The models are estimated by SuperLearner with the exception of the source model which is estimated by glmnet or nnet.
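The propensity score decomposition above can be checked numerically. The following is a minimal sketch in base R on simulated data, assuming a single binary covariate and three sources so that every probability can be estimated by a simple proportion. All variable names and simulation settings are illustrative assumptions; the sketch does not use SuperLearner, glmnet, or nnet and does not reproduce how the package fits these models.

```r
# Simulated toy data (all names and values are illustrative assumptions)
set.seed(1)
n <- 2000
X <- rbinom(n, 1, 0.5)                      # one binary covariate
S <- sample(1:3, n, replace = TRUE)         # three internal sources
A <- rbinom(n, 1, plogis(-0.5 + 0.8 * X + 0.3 * (S == 2)))

x <- 1                                      # covariate level at which to check
in_x <- X == x

# Source model: q_s(x) = P(S = s | X = x), as empirical proportions
q_hat <- sapply(1:3, function(s) mean(S[in_x] == s))

# Treatment model: P(A = 1 | X = x, S = s), estimated within each source
e_hat <- sapply(1:3, function(s) mean(A[in_x & S == s]))

# Propensity score from the decomposition, and estimated directly
eta_decomp <- sum(e_hat * q_hat)
eta_direct <- mean(A[in_x])

all.equal(eta_decomp, eta_direct)
```

With proportions as estimators, the two quantities agree up to floating-point error because the within-source counts cancel in the sum over s.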
ATE estimation:
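For internal target population s and treatment a, the estimator of the potential outcome mean is
 \dfrac{\widehat \kappa}{n}\sum\limits_{i=1}^{n} \Bigg[ I(S_i = s) \widehat \mu_a(X_i)
 +I(A_i = a) \dfrac{\widehat q_{s}(X_i)}{\widehat \eta_a(X_i)}  \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg],
where \widehat \kappa=\{n^{-1} \sum_{i=1}^n I(S_i=s)\}^{-1}. The ATE estimate for population s is the difference between the estimates for a = 1 and a = 0. The estimator is doubly robust and non-parametrically efficient.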
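The displayed formula can be written as a short plug-in function. The sketch below is a base R illustration under stated assumptions: the helper name dr_mean, the simulated data, and the parametric nuisance fits (lm and glm, with the propensity score fit directly on the pooled data rather than through the decomposition) are all hypothetical choices made for brevity, and they are not the package's internal implementation.

```r
# Plug-in version of the displayed formula for one source s and one arm a.
# mu_hat, q_hat, eta_hat: nuisance estimates evaluated at each X_i.
# (dr_mean is a hypothetical helper, not a CausalMetaR function.)
dr_mean <- function(Y, A, S, s, a, mu_hat, q_hat, eta_hat) {
  n <- length(Y)
  kappa_hat <- 1 / mean(S == s)
  kappa_hat / n *
    sum((S == s) * mu_hat + (A == a) * q_hat / eta_hat * (Y - mu_hat))
}

# Simulated two-source data; the true ATE is 1 in every source
set.seed(2)
n <- 5000
X <- rnorm(n)
S <- rbinom(n, 1, plogis(0.3 * X)) + 1L
A <- rbinom(n, 1, plogis(0.2 * X))
Y <- 1 + A + 0.5 * X + rnorm(n)
dat <- data.frame(Y = Y, A = A, X = X, S1 = as.integer(S == 1))

# Simple parametric nuisance models (an assumption made for this sketch)
fit_mu  <- lm(Y ~ A + X, data = dat)
fit_eta <- glm(A ~ X, family = binomial(), data = dat)
fit_q   <- glm(S1 ~ X, family = binomial(), data = dat)

q1   <- fitted(fit_q)                                   # q_1(X)
eta1 <- fitted(fit_eta)                                 # eta_1(X)
mu1  <- predict(fit_mu, newdata = transform(dat, A = 1))
mu0  <- predict(fit_mu, newdata = transform(dat, A = 0))

est_A1 <- dr_mean(Y, A, S, s = 1, a = 1, mu1, q1, eta1)
est_A0 <- dr_mean(Y, A, S, s = 1, a = 0, mu0, q1, 1 - eta1)
est_A1 - est_A0                                         # ATE estimate, source 1
```

The first term averages the outcome model predictions over the target source, and the second term adds a weighted residual correction from all sources.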
To achieve non-parametric efficiency and asymptotic normality, the nuisance estimators must satisfy ||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q_s(X) -q_s(X)||\big\}=o_p(n^{-1/2}).
In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.
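The sample splitting and cross-fitting idea mentioned above can be sketched generically as follows. This base R example assumes a K-fold scheme with K = 5 and a linear outcome model; both are illustrative assumptions, and the splitting scheme used by the package (including how the replications argument enters) may differ.

```r
# Simulated data (illustrative assumption)
set.seed(3)
n <- 600
X <- rnorm(n)
A <- rbinom(n, 1, 0.5)
Y <- 1 + A + 0.5 * X + rnorm(n)
dat <- data.frame(Y = Y, A = A, X = X)

K <- 5                                            # number of folds (assumed)
fold <- sample(rep(seq_len(K), length.out = n))   # random fold assignment

mu1_cf <- rep(NA_real_, n)                        # out-of-fold estimates of mu_1(X_i)
for (k in seq_len(K)) {
  train <- dat[fold != k, ]
  test  <- dat[fold == k, ]
  fit_k <- lm(Y ~ A + X, data = train)            # fit without fold k
  # Predict for fold k only, so no observation informs its own prediction
  mu1_cf[fold == k] <- predict(fit_k, newdata = transform(test, A = 1))
}

summary(mu1_cf)
```

Each nuisance estimate is evaluated only on observations that were not used to fit it, which is what removes the need for the Donsker class assumption.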
Value
An object of class "ATE_internal". This object is a list with the following elements:
df_dif | 
 A data frame containing the treatment effect (mean difference) estimates for the internal populations.  | 
df_A0 | 
 A data frame containing the potential outcome mean estimates under A = 0 for the internal populations.  | 
df_A1 | 
 A data frame containing the potential outcome mean estimates under A = 1 for the internal populations.  | 
fit_outcome | 
 Fitted outcome model.  | 
fit_source | 
 Fitted source model.  | 
fit_treatment | 
 Fitted treatment model(s).  | 
References
Robertson, S.E., Steingrimsson, J.A., Joyce, N.R., Stuart, E.A. and Dahabreh, I.J. (2021). Center-specific causal inference with multicenter trials: Reinterpreting trial evidence in the context of each participating center. arXiv preprint arXiv:2104.05905.
Wang, G., McGrath, S., Lian, Y. and Dahabreh, I. (2024). CausalMetaR: An R package for performing causally interpretable meta-analyses. arXiv preprint arXiv:2402.04341.
Examples
ai <- ATE_internal(
  X = dat_multisource[, 1:10],
  Y = dat_multisource$Y,
  S = dat_multisource$S,
  A = dat_multisource$A,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)