STE_external {CausalMetaR} | R Documentation |
Estimating the Subgroup Treatment Effect (STE) in an external target population using multi-source data
Description
Doubly-robust and efficient estimator for the STE in an external target population using multi-source data.
Usage
STE_external(
X,
X_external,
EM,
EM_external,
Y,
S,
A,
cross_fitting = FALSE,
replications = 10L,
source_model = "MN.glmnet",
source_model_args = list(),
treatment_model_type = "separate",
treatment_model_args = list(),
external_model_args = list(),
outcome_model_args = list(),
show_progress = TRUE
)
Arguments
X |
Data frame (or matrix) containing the covariate data in the multi-source data. It should have |
X_external |
Data frame (or matrix) containing the covariate data in the external target population. It should have |
EM |
Vector of length |
EM_external |
Vector of length |
Y |
Vector of length |
S |
Vector of length |
A |
Vector of length |
cross_fitting |
Logical specifying whether sample splitting and cross fitting should be used. |
replications |
Integer specifying the number of sample splitting and cross fitting replications to perform, if |
source_model |
Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: " |
source_model_args |
List specifying the arguments for the source model (in glmnet or nnet). |
treatment_model_type |
Character string specifying how the treatment model is estimated. Options include " |
treatment_model_args |
List specifying the arguments for the treatment model (in SuperLearner). |
external_model_args |
List specifying the arguments for the external model (in SuperLearner). |
outcome_model_args |
List specifying the arguments for the outcome model (in SuperLearner). |
show_progress |
Logical specifying whether to print a progress bar for the cross-fit replicates completed, if |
Details
Data structure:
The multi-source dataset consists of the outcome Y, source S, treatment A, covariates X (n \times p), and effect modifier EM in the internal populations. The data sources can be trials, observational studies, or a combination of both.
The external dataset contains only covariates X_external (n_0 \times p) and the effect modifier EM_external.
Estimation of nuisance parameters:
The following models are fit:
External model: q(X)=P(R=1|X), where R takes value 1 if the subject belongs to any of the internal datasets and 0 if the subject belongs to the external dataset.
Propensity score model: \eta_a(X)=P(A=a|X). We perform the decomposition P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X) and estimate P(A=1|X, S=s) (i.e., the treatment model) and P(S=s|X) (i.e., the source model).
Outcome model: \mu_a(X)=E(Y|X, A=a)
The models are estimated by SuperLearner with the exception of the source model which is estimated by glmnet or nnet.
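The propensity score decomposition can be illustrated with a small numeric sketch in base R. The probabilities below are made-up values standing in for fitted treatment and source models; the object names `p_source`, `p_treat`, `eta1`, and `eta0` are hypothetical and are not part of the package.

```r
# Made-up fitted probabilities for 3 subjects and 2 sources (illustration only).
# p_source[i, s] = P(S = s | X_i); each row sums to 1.
p_source <- matrix(c(0.7, 0.3,
                     0.4, 0.6,
                     0.5, 0.5), ncol = 2, byrow = TRUE)

# p_treat[i, s] = P(A = 1 | X_i, S = s)
p_treat <- matrix(c(0.5, 0.2,
                    0.5, 0.3,
                    0.5, 0.4), ncol = 2, byrow = TRUE)

# eta_1(X_i) = sum_s P(A = 1 | X_i, S = s) * P(S = s | X_i)
eta1 <- rowSums(p_treat * p_source)

# For a binary treatment, eta_0(X_i) = 1 - eta_1(X_i)
eta0 <- 1 - eta1
```

Each subject's propensity score is a source-probability-weighted average of the source-specific treatment probabilities, so it always lies between the smallest and largest of them.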
STE estimation:
The estimator is
\dfrac{\widehat \kappa}{N}\sum\limits_{i=1}^{N} \Bigg[ I(R_i = 0) \widehat \mu_a(X_i) + I(A_i = a, R_i=1) \dfrac{1-\widehat q(X_i)}{\widehat \eta_a(X_i)\widehat q(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg],
where N=n+n_0, \widehat \kappa=\{N^{-1} \sum_{i=1}^N I(R_i=0)\}^{-1}, and \widetilde X denotes the effect modifier.
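As a minimal sketch, the displayed expression can be evaluated in base R once nuisance estimates are available. The function name `dr_external_mean` and all input values are hypothetical and chosen only for illustration; this is not the package's internal code. It returns the estimated potential outcome mean under A = a, so a mean difference would be obtained by subtracting the a = 0 result from the a = 1 result.

```r
# Hypothetical helper: evaluates the displayed estimator for a given arm `a`.
# R:     1 for internal subjects, 0 for external subjects
# A, Y:  treatment and outcome (may be NA for external subjects)
# mu_a:  estimated outcome model evaluated at each X_i
# eta_a: estimated propensity score P(A = a | X_i)
# q:     estimated external model P(R = 1 | X_i)
dr_external_mean <- function(R, A, Y, mu_a, eta_a, q, a) {
  N <- length(R)
  kappa <- 1 / mean(R == 0)            # inverse of the external proportion
  in_arm <- R == 1 & A == a            # FALSE for external rows, even if A is NA
  resid <- ifelse(in_arm, Y - mu_a, 0) # residual only for internal subjects in arm a
  w <- (1 - q) / (eta_a * q)           # weight from the displayed formula
  kappa / N * sum((R == 0) * mu_a + resid * w)
}

# Made-up data: 3 internal and 2 external subjects
R     <- c(1, 1, 1, 0, 0)
A     <- c(1, 0, 1, NA, NA)
Y     <- c(2, 1, 3, NA, NA)
mu_1  <- c(1.5, 1.5, 2.5, 2, 1)
eta_1 <- rep(0.5, 5)
q     <- rep(0.5, 5)

est <- dr_external_mean(R, A, Y, mu_1, eta_1, q, a = 1)
```

The first term averages the outcome model predictions over the external subjects, and the second term is an inverse-weighted residual correction that uses only internal subjects who received treatment a.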
The estimator is doubly robust and non-parametrically efficient. Non-parametric efficiency and asymptotic normality require that ||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q(X) -q(X)||\big\}=o_p(n^{-1/2}).
In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
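The general idea of sample splitting with cross-fitting can be sketched in base R as follows. This is a generic illustration on simulated data with a linear outcome model, under the assumption of K = 2 folds; it is not the package's implementation, and the names `dat`, `fold`, and `mu_hat` are hypothetical.

```r
set.seed(1)

# Simulated data (illustration only)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Randomly assign each subject to one of K equally sized folds
K <- 2
fold <- sample(rep(seq_len(K), length.out = n))

# Cross-fitting: predict each fold with a model fit on the other fold(s)
mu_hat <- numeric(n)
for (k in seq_len(K)) {
  fit <- lm(y ~ x, data = dat[fold != k, ])
  mu_hat[fold == k] <- predict(fit, newdata = dat[fold == k, ])
}
```

Because every prediction comes from a model that never saw that subject's outcome, the nuisance estimates are independent of the observations they are plugged into. Repeating the random split several times, as the `replications` argument suggests, reduces the dependence of the result on any single split.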
When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.
Value
An object of class "STE_external". This object is a list with the following elements:
df_dif |
A data frame containing the subgroup treatment effect (mean difference) estimates for the external data. |
df_A0 |
A data frame containing the subgroup potential outcome mean estimates under A = 0 for the external data. |
df_A1 |
A data frame containing the subgroup potential outcome mean estimates under A = 1 for the external data. |
fit_outcome |
Fitted outcome model. |
fit_source |
Fitted source model. |
fit_treatment |
Fitted treatment model(s). |
fit_external |
Fitted external model. |
References
Wang, G., Levis, A., Steingrimsson, J. and Dahabreh, I. (2024) Efficient estimation of subgroup treatment effects using multi-source data, arXiv preprint arXiv:2402.02684.
Wang, G., McGrath, S., Lian, Y. and Dahabreh, I. (2024) CausalMetaR: An R package for performing causally interpretable meta-analyses, arXiv preprint arXiv:2402.04341.
Examples
# Estimate subgroup treatment effects in the external target population
se <- STE_external(
  # Multi-source (internal) data
  X = dat_multisource[, 2:10],
  Y = dat_multisource$Y,
  EM = dat_multisource$EM,
  S = dat_multisource$S,
  A = dat_multisource$A,
  # External target population data
  X_external = dat_external[, 2:10],
  EM_external = dat_external$EM,
  cross_fitting = FALSE,
  # Source model: multinomial logistic regression via nnet
  source_model = "MN.nnet",
  source_model_args = list(),
  # Treatment, external, and outcome models: SuperLearner arguments
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  external_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)