R: Estimating the Subgroup Treatment Effect (STE) in an external...

STE_external {CausalMetaR}

R Documentation

Estimating the Subgroup Treatment Effect (STE) in an external target population using multi-source data

Description

Doubly-robust and efficient estimator for the STE in an external target population using multi-source data.

Usage

STE_external(
  X,
  X_external,
  EM,
  EM_external,
  Y,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  external_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)

Arguments

`X`	Data frame (or matrix) containing the covariate data in the multi-source data. It should have `n` rows and `p` columns. Character variables will be converted to factors.
`X_external`	Data frame (or matrix) containing the covariate data in the external target population. It should have `n_0` rows and `p` columns. This is the external data counterpart to the `X` argument.
`EM`	Vector of length `n` containing the effect modifier in the multi-source data. If `EM` is a factor, it will maintain its subgroup level order; otherwise it will be converted to a factor with default level order.
`EM_external`	Vector of length `n_0` containing the effect modifier in the external data. This is the external data counterpart to the `EM` argument.
`Y`	Vector of length `n` containing the outcome.
`S`	Vector of length `n` containing the source indicator. If `S` is a factor, it will maintain its level order; otherwise it will be converted to a factor with the default level order. The order will be carried over to the outputs and plots.
`A`	Vector of length `n` containing the binary treatment (1 for treated and 0 for untreated).
`cross_fitting`	Logical specifying whether sample splitting and cross fitting should be used.
`replications`	Integer specifying the number of sample splitting and cross fitting replications to perform, if `cross_fitting = TRUE`. The default is `10L`.
`source_model`	Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: "`MN.glmnet`" (default) and "`MN.nnet`", which use glmnet and nnet respectively.
`source_model_args`	List specifying the arguments for the source model (in glmnet or nnet).
`treatment_model_type`	Character string specifying how the treatment model is estimated. Options include "`separate`" (default) and "`joint`". If "`separate`", the treatment model (i.e., `P(A=1|X, S=s)`) is estimated by regressing `A` on `X` within each specific internal population `S=s`. If "`joint`", the treatment model is estimated by regressing `A` on `X` and `S` using the multi-source population.
`treatment_model_args`	List specifying the arguments for the treatment model (in SuperLearner).
`external_model_args`	List specifying the arguments for the external model (in SuperLearner).
`outcome_model_args`	List specifying the arguments for the outcome model (in SuperLearner).
`show_progress`	Logical specifying whether to print a progress bar for the cross-fit replicates completed, if `cross_fitting = TRUE`.
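
The level-order behavior described for `EM` and `S` can be illustrated with base R factors. This is a minimal sketch; the vectors `em_raw` and `s_raw` and the labels are hypothetical and are not part of the package.

```r
# Hypothetical raw inputs; names and values are illustrative only.
em_raw <- c("high", "low", "medium", "low", "high")
s_raw  <- c(3, 1, 2, 1, 3)

# Converting without explicit levels uses R's default (sorted) level order.
levels(factor(em_raw))

# Supplying a factor with explicit levels fixes the subgroup order up front.
EM <- factor(em_raw, levels = c("low", "medium", "high"))
S  <- factor(s_raw, levels = c(1, 2, 3),
             labels = c("Trial A", "Trial B", "Cohort C"))

levels(EM)
levels(S)
```

Passing factors built this way should keep the chosen order in the outputs and plots, per the argument descriptions above.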

Details

Data structure:

The multi-source dataset consists of the outcome Y, source S, treatment A, covariates X (n \times p), and effect modifier EM in the internal populations. The data sources can be trials, observational studies, or a combination of both.

The external dataset contains only covariates X_external (n_0 \times p) and the effect modifier EM_external.
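
As a rough illustration of the two data structures, the following sketch simulates toy inputs with the dimensions described above. The data-generating choices (sample sizes, distributions, two subgroups, three sources) are assumptions made only for this example.

```r
set.seed(1)
n   <- 200   # rows in the multi-source data (assumed)
n_0 <- 80    # rows in the external data (assumed)
p   <- 3     # number of covariates (assumed)

# Multi-source data: covariates, effect modifier, source, treatment, outcome
X  <- data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
EM <- factor(sample(c("a", "b"), n, replace = TRUE))
S  <- factor(sample(1:3, n, replace = TRUE))
A  <- rbinom(n, size = 1, prob = 0.5)
Y  <- rnorm(n, mean = 1 + A + X[[1]])

# External data: covariates and effect modifier only (no Y, S, or A)
X_external  <- data.frame(matrix(rnorm(n_0 * p), nrow = n_0, ncol = p))
EM_external <- factor(sample(c("a", "b"), n_0, replace = TRUE),
                      levels = levels(EM))

# Dimension checks mirroring the argument descriptions
stopifnot(
  nrow(X) == n, ncol(X) == p,
  nrow(X_external) == n_0, ncol(X_external) == ncol(X),
  length(EM) == n, length(Y) == n, length(S) == n, length(A) == n,
  length(EM_external) == n_0,
  all(A %in% c(0, 1))
)
```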

Estimation of nuisance parameters:

The following models are fit:

External model: q(X)=P(R=1|X), where R takes value 1 if the subject belongs to any of the internal datasets and 0 if the subject belongs to the external dataset.
Propensity score model: \eta_a(X)=P(A=a|X). We perform the decomposition P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X) and estimate P(A=1|X, S=s) (i.e., the treatment model) and P(S=s|X) (i.e., the source model).
Outcome model: \mu_a(X)=E(Y|X, A=a)

The models are estimated by SuperLearner, with the exception of the source model, which is estimated by glmnet or nnet.
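
The propensity score decomposition above can be sketched numerically. The matrices `p_treat` and `p_source` below are hypothetical stand-ins for fitted treatment-model and source-model probabilities; in practice they would come from the fitted models rather than being typed in by hand.

```r
# Hypothetical fitted probabilities for 4 subjects and 3 sources.
# p_treat[i, s]  stands in for an estimate of P(A = 1 | X_i, S = s)
# p_source[i, s] stands in for an estimate of P(S = s | X_i); rows sum to 1
p_treat <- matrix(c(0.5, 0.4, 0.6,
                    0.5, 0.3, 0.7,
                    0.5, 0.2, 0.8,
                    0.5, 0.5, 0.5), nrow = 4, byrow = TRUE)
p_source <- matrix(c(0.2, 0.3, 0.5,
                     0.6, 0.2, 0.2,
                     0.1, 0.1, 0.8,
                     1/3, 1/3, 1/3), nrow = 4, byrow = TRUE)
stopifnot(all(abs(rowSums(p_source) - 1) < 1e-12))

# eta_1(X_i) = sum_s P(A = 1 | X_i, S = s) * P(S = s | X_i)
eta_1 <- rowSums(p_treat * p_source)

# For a binary treatment, eta_0(X_i) = 1 - eta_1(X_i)
eta_0 <- 1 - eta_1
```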

STE estimation:

The estimator is

\dfrac{\widehat \kappa}{N}\sum\limits_{i=1}^{N} \Bigg[ I(R_i = 0) \widehat \mu_a(X_i) +I(A_i = a, R_i=1) \dfrac{1-\widehat q(X_i)}{\widehat \eta_a(X_i)\widehat q(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg],

where N=n+n_0, \widehat \kappa=\{N^{-1} \sum_{i=1}^N I(R_i=0)\}^{-1}, and \widetilde X denotes the effect modifier.
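
To make the displayed formula concrete, here is a minimal sketch that transcribes it literally for the rows it is given and for one treatment level `a`. The function name `dr_estimate` and all argument names are hypothetical; this is not the package's internal code, and it assumes the nuisance estimates are already available and lie strictly between 0 and 1.

```r
# Literal transcription of the displayed formula (hypothetical helper).
#   R     : 1 for internal (multi-source) rows, 0 for external rows
#   A, Y  : treatment and outcome (may be NA on external rows)
#   mu_a  : estimated E(Y | X, A = a) for every row
#   eta_a : estimated P(A = a | X) for every row
#   q     : estimated P(R = 1 | X) for every row
dr_estimate <- function(R, A, Y, mu_a, eta_a, q, a) {
  N     <- length(R)
  kappa <- 1 / mean(R == 0)

  # Indicator I(A_i = a, R_i = 1); external rows with NA treatment are FALSE
  in_arm <- (R == 1) & !is.na(A) & (A == a)

  # Weighted residual term, set to 0 wherever the indicator is 0
  weight <- (1 - q) / (eta_a * q)
  aug    <- ifelse(in_arm, weight * (Y - mu_a), 0)

  kappa / N * sum((R == 0) * mu_a + aug)
}

# Toy inputs: 4 internal rows and 2 external rows
R    <- c(1, 1, 1, 1, 0, 0)
A    <- c(1, 0, 1, 0, NA, NA)
Y    <- c(2, 1, 3, 0, NA, NA)
mu_1 <- c(1.5, 1.5, 2.5, 2.5, 2, 4)
eta  <- rep(0.5, 6)
q    <- rep(0.5, 6)

est <- dr_estimate(R, A, Y, mu_a = mu_1, eta_a = eta, q = q, a = 1)
est
```

In this toy case the first term averages the outcome-model predictions over the external rows, and the second term adds a weighted average of residuals from the internal rows with `A = a`.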

The estimator is doubly robust and non-parametrically efficient. To achieve non-parametric efficiency and asymptotic normality, it requires that ||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q(X) -q(X)||\big\}=o_p(n^{-1/2}). In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
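
The idea of sample splitting and cross-fitting can be sketched in base R: each observation's nuisance prediction comes from a model fit on the other folds. The number of folds, the use of `lm` as a stand-in learner, and the toy data are assumptions for illustration; the package's own splitting scheme may differ.

```r
set.seed(2)
n   <- 100
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

K      <- 2                                      # number of folds (assumed)
fold   <- sample(rep(seq_len(K), length.out = n))
mu_hat <- numeric(n)

for (k in seq_len(K)) {
  train <- dat[fold != k, ]                      # fit on the other fold(s)
  test  <- dat[fold == k, ]                      # predict on the held-out fold
  fit   <- lm(y ~ x, data = train)               # stand-in for a flexible learner
  mu_hat[fold == k] <- predict(fit, newdata = test)
}

# Every observation now has an out-of-fold prediction
stopifnot(!anyNA(mu_hat))
```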

When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.

Value

An object of class "STE_external". This object is a list with the following elements:

`df_dif`	A data frame containing the subgroup treatment effect (mean difference) estimates for the external data.
`df_A0`	A data frame containing the subgroup potential outcome mean estimates under A = 0 for the external data.
`df_A1`	A data frame containing the subgroup potential outcome mean estimates under A = 1 for the external data.
`fit_outcome`	Fitted outcome model.
`fit_source`	Fitted source model.
`fit_treatment`	Fitted treatment model(s).
`fit_external`	Fitted external model.

References

Wang, G., Levis, A., Steingrimsson, J. and Dahabreh, I. (2024) Efficient estimation of subgroup treatment effects using multi-source data, arXiv preprint arXiv:2402.02684.

Wang, G., McGrath, S., Lian, Y. and Dahabreh, I. (2024) CausalMetaR: An R package for performing causally interpretable meta-analyses, arXiv preprint arXiv:2402.04341.

Examples


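# Load the package that provides STE_external()
library(CausalMetaR)

# Estimate subgroup treatment effects in the external target population
se <- STE_external(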
  X = dat_multisource[, 2:10],
  Y = dat_multisource$Y,
  EM = dat_multisource$EM,
  S = dat_multisource$S,
  A = dat_multisource$A,
  X_external = dat_external[, 2:10],
  EM_external = dat_external$EM,
  cross_fitting = FALSE,
  source_model = "MN.nnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  external_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)

[Package CausalMetaR version 0.1.2 Index]