STE_internal {CausalMetaR}    R Documentation

Estimating the Subgroup Treatment Effect (STE) in an internal target population using multi-source data

Description

Doubly-robust and efficient estimator for the STE in each internal target population using multi-source data.

Usage

STE_internal(
  X,
  Y,
  EM,
  S,
  A,
  cross_fitting = FALSE,
  replications = 10L,
  source_model = "MN.glmnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(),
  outcome_model_args = list(),
  show_progress = TRUE
)

Arguments

X

Data frame (or matrix) containing the covariate data in the multi-source data. It should have n rows and p columns. Character variables will be converted to factors.

Y

Vector of length n containing the outcome.

EM

Vector of length n containing the effect modifier in the multi-source data. If EM is a factor, it will maintain its subgroup level order; otherwise it will be converted to a factor with the default level order.

S

Vector of length n containing the source indicator. If S is a factor, it will maintain its level order; otherwise it will be converted to a factor with the default level order. The order will be carried over to the outputs and plots.
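The "default level order" for EM and S is the one produced by base R's factor(): character and numeric vectors get their levels in sorted order, while a vector that is already a factor keeps the order you gave it. A short base R illustration (the source labels are made up):

```r
# Hypothetical source labels, in the order they appear in the data
S_chr <- c("trial_B", "obs_A", "trial_B", "trial_C")

# Converting a character vector: levels are sorted (obs_A, trial_B, trial_C)
lv_default <- levels(factor(S_chr))

# Passing a factor with explicit levels: that order is preserved
S_fac <- factor(S_chr, levels = c("trial_B", "trial_C", "obs_A"))
lv_kept <- levels(S_fac)

# A numeric effect modifier is also sorted: "1", "2", "3"
lv_num <- levels(factor(c(3, 1, 2)))
```

To control the order of sources or subgroups in the outputs and plots, convert S or EM to a factor with the desired levels before calling STE_internal.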

A

Vector of length n containing the binary treatment (1 for treated and 0 for untreated).

cross_fitting

Logical specifying whether sample splitting and cross fitting should be used.

replications

Integer specifying the number of sample splitting and cross fitting replications to perform, if cross_fitting = TRUE. The default is 10L.

source_model

Character string specifying the (penalized) multinomial logistic regression for estimating the source model. It has two options: "MN.glmnet" (default) and "MN.nnet", which use glmnet and nnet respectively.

source_model_args

List specifying the arguments for the source model (in glmnet or nnet).
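As a rough sketch of what the source model estimates, the code below fits an unpenalized multinomial logistic regression of S on X with nnet::multinom and returns the n x K matrix of q_s(X)=P(S=s|X). It uses a toy simulated dataset and is an assumption-laden illustration: it does not reproduce the package's internal call or how source_model_args is passed through. With "MN.glmnet", a penalized fit from glmnet would take the place of multinom.

```r
library(nnet)  # recommended package shipped with R; provides multinom()

set.seed(1)
n <- 300
# Toy covariates and a three-level source indicator (simulated, hypothetical)
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
prob <- cbind(1, exp(0.5 * X$x1), exp(-0.5 * X$x2))
prob <- prob / rowSums(prob)
S <- factor(apply(prob, 1, function(pr) sample(1:3, 1, prob = pr)))

# Multinomial logistic regression of S on X (trace = FALSE silences iteration output)
fit_source <- multinom(S ~ x1 + x2, data = data.frame(X, S = S), trace = FALSE)

# n x K matrix of estimated source probabilities q_s(X_i) = P(S = s | X_i)
q_hat <- predict(fit_source, newdata = X, type = "probs")
head(round(q_hat, 3))
```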

treatment_model_type

Character string specifying how the treatment model is estimated. Options include "separate" (default) and "joint". If "separate", the treatment model (i.e., P(A=1|X, S=s)) is estimated by regressing A on X within each specific internal population S=s. If "joint", the treatment model is estimated by regressing A on X and S using the multi-source population.
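The difference between the two options can be sketched with plain logistic regression standing in for SuperLearner. This substitution is an assumption made only to keep the example self-contained; the package itself fits these models with SuperLearner and treatment_model_args. Both versions return, for every observation and every source s, an estimate of P(A=1|X, S=s).

```r
set.seed(2)
n <- 400
# Toy data (hypothetical): two covariates, three sources, binary treatment
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  S = factor(sample(c("s1", "s2", "s3"), n, replace = TRUE)))
dat$A <- rbinom(n, 1, plogis(0.3 * dat$x1 + 0.5 * (dat$S == "s2")))
src <- levels(dat$S)

# "separate": regress A on X within each source S = s,
# then predict for every observation as if it belonged to source s
pA_separate <- sapply(src, function(s) {
  fit <- glm(A ~ x1 + x2, family = binomial(), data = dat[dat$S == s, ])
  predict(fit, newdata = dat, type = "response")
})

# "joint": regress A on X and S in the pooled data,
# then evaluate with S set to each source in turn
fit_joint <- glm(A ~ x1 + x2 + S, family = binomial(), data = dat)
pA_joint <- sapply(src, function(s) {
  nd <- dat
  nd$S <- factor(s, levels = src)
  predict(fit_joint, newdata = nd, type = "response")
})
```

The separate approach lets the treatment mechanism differ freely across sources but uses only each source's own observations; the joint approach pools all observations, at the cost of the structure imposed by the chosen learner (here, an additive source effect on the logit scale).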

treatment_model_args

List specifying the arguments for the treatment model (in SuperLearner).

outcome_model_args

List specifying the arguments for the outcome model (in SuperLearner).

show_progress

Logical specifying whether to print a progress bar for the cross-fit replicates completed, if cross_fitting = TRUE.

Details

Data structure:

The multi-source dataset consists of the outcome Y, source S, treatment A, covariates X (n \times p), and effect modifier EM in the internal populations. The data sources can be trials, observational studies, or a combination of both.
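A hypothetical simulated dataset with this structure (n rows, p covariates, a subgroup-defining effect modifier, a source indicator and a binary treatment) might be built as follows. The variable names and the data-generating process are invented for illustration and are unrelated to dat_multisource.

```r
set.seed(3)
n <- 500; p <- 4
X  <- as.data.frame(matrix(rnorm(n * p), n, p))           # n x p covariates (V1..V4)
EM <- factor(ifelse(X$V1 > 0, "high", "low"),             # effect modifier defining subgroups
             levels = c("low", "high"))
S  <- factor(sample(c("s1", "s2", "s3"), n, replace = TRUE))  # source indicator
A  <- rbinom(n, 1, 0.5)                                   # binary treatment (1 = treated)
Y  <- 1 + X$V2 + A * (0.5 + (EM == "high")) + rnorm(n)    # continuous outcome
```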

Estimation of nuisance parameters:

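The following models are fit:

Propensity score model: \eta_a(X)=P(A=a|X). Using the decomposition P(A=a|X)=\sum_{s} P(A=a|X, S=s)P(S=s|X), the treatment model P(A=1|X, S=s) and the source model q_s(X)=P(S=s|X) are estimated separately and then combined.

Outcome model: \mu_a(X)=E(Y|X, A=a).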

The treatment and outcome models are estimated by SuperLearner; the source model is estimated by glmnet or nnet, as chosen by source_model.
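The STE estimator below uses the pooled propensity score \eta_a(X)=P(A=a|X), while the treatment model is fit as P(A=1|X, S=s) and the source model as q_s(X)=P(S=s|X). By the law of total probability these combine as \eta_1(X)=\sum_s P(A=1|X, S=s) q_s(X). A minimal sketch of that combination step, assuming both models have already produced n x K prediction matrices (one column per source); combine_propensity is a hypothetical helper, not a package function:

```r
# Hypothetical helper: combine source-specific treatment probabilities
# P(A = 1 | X, S = s) with source probabilities q_s(X) = P(S = s | X)
# into the pooled propensity score eta_a(X) = P(A = a | X).
combine_propensity <- function(pA1_by_source, q_by_source, a = 1) {
  stopifnot(identical(dim(pA1_by_source), dim(q_by_source)))
  eta1 <- rowSums(pA1_by_source * q_by_source)  # sum over sources
  if (a == 1) eta1 else 1 - eta1                # P(A = 0 | X) = 1 - P(A = 1 | X)
}

# Two observations, three sources
pA1 <- rbind(c(0.2, 0.5, 0.8),
             c(0.4, 0.4, 0.4))
q   <- rbind(c(0.5, 0.25, 0.25),
             c(0.1, 0.6, 0.3))
eta1 <- combine_propensity(pA1, q, a = 1)  # 0.425, 0.4
eta0 <- combine_propensity(pA1, q, a = 0)  # 0.575, 0.6
```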

STE estimation:

The estimator is

\dfrac{\widehat \kappa}{n}\sum\limits_{i=1}^{n} \Bigg[ I(S_i = s, \widetilde X_i=\widetilde x) \widehat \mu_a(X_i) +I(A_i = a, \widetilde X_i=\widetilde x) \dfrac{\widehat q_{s}(X_i)}{\widehat \eta_a(X_i)} \Big\{ Y_i - \widehat \mu_a(X_i) \Big\} \Bigg],

where \widehat \kappa=\{n^{-1} \sum_{i=1}^n I(S_i=s, \widetilde X_i=\widetilde x)\}^{-1} and \widetilde X denotes the effect modifier.
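The formula translates into a few lines of R once the nuisance predictions are available. The sketch below takes the fitted values \widehat \mu_a(X_i), \widehat \eta_a(X_i) and \widehat q_s(X_i) as plain numeric vectors; the function name ste_aipw and its arguments are hypothetical and do not correspond to CausalMetaR internals.

```r
# Hypothetical implementation of the displayed estimator for one
# (treatment level a, internal source s, subgroup x) combination.
#   Y, A, S, EM : data vectors of length n
#   mu_a        : predictions of mu_a(X_i)  = E(Y | X_i, A = a)
#   eta_a       : predictions of eta_a(X_i) = P(A = a | X_i)
#   q_s         : predictions of q_s(X_i)   = P(S = s | X_i)
ste_aipw <- function(Y, A, S, EM, mu_a, eta_a, q_s, a, s, x) {
  in_sub <- (S == s) & (EM == x)   # I(S_i = s, EM_i = x)
  kappa  <- 1 / mean(in_sub)       # kappa-hat
  aug    <- (A == a) & (EM == x)   # I(A_i = a, EM_i = x)
  terms  <- in_sub * mu_a + aug * (q_s / eta_a) * (Y - mu_a)
  kappa * mean(terms)              # (kappa-hat / n) * sum of terms
}

# Tiny worked example with made-up nuisance predictions
est <- ste_aipw(Y = c(1, 2, 3, 4), A = c(1, 0, 1, 1),
                S = c("s1", "s1", "s2", "s1"), EM = c("lo", "lo", "lo", "hi"),
                mu_a = c(1.5, 1.5, 2.5, 3), eta_a = rep(0.5, 4), q_s = rep(0.5, 4),
                a = 1, s = "s1", x = "lo")
est  # 1.5
```

Because df_dif reports mean differences, a subgroup treatment effect corresponds to evaluating this at a = 1 and a = 0 (with the matching \widehat \mu_a and \widehat \eta_a) and subtracting.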

The estimator is doubly robust and non-parametrically efficient. To achieve non-parametric efficiency and asymptotic normality, it requires that ||\widehat \mu_a(X) -\mu_a(X)||\big\{||\widehat \eta_a(X) -\eta_a(X)||+||\widehat q_s(X) -q_s(X)||\big\}=o_p(n^{-1/2}). In addition, sample splitting and cross-fitting can be performed to avoid the Donsker class assumption.
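Cross-fitting means that each nuisance prediction for observation i comes from a model fit without observation i. A minimal K-fold sketch, using logistic regression for P(A=1|X) as a stand-in nuisance model; the fold count, the model and the helper crossfit_predict are illustrative assumptions, and the package's own splitting scheme and the way it combines replications are not reproduced here.

```r
# Hypothetical helper: out-of-fold predictions from a logistic regression
crossfit_predict <- function(dat, formula, K = 2) {
  n <- nrow(dat)
  folds <- sample(rep(seq_len(K), length.out = n))  # random fold labels
  pred <- numeric(n)
  for (k in seq_len(K)) {
    held_out <- folds == k
    # fit on all folds except k ...
    fit <- glm(formula, family = binomial(), data = dat[!held_out, ])
    # ... and predict only on fold k
    pred[held_out] <- predict(fit, newdata = dat[held_out, ], type = "response")
  }
  pred
}

set.seed(4)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$A <- rbinom(200, 1, plogis(dat$x1))
eta_cf <- crossfit_predict(dat, A ~ x1 + x2, K = 2)
```

Each call draws a new random split, so repeating the procedure (as replications does when cross_fitting = TRUE) gives a different set of out-of-fold predictions.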

When a data source is a randomized trial, it is still recommended to estimate the propensity score for optimal efficiency.

Value

An object of class "STE_internal". This object is a list with the following elements:

df_dif

A data frame containing the subgroup treatment effect (mean difference) estimates for the internal populations.

df_A0

A data frame containing the subgroup potential outcome mean estimates under A = 0 for the internal populations.

df_A1

A data frame containing the subgroup potential outcome mean estimates under A = 1 for the internal populations.

fit_outcome

Fitted outcome model.

fit_source

Fitted source model.

fit_treatment

Fitted treatment model(s).

...

Some additional elements.

Examples


library(CausalMetaR)   # provides STE_internal() and the example data dat_multisource
library(SuperLearner)  # provides the SL.* learner wrappers used below

si <- STE_internal(
  X = dat_multisource[, 2:10],
  Y = dat_multisource$Y,
  EM = dat_multisource$EM,
  S = dat_multisource$S,
  A = dat_multisource$A,
  cross_fitting = FALSE,
  source_model = "MN.nnet",
  source_model_args = list(),
  treatment_model_type = "separate",
  treatment_model_args = list(
    family = binomial(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  ),
  outcome_model_args = list(
    family = gaussian(),
    SL.library = c("SL.glmnet", "SL.nnet", "SL.glm"),
    cvControl = list(V = 5L)
  )
)

# Subgroup treatment effect (mean difference) estimates for the internal populations
si$df_dif



[Package CausalMetaR version 0.1.1 Index]