data_sim {CIMTx} R Documentation

Simulate data for binary outcome with multiple treatments

Description

The function data_sim simulates data for a binary outcome with multiple treatments. Users can adjust the following 7 design factors: (1) sample size, (2) ratio of units across treatment groups, (3) whether the treatment assignment model and the outcome generating model are linear or nonlinear, (4) whether the covariates that best predict the treatment also predict the outcome well, (5) whether the response surfaces are parallel across treatment groups, (6) outcome prevalence, and (7) degree of covariate overlap.

Usage

data_sim(
  sample_size,
  n_trt,
  X,
  lp_y,
  nlp_y,
  align = TRUE,
  tau,
  delta,
  psi,
  lp_w,
  nlp_w
)

Arguments

sample_size

A numeric value indicating the total number of units.

n_trt

A numeric value indicating the number of treatments.

X

A character vector with one element per covariate. Each element is a call to a random-generation function for one of the standard probability distributions in the stats package, written as a string (for example, "rnorm(300, 0, 0.5)").
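
To make the string format concrete, the following minimal sketch shows one way such expressions can be turned into a covariate matrix in base R. The helper build_covariates and the column names x1, x2 are hypothetical illustrations and are not part of CIMTx; how data_sim evaluates X internally is not stated on this page.

```r
# Hypothetical helper (not a CIMTx function): evaluate each covariate
# expression string and bind the results into a matrix.
build_covariates <- function(exprs) {
  cols <- lapply(exprs, function(e) eval(parse(text = e)))
  X <- do.call(cbind, cols)
  colnames(X) <- paste0("x", seq_along(exprs))  # assumed naming: x1, x2, ...
  X
}

set.seed(1)
X_spec <- c("rnorm(5, 0, 0.5)",   # x1: continuous
            "rbinom(5, 1, 0.4)")  # x2: binary
X_mat <- build_covariates(X_spec)
dim(X_mat)  # one row per unit, one column per covariate
```

Note that the number of draws inside each string (5 here) has to agree across covariates for the columns to line up.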

lp_y

A character vector of length n_trt, representing the linear effects in the outcome generating model.

nlp_y

A character vector of length n_trt, representing the nonlinear effects in the outcome generating model.

align

A logical indicating whether the predictors in the treatment assignment model are the same as the predictors in the outcome generating model. The default is TRUE. If set to FALSE, users must also specify the two additional arguments lp_w and nlp_w.

tau

A numeric vector of length n_trt inducing different outcome event probabilities across treatment groups.

delta

A numeric vector of length n_trt - 1 inducing different ratios of units across treatment groups.

psi

A numeric value for the parameter governing the sparsity of covariate overlap.

lp_w

A character vector of length n_trt - 1, representing the linear effects in the treatment assignment model. Needed when align = FALSE.

nlp_w

A character vector of length n_trt - 1, representing the nonlinear effects in the treatment assignment model. Needed when align = FALSE.
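
As a rough illustration of how lp_w, nlp_w, delta and psi could fit together, the sketch below assumes a multinomial logistic treatment assignment model in which delta acts as a set of intercepts and psi scales the covariate effects. This functional form is an assumption made for illustration; this page does not spell out the model, and the code is not taken from CIMTx.

```r
set.seed(42)
n <- 300
covs <- data.frame(
  x1 = rnorm(n, 0, 0.5),
  x2 = rbeta(n, 2, 0.4),
  x4 = rweibull(n, 1, 2),
  x5 = rbinom(n, 1, 0.4)
)
lp_w  <- c(".4*x1 + .1*x2 - .1*x4 + .1*x5", ".2*x1 + .2*x2 - .2*x4 - .3*x5")
nlp_w <- c("-.5*x1*x4 - .1*x2*x5", "-.3*x1*x4 + .2*x2*x5")
delta <- c(0.5, 0.5)
psi   <- 1

# ASSUMED form: log-odds of treatment k versus the last (reference) treatment
# equal delta[k] + psi * (linear part + nonlinear part).
eta <- sapply(seq_along(lp_w), function(k) {
  delta[k] + psi * (eval(parse(text = lp_w[k]), covs) +
                    eval(parse(text = nlp_w[k]), covs))
})

expo <- cbind(exp(eta), 1)    # reference treatment has linear predictor 0
gps  <- expo / rowSums(expo)  # assignment probabilities, one column per treatment
w    <- apply(gps, 1, function(p) sample.int(3, size = 1, prob = p))
prop.table(table(w))          # realized share of units per treatment
```

Under this reading, changing delta shifts the group proportions, while psi changes how strongly the covariates separate the groups.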

Value

A list with 7 elements holding the simulated data:

covariates:

the covariate matrix X

w:

treatment indicators

y:

observed binary outcomes

y_prev:

outcome prevalence rates

ratio_of_units:

the proportions of units in each treatment group

overlap_fig:

a visualization of covariate overlap via boxplots of the distributions of the true generalized propensity scores (GPS)

Y_true:

simulated true outcome in each treatment group
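
The summary elements can be related to the raw ones with a small base R sketch. The vectors w and y below are toy stand-ins for the w and y elements of the returned list, not output of data_sim, and reading y_prev as a per-group mean of y is an assumption about its definition.

```r
# Toy stand-ins for the treatment indicators and observed binary outcomes
w <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
y <- c(0, 1, 0, 0, 1, 1, 1, 0, 1, 1)

# Proportion of units in each treatment group (compare with ratio_of_units)
ratio <- prop.table(table(w))

# Share of events within each treatment group (assumed meaning of y_prev)
prev <- tapply(y, w, mean)

ratio
prev
```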

References

Hu, L., Ji, J. (2021). CIMTx: An R package for causal inference with multiple treatments using observational data. arXiv:2110.10276

Examples

library(CIMTx)
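# Linear and nonlinear terms of the treatment assignment model
# (n_trt - 1 = 2 entries each)
lp_w_all <-
  c(".4*x1 + .1*x2 - .1*x4 + .1*x5",  # w = 1
    ".2*x1 + .2*x2 - .2*x4 - .3*x5")  # w = 2
nlp_w_all <-
  c("-.5*x1*x4 - .1*x2*x5",  # w = 1
    "-.3*x1*x4 + .2*x2*x5")  # w = 2
# Linear and nonlinear terms of the outcome model, one entry per treatment
lp_y_all <- rep(".2*x1 + .3*x2 - .1*x3 - .1*x4 - .2*x5", 3)
nlp_y_all <- rep(".7*x1*x1 - .1*x2*x3", 3)
# Covariates: each generator draws 300 values, matching sample_size below
X_all <- c(
  "rnorm(300, 0, 0.5)",  # x1
  "rbeta(300, 2, .4)",   # x2
  "runif(300, 0, 0.5)",  # x3
  "rweibull(300, 1, 2)", # x4
  "rbinom(300, 1, .4)"   # x5
)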
set.seed(111111)
data <- data_sim(
 sample_size = 300,
 n_trt = 3,
 X = X_all,
 lp_y = lp_y_all,
 nlp_y  = nlp_y_all,
 align = FALSE,
 lp_w = lp_w_all,
 nlp_w = nlp_w_all,
 tau = c(-1.5,0,1.5),
 delta = c(0.5,0.5),
 psi = 1
)

[Package CIMTx version 1.1.0 Index]