data_sim {CIMTx}    R Documentation

Simulate data for binary outcome with multiple treatments

Description

The function data_sim simulates data for a binary outcome with multiple treatments. Users can adjust the following 7 design factors: (1) sample size, (2) ratio of units across treatment groups, (3) whether the treatment assignment model and the outcome generating model are linear or nonlinear, (4) whether the covariates that best predict the treatment also predict the outcome well, (5) whether the response surfaces are parallel across treatment groups, (6) outcome prevalence, and (7) degree of covariate overlap.

Usage

data_sim(
  sample_size,
  n_trt = 3,
  x = "rnorm(0, 1)",
  lp_y = rep("x1", 3),
  nlp_y = NULL,
  align = TRUE,
  tau = c(0, 0, 0),
  delta = c(0, 0),
  psi = 1,
  lp_w,
  nlp_w
)

Arguments

sample_size

A numeric value indicating the total number of units.

n_trt

A numeric value indicating the number of treatments. The default is set to 3.

x

A vector of characters representing covariates, with each covariate being generated from one of the standard probability distributions in the stats package. The default is set to "rnorm(0, 1)".

lp_y

A vector of characters of length n_trt, representing the linear effects in the outcome generating model. The default is set to rep("x1", 3).

nlp_y

A vector of characters of length n_trt, representing the nonlinear effects in the outcome generating model. The default is set to NULL.

align

A logical indicating whether the predictors in the treatment assignment model are the same as the predictors for the outcome generating model. The default is TRUE. If the argument is set to FALSE, users need to specify two additional arguments, lp_w and nlp_w.

tau

A numeric vector of length n_trt inducing different outcome event probabilities across treatment groups. Higher values mean higher outcome event probability for the treatment group; lower values mean lower outcome event probability for the treatment group. The default is set to c(0, 0, 0), which corresponds to an approximately equal outcome event probability across three treatment groups.

delta

A numeric vector of length n_trt - 1 inducing different ratios of units across treatment groups. Higher values mean a higher proportion for the treatment group; lower values mean a lower proportion for the treatment group. The default is set to c(0, 0), which corresponds to approximately equal sample sizes across three treatment groups.

psi

A numeric value for the parameter governing the sparsity of covariate overlap. Higher values mean weaker covariate overlap; lower values mean stronger covariate overlap. The default is set to 1, which corresponds to a moderate covariate overlap.

lp_w

A vector of characters of length n_trt - 1, representing the linear effects in the treatment assignment model. Required only when align is set to FALSE.

nlp_w

A vector of characters of length n_trt - 1, representing the nonlinear effects in the treatment assignment model. Required only when align is set to FALSE.
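The arguments above can be read as the ingredients of a three-step data-generating process. The base R sketch below illustrates one plausible version: covariates drawn from specification strings, treatment assigned through a multinomial logistic model with delta as intercepts and psi scaling the covariate effects, and binary outcomes drawn from a logistic model with tau as group-specific intercepts. The helper draw_covariate, the model forms, and all coefficients are assumptions made for illustration; they are not taken from the internals of data_sim.

```r
# Illustrative sketch only; model forms and the helper are assumptions,
# not the internals of data_sim.
set.seed(1)
n <- 5000

# Hypothetical helper: insert the sample size as the first argument of a
# generator written without it, e.g. "rnorm(0, 1)" becomes rnorm(n, 0, 1).
draw_covariate <- function(spec, n) {
  eval(parse(text = sub("\\(", paste0("(", n, ", "), spec)))
}
x1 <- draw_covariate("rnorm(0, 1)", n)
x2 <- draw_covariate("rbinom(1, 0.4)", n)

# Treatment assignment: multinomial logistic model, treatment 3 as reference.
delta <- c(0.5, 0.5)   # intercepts for treatments 1 and 2
psi   <- 1             # multiplies the covariate effects
eta <- cbind(
  delta[1] + psi * (0.4 * x1 + 0.1 * x2),
  delta[2] + psi * (0.2 * x1 + 0.2 * x2),
  0
)
gps <- exp(eta) / rowSums(exp(eta))   # assignment probabilities per unit
w <- apply(gps, 1, function(p) sample(1:3, size = 1, prob = p))

# Outcome: logistic model with tau as a group-specific intercept.
tau <- c(-1.5, 0, 1.5)
p_y <- plogis(tau[w] + 0.2 * x1 + 0.3 * x2)
y   <- rbinom(n, size = 1, prob = p_y)

prop.table(table(w))   # share of units in each treatment group
tapply(y, w, mean)     # observed outcome prevalence in each group
```

Under these assumptions, raising an element of delta enlarges the corresponding treatment group, raising psi pushes the assignment probabilities toward 0 and 1 (weaker overlap), and raising an element of tau increases the outcome prevalence in that group.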

Value

A list with 7 elements for simulated data. It contains

covariates:

x matrix

w:

treatment indicators

y:

observed binary outcomes

y_prev:

outcome prevalence rates

ratio_of_units:

the proportions of units in each treatment group

overlap_fig:

the visualization of covariate overlap via boxplots of the distributions of true GPS

y_true:

simulated true outcome in each treatment group
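As a rough illustration of what an overlap figure based on the true GPS conveys, the base R sketch below builds mock generalized propensity scores and draws boxplots of one score split by the treatment actually received. The mock model, its coefficients, and the variable names are assumptions for illustration only; the figure returned in overlap_fig may be constructed differently.

```r
# Illustrative sketch only: mock data standing in for the simulated output.
set.seed(1)
n   <- 2000
x1  <- rnorm(n)
eta <- cbind(0.8 * x1, -0.8 * x1, 0)    # assumed linear predictors
gps <- exp(eta) / rowSums(exp(eta))     # mock true GPS, one column per treatment
w   <- apply(gps, 1, function(p) sample(1:3, size = 1, prob = p))

# Boxplots of the probability of receiving treatment 1, split by the
# treatment actually received; strongly separated boxes suggest weak overlap.
bp <- boxplot(gps[, 1] ~ w,
              xlab = "Treatment received",
              ylab = "GPS for treatment 1")
bp$stats[3, ]   # group medians
```

Boxes that largely coincide across groups indicate that units in every group had similar chances of receiving the treatment, which is the situation usually described as strong covariate overlap.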

References

Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.4.0. URL: https://CRAN.R-project.org/package=stringr

Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.7. URL: https://CRAN.R-project.org/package=dplyr

Examples

library(CIMTx)
lp_w_all <-
  c(
    ".4*x1 + .1*x2  - .1*x4 + .1*x5", # w = 1
    ".2 * x1 + .2 * x2  - .2 * x4 - .3 * x5" # w = 2
  )
nlp_w_all <-
  c(
    "-.5*x1*x4  - .1*x2*x5", # w = 1
    "-.3*x1*x4 + .2*x2*x5" # w = 2
  )
lp_y_all <- rep(".2*x1 + .3*x2 - .1*x3 - .1*x4 - .2*x5", 3)
nlp_y_all <- rep(".7*x1*x1  - .1*x2*x3", 3)
X_all <- c(
  "rnorm(0, 0.5)", # x1
  "rbeta(2,0.4)", # x2
  "runif(0, 0.5)", # x3
  "rweibull(1,2)", # x4
  "rbinom(1,0.4)" # x5
)

set.seed(111111)
data <- data_sim(
  sample_size = 300,
  n_trt = 3,
  x = X_all,
  lp_y = lp_y_all,
  nlp_y = nlp_y_all,
  align = FALSE,
  lp_w = lp_w_all,
  nlp_w = nlp_w_all,
  tau = c(-1.5, 0, 1.5),
  delta = c(0.5, 0.5),
  psi = 1
)

[Package CIMTx version 1.2.0 Index]