runm {unmconf}

R Documentation

Generate synthetic data

Description

runm() generates synthetic data for modeling in the presence of unmeasured confounders. By default, it simulates a single unmeasured confounder using fixed parameter values; both the number of confounders and the parameter values can be customized. At most two unmeasured confounders are currently supported, to pair with unm_glm().

Usage

runm(
  n,
  type = "int",
  missing_prop = 0.8,
  response = "bin",
  response_param = NULL,
  response_model_coefs = c(int = -1, z1 = 0.5, z2 = 0.5, z3 = 0.5, u1 = 0.5, x = 0.5),
  treatment_model_coefs = c(int = -1, z1 = 0.5, z2 = 0.5, z3 = 0.5, u1 = 0.5),
  covariate_fam_list = list("norm", "bin", "norm"),
  covariate_param_list = list(c(mean = 0, sd = 1), prob = 0.3, c(0, 2)),
  unmeasured_fam_list = list("norm"),
  unmeasured_param_list = list(c(mean = 0, sd = 1))
)

Arguments

`n`	Number of observations. When `type = "int"`, `n` must be a vector of length 1. When `type = "ext"`, `n` can be a vector of length 1 or 2. When `n` has length 2, `n = c(n_main, n_external)`, where `n_main` is the main study sample size and `n_external` is the external validation sample size. When `n` has length 1, it is split evenly between main study and external validation observations, with the main study getting the additional observation when `n` is odd.
`type`	Type of validation source. Can be `"int"` for internal validation or `"ext"` for external validation. Defaults to `"int"`.
`missing_prop`	Proportion of missing values in the internal validation scenario (i.e., when `type = "int"`). Defaults to `0.8`.
`response`	Distribution family of the response: `"norm"`, `"bin"`, `"pois"`, or `"gam"`. Defaults to `"bin"`.
`response_param`	Nuisance parameters for response type. For `"norm"`, the default standard deviation is 1. For `"gam"`, the default shape parameter is 2. For `"pois"`, an offset variable is added to the dataset that is uniformly distributed from 1 to 10.
`response_model_coefs`	A named vector of coefficients to generate data from the response model. This must include an intercept (`"int"`), a coefficient for each covariate specified, a coefficient for each unmeasured confounder, and a treatment coefficient (`"x"`). The coefficients for the covariates and treatment will be denoted with `"beta[.]"` and the unmeasured confounders with `"lambda[.]"`.
`treatment_model_coefs`	A named vector of coefficients to generate data from the treatment model. This must include an intercept (`"int"`), a coefficient for each covariate specified, and a coefficient for each unmeasured confounder. The coefficients for the covariates and unmeasured confounders will be denoted with `"eta[.]"`.
`covariate_fam_list`	A list of either `"norm"` or `"bin"`, where the length of the list matches the number of covariates in the model.
`covariate_param_list`	A list of parameters for the respective distributions in `covariate_fam_list`, where the length of the list matches the length of `covariate_fam_list`.
`unmeasured_fam_list`	A list of either `"norm"` or `"bin"`, where the length of the list matches the number of unmeasured confounders in the model. This can be at most a length of 2 to pair with `unm_glm()`.
`unmeasured_param_list`	A list of parameters for the respective distributions in `unmeasured_fam_list`, where the length of the list matches the length of `unmeasured_fam_list`.
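
The rule for interpreting `n` when `type = "ext"` can be sketched in base R as follows. `split_n()` is a hypothetical helper written only for illustration; it is not part of unmconf and is not taken from the package source. It simply restates the rule described for the `n` argument.

```r
# Hypothetical helper (not part of unmconf): mirrors the documented rule for
# how `n` is interpreted when type = "ext".
split_n <- function(n) {
  stopifnot(is.numeric(n), length(n) %in% c(1L, 2L))
  if (length(n) == 2L) {
    # n = c(n_main, n_external): use the two sizes as given
    return(c(n_main = n[[1]], n_external = n[[2]]))
  }
  # Length 1: split evenly; the main study gets the extra observation if n is odd
  c(n_main = ceiling(n / 2), n_external = floor(n / 2))
}

split_n(100)        # n_main = 50, n_external = 50
split_n(101)        # n_main = 51, n_external = 50
split_n(c(20, 30))  # n_main = 20, n_external = 30
```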

Value

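A tibble of simulated data. As the examples show, a `"params"` attribute is attached to the result and can be retrieved with `attr("params")`.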
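
To make the kind of data involved more concrete, here is a minimal base-R sketch of a data-generating process that matches the default arguments. It is not the package's implementation. It assumes a logistic model for both the binary treatment and the binary response, reads the unnamed `c(0, 2)` as mean 0 and standard deviation 2, and assumes that, under internal validation, it is the unmeasured confounder that is set to missing for a randomly chosen `missing_prop` of the rows. The object names (`design_x`, `u1_obs`, `dat`) are illustrative, and the sketch returns a plain data frame rather than a tibble.

```r
set.seed(1)  # reproducibility for this sketch only
n <- 100

# Measured covariates, following the default families and parameters
z1 <- rnorm(n, mean = 0, sd = 1)
z2 <- rbinom(n, size = 1, prob = 0.3)
z3 <- rnorm(n, mean = 0, sd = 2)   # assumption: c(0, 2) means mean 0, sd 2

# One unmeasured confounder, norm(mean = 0, sd = 1)
u1 <- rnorm(n, mean = 0, sd = 1)

# Treatment model (assumed logistic): intercept, covariates, confounder
eta <- c(int = -1, z1 = 0.5, z2 = 0.5, z3 = 0.5, u1 = 0.5)
design_x <- cbind(int = 1, z1, z2, z3, u1)
x <- rbinom(n, size = 1, prob = plogis(drop(design_x %*% eta)))

# Response model for response = "bin" (assumed logistic): adds the treatment x
beta <- c(int = -1, z1 = 0.5, z2 = 0.5, z3 = 0.5, u1 = 0.5, x = 0.5)
design_y <- cbind(int = 1, z1, z2, z3, u1, x)
y <- rbinom(n, size = 1, prob = plogis(drop(design_y %*% beta)))

# Internal validation (assumed mechanism): hide the confounder for a random
# missing_prop of the rows
missing_prop <- 0.8
u1_obs <- u1
u1_obs[sample.int(n, size = round(missing_prop * n))] <- NA

dat <- data.frame(y, x, z1, z2, z3, u1 = u1_obs)
head(dat)
```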

Examples


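# Internal validation (the default) with a binary response
runm(100)
runm(n = 100, type = "int", missing_prop = .75)
# Inspect the "params" attribute attached to the result
runm(n = 100, type = "int", missing_prop = .75) |> attr("params")
# Normal response; response_param replaces the default standard deviation of 1
runm(100, type = "int", response = "norm")
runm(100, type = "int", response = "norm") |> attr("params")
runm(100, type = "int", response = "norm", response_param = 3) |> attr("params")
# Gamma response; response_param replaces the default shape parameter of 2
runm(100, type = "int", response = "gam")
runm(100, type = "int", response = "gam", response_param = 5) |> attr("params")
# Poisson response; an offset variable is added to the dataset
runm(100, type = "int", missing_prop = .5, response = "pois")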

# External validation; a single n is split between main study and external sample
runm(n = 100, type = "ext")
runm(n = 100, type = "ext") |> attr("params")
# Separate main study and external validation sample sizes
runm(n = c(10, 10), type = "ext")
runm(100, type = "ext", response = "norm")
runm(100, type = "ext", response = "norm", response_param = 3) |> attr("params")
runm(100, type = "ext", response = "gam")
runm(100, type = "ext", response = "pois")

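# Custom internal-validation design: one normal covariate and two unmeasured
# confounders (one normal, one binary)
runm(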
  n = 100,
  type = "int",
  missing_prop = .80,
  response = "norm",
  response_param = c("si_y" = 2),
  response_model_coefs = c("int" = -1, "z" = .4,
                           "u1" = .75, "u2" = .75, "x" = .75),
  treatment_model_coefs = c("int" = -1, "z" = .4,
                            "u1" = .75, "u2" = .75),
  covariate_fam_list = list("norm"),
  covariate_param_list = list(c(mean = 0, sd = 1)),
  unmeasured_fam_list = list("norm", "bin"),
  unmeasured_param_list = list(c(mean = 0, sd = 1), c(.3))
)

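# Custom external-validation design: 20 main study and 30 external observations,
# three covariates, and two unmeasured confounders
runm(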
  n = c(20, 30),
  type = "ext",
  response = "norm",
  response_param = c("si_y" = 2),
  response_model_coefs = c("int" = -1, "z1" = .4, "z2" = .5, "z3" = .4,
                           "u1" = .75, "u2" = .75, "x" = .75),
  treatment_model_coefs = c("int" = -1, "z1" = .4, "z2" = .5, "z3" = .4,
                            "u1" = .75, "u2" = .75),
  covariate_fam_list = list("norm", "bin", "norm"),
  covariate_param_list = list(c(mean = 0, sd = 1), c(.3), c(0, 2)),
  unmeasured_fam_list = list("norm", "bin"),
  unmeasured_param_list = list(c(mean = 0, sd = 1), c(.3))
)
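
When customizing a call like the two above, the coefficient vectors have to stay in step with the family lists. The sketch below checks that bookkeeping before calling runm(). `check_runm_args()` is a hypothetical helper, not part of unmconf; it only encodes the counts stated in the argument descriptions (an intercept, one coefficient per covariate, one per unmeasured confounder, plus `"x"` for the response model) and does not reproduce any validation the package itself may perform.

```r
# Hypothetical helper (not part of unmconf): checks that the coefficient
# vectors agree with the family lists, following the argument descriptions.
check_runm_args <- function(response_model_coefs, treatment_model_coefs,
                            covariate_fam_list, unmeasured_fam_list) {
  p <- length(covariate_fam_list)   # number of measured covariates
  q <- length(unmeasured_fam_list)  # number of unmeasured confounders
  stopifnot(
    q <= 2,                                        # at most two confounders
    length(response_model_coefs) == p + q + 2,     # int + z's + u's + x
    length(treatment_model_coefs) == p + q + 1,    # int + z's + u's
    all(c("int", "x") %in% names(response_model_coefs)),
    "int" %in% names(treatment_model_coefs)
  )
  invisible(TRUE)
}

# Values taken from the external-validation example above
check_runm_args(
  response_model_coefs = c("int" = -1, "z1" = .4, "z2" = .5, "z3" = .4,
                           "u1" = .75, "u2" = .75, "x" = .75),
  treatment_model_coefs = c("int" = -1, "z1" = .4, "z2" = .5, "z3" = .4,
                            "u1" = .75, "u2" = .75),
  covariate_fam_list = list("norm", "bin", "norm"),
  unmeasured_fam_list = list("norm", "bin")
)
```

The helper stops with an error at the first mismatch and otherwise returns `TRUE` invisibly, so it can be placed directly before a runm() call.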

[Package unmconf version 0.1.0 Index]