R: Data Generation

getData {carat}

R Documentation

Data Generation

Description

Generates continuous or binary outcomes given patients' covariates, the underlying model and the randomization procedure.

Usage

  getData(n, cov_num, level_num, pr, type, beta, 
          mu1, mu2, sigma = 1, method = "HuHuCAR", ...)

Arguments

`n`	the number of patients.
`cov_num`	the number of covariates.
`level_num`	a vector of level numbers for each covariate. Hence the length of `level_num` should be equal to the number of covariates.
`pr`	a vector of probabilities. Under the assumption of independence between covariates, `pr` is a vector containing probabilities for each level of each covariate. The length of `pr` should correspond to the number of all levels, and the sum of the probabilities for each margin should be `1`.
`type`	the data-generating model. Accepted values are `"linear"` and `"logit"`.
`beta`	a vector of coefficients of covariates. The length of `beta` must equal the total number of levels across all covariates.
`mu1`, `mu2`	main effects of treatment `1` and treatment `2`.
`sigma`	the error variance for the linear model. The default is 1. This should be a positive value and is only used when `type = "linear"`.
`method`	the randomization procedure to be used for generating randomization sequences. This package provides data-generating functions for `"HuHuCAR"`, `"PocSimMIN"`, `"StrBCD"`, `"StrPBR"`, `"AdjBCD"`, and `"DoptBCD"`.
`...`	arguments to be passed to `method`. These arguments depend on the randomization method used, and the following arguments are accepted:
  `omega`	a vector of weights at the overall, within-stratum, and within-covariate-margin levels. It is required that at least one element is larger than 0. Note that `omega` is only needed when `"HuHuCAR"` is to be used.
  `weight`	a vector of weights for within-covariate-margin imbalances. It is required that at least one element is larger than 0. Note that `weight` is only needed when `"PocSimMIN"` is to be used.
  `p`	the biased coin probability. `p` should be larger than `1/2` and less than `1`. Note that `p` is only needed when `"HuHuCAR"`, `"PocSimMIN"`, or `"StrBCD"` is to be used.
  `a`	a design parameter governing the degree of randomness. Note that `a` is only needed when `"AdjBCD"` is to be used.
  `bsize`	the block size for stratified randomization. It is required to be a multiple of 2. Note that `bsize` is only needed when `"StrPBR"` is to be used.
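
The relationship between `level_num` and `pr` can be checked before calling `getData`. The following is a minimal base-R sketch, not part of the package. It assumes that `pr` concatenates one block of probabilities per covariate in the same order as `level_num`. The names `block_id`, `margins`, `pr_ok`, and `profiles` are hypothetical.

```r
# Assumed layout: pr concatenates one probability block per covariate,
# in the same order as level_num.
level_num <- c(2, 3)               # covariate 1 has 2 levels, covariate 2 has 3
pr <- c(0.5, 0.5, 0.2, 0.3, 0.5)   # length equals sum(level_num)

# Split pr into one block per covariate.
block_id <- rep(seq_along(level_num), times = level_num)
margins <- split(pr, block_id)

# Check the documented constraints: correct length, each margin sums to 1.
margin_sums <- vapply(margins, sum, numeric(1))
pr_ok <- length(pr) == sum(level_num) &&
  isTRUE(all.equal(unname(margin_sums), rep(1, length(level_num))))

# Draw independent covariate levels for 4 hypothetical patients
# (rows are patients, columns are covariates).
set.seed(1)
profiles <- sapply(margins, function(prob) {
  sample(seq_along(prob), size = 4, replace = TRUE, prob = prob)
})
```

Here `pr_ok` is `TRUE` only when both constraints described for `pr` hold, so a check of this kind may help catch a mis-sized `pr` early.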

Details

To generate continuous outcomes, we use the linear model:

y_i = \mu_j+x_i^T\beta+\epsilon_i.

To generate binary outcomes, we use the logit link function:

P(y_i=1) = \frac{\exp\{\mu_j+x_i^T\beta\}}{1+\exp\{\mu_j+x_i^T\beta\}},

where j denotes the treatment to which patient i is assigned.
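
The two models can be illustrated with a short base-R sketch. This is not the package's implementation. It assumes that x_i is an indicator coding with one column per level of each covariate (consistent with `beta` having one coefficient per level), that `sigma` is the error variance as stated under Arguments, and that treatment labels are drawn at random rather than by a covariate-adaptive procedure. All variable names are hypothetical.

```r
set.seed(1)
n <- 6
level_num <- c(2, 2)
beta <- c(1, 4, 3, 2)   # one coefficient per level (assumed indicator coding)
mu <- c(0, 0.5)         # mu[1] plays the role of mu1, mu[2] of mu2
sigma <- 1              # treated as the error variance (assumption)

# Hypothetical inputs: covariate levels and treatment labels drawn at random.
# getData would instead obtain assignments from the chosen method.
levels_mat <- sapply(level_num, function(k) sample(k, n, replace = TRUE))
trt <- sample(1:2, n, replace = TRUE)

# Indicator-coded design matrix: one column per level of each covariate.
x <- do.call(cbind, lapply(seq_along(level_num), function(j) {
  outer(levels_mat[, j], seq_len(level_num[j]), "==") * 1
}))

# Linear predictor mu_j + x_i^T beta for each patient.
eta <- mu[trt] + as.vector(x %*% beta)

# Continuous outcomes from the linear model.
y_linear <- eta + rnorm(n, mean = 0, sd = sqrt(sigma))

# Binary outcomes from the logit model; plogis(eta) = exp(eta) / (1 + exp(eta)).
y_logit <- rbinom(n, size = 1, prob = plogis(eta))
```

Both outcome vectors share the same linear predictor `eta`; only the final sampling step differs between the two values of `type`.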

Value

`getData` returns a data frame of size (cov_num + 2) \times n, with one column per patient. The first cov_num rows contain the patients' covariate profiles. The next row contains the patients' assignments, and the final row contains the generated outcomes.
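
Because patients are stored in columns, the pieces of the result are extracted by row. The sketch below uses a stand-in data frame with the documented shape instead of a real `getData` result; its values are made up, and the coding of assignments as 1 and 2 is an assumption.

```r
cov_num <- 2

# Stand-in for a getData result: (cov_num + 2) rows, one column per patient.
# Each group of four values is one patient: covariate 1, covariate 2,
# assignment, outcome. All values are invented for illustration.
dat <- data.frame(matrix(c(1, 2, 1,  0.3,
                           2, 1, 2,  1.7,
                           1, 1, 2,  0.9,
                           2, 2, 1, -0.4),
                         nrow = cov_num + 2))

covariates <- dat[1:cov_num, ]             # covariate profiles
assignment <- unlist(dat[cov_num + 1, ])   # treatment assignments
outcome    <- unlist(dat[cov_num + 2, ])   # generated outcomes

# Mean outcome within each treatment group.
group_means <- tapply(outcome, assignment, mean)
```

The same positional indexing should apply to an actual result as long as it follows the row layout described above.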

Examples

# Parameter settings
set.seed(100)
n = 1000
cov_num = 5
level_num = c(2,2,2,2,2)
beta = c(1,4,3,2,5,5,4,3,2,1)
mu1 = 0
mu2 = 0
sigma = 1
type = "linear"
p = 0.85
omega = c(0.1, 0.1, rep(0.8 / 5, times = 5))
pr = rep(0.5,10)

#Data Generation
dataH = getData(n, cov_num,level_num, pr, type, beta,
                mu1, mu2, sigma, "HuHuCAR", omega, p)
dataH[1:(cov_num+2),1:5]

[Package carat version 2.2.1 Index]