getData {carat}        R Documentation

Data Generation


Generates continuous or binary outcomes given patients' covariates, the underlying model and the randomization procedure.


  getData(n, cov_num, level_num, pr, type, beta, 
          mu1, mu2, sigma = 1, method = "HuHuCAR", ...)



n: the number of patients.


cov_num: the number of covariates.


level_num: a vector giving the number of levels of each covariate. The length of level_num must therefore equal the number of covariates.


pr: a vector of probabilities. Under the assumption of independence between covariates, pr contains the probability of each level of each covariate. The length of pr must equal the total number of levels across all covariates, and the probabilities for each covariate margin must sum to 1.


type: the data-generating method. Accepted values: "linear" or "logit".


beta: a vector of coefficients of covariates. The length of beta must equal cov_num.


mu1, mu2: main effects of treatment 1 and treatment 2.


sigma: the error variance for the linear model. The default is 1. This should be a positive value and is only used when type = "linear".


method: the randomization procedure to be used for generating randomization sequences. This package provides data-generating functions for "HuHuCAR", "PocSimMIN", "StrBCD", "StrPBR", "AdjBCD", and "DoptBCD".


...: arguments to be passed to method. These depend on the randomization method used; the following arguments are accepted:


omega: a vector of weights at the overall, within-stratum, and within-covariate-margin levels. At least one element must be larger than 0. Note that omega is only needed when "HuHuCAR" is to be used.


weight: a vector of weights for within-covariate-margin imbalances. At least one element must be larger than 0. Note that weight is only needed when "PocSimMIN" is to be used.


p: the biased coin probability. p should be larger than 1/2 and less than 1. Note that p is only needed when "HuHuCAR", "PocSimMIN" or "StrBCD" is to be used.


a: a design parameter governing the degree of randomness. Note that a is only needed when "AdjBCD" is to be used.


bsize: the block size for stratified randomization. It must be a multiple of 2. Note that bsize is only needed when "StrPBR" is to be used.
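
The constraints listed for these arguments can be checked before calling getData. The following is a minimal base R sketch; check_getData_args is a hypothetical helper that is not part of carat, and it assumes that pr lists the level probabilities covariate by covariate, in the same order as level_num.

```r
# Hypothetical helper for illustration; it is not part of carat.
check_getData_args <- function(cov_num, level_num, pr, beta,
                               omega = NULL, weight = NULL,
                               p = NULL, bsize = NULL, tol = 1e-8) {
  # Lengths stated in the argument descriptions
  stopifnot(length(level_num) == cov_num,
            length(beta) == cov_num,
            length(pr) == sum(level_num))
  # Assumed layout: pr holds the levels of covariate 1, then covariate 2, ...
  margin <- rep(seq_along(level_num), times = level_num)
  margin_sums <- tapply(pr, margin, sum)
  # The probabilities within each covariate margin must sum to 1
  stopifnot(all(abs(margin_sums - 1) < tol))
  # Method-specific arguments are only checked when supplied
  if (!is.null(omega))  stopifnot(any(omega > 0))
  if (!is.null(weight)) stopifnot(any(weight > 0))
  if (!is.null(p))      stopifnot(p > 0.5, p < 1)
  if (!is.null(bsize))  stopifnot(bsize %% 2 == 0)
  invisible(TRUE)
}

# Five binary covariates with equal level probabilities
check_getData_args(cov_num = 5, level_num = rep(2, 5), pr = rep(0.5, 10),
                   beta = c(1, 4, 3, 2, 5),
                   omega = c(0.1, 0.1, rep(0.8 / 5, times = 5)), p = 0.85)
```

The helper stops with an error on the first violated constraint and returns TRUE invisibly otherwise.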


To generate continuous outcomes, we use the linear model:

y_i = \mu_j + x_i^T\beta + \epsilon_i;

to generate binary outcomes, we use the logit link function:

P(y_i = 1) = \frac{\exp\{\mu_j + x_i^T\beta\}}{1 + \exp\{\mu_j + x_i^T\beta\}},


where j indicates that patient i is assigned to treatment j.
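
The two models can be illustrated with a short base R sketch. It does not reproduce the internals of getData: covariate levels are assumed to be coded 1 and 2, simple coin flips stand in for a covariate-adaptive randomization procedure, and sigma is treated as a variance as described in the arguments, so the standard deviation passed to rnorm is sqrt(sigma).

```r
set.seed(1)
n <- 200
cov_num <- 2

# One row per patient; levels coded 1 and 2 (assumed coding)
x <- matrix(sample(1:2, n * cov_num, replace = TRUE), nrow = n)

# Treatment labels 1 or 2; coin flips stand in for a real procedure
trt <- sample(1:2, n, replace = TRUE)

beta <- c(1, 4)
mu <- c(0, 0.5)   # mu[1] plays the role of mu1, mu[2] of mu2
sigma <- 1        # error variance

# Linear predictor mu_j + x_i^T beta for every patient
eta <- mu[trt] + as.vector(x %*% beta)

# Continuous outcomes: linear model with normal errors
y_linear <- eta + rnorm(n, mean = 0, sd = sqrt(sigma))

# Binary outcomes: plogis(eta) equals exp(eta) / (1 + exp(eta))
y_logit <- rbinom(n, size = 1, prob = plogis(eta))
```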


getData returns a data frame of size (cov_num + 2) × n. The first cov_num rows contain the patients' covariate profiles, the next row contains the patients' treatment assignments, and the final row contains the generated outcomes.
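
Because patients are stored in columns rather than rows, it can help to separate the pieces or transpose the result before modelling. The sketch below uses a mock data frame with the layout described above rather than real getData output; the row names and the numeric coding of the assignment row are assumptions made for illustration.

```r
set.seed(1)
cov_num <- 3
n <- 4

# Mock object with (cov_num + 2) rows and n columns; not real getData output
mock <- as.data.frame(rbind(
  matrix(sample(1:2, cov_num * n, replace = TRUE), nrow = cov_num),
  c(1, 2, 2, 1),   # assignment row (assumed numeric coding)
  rnorm(n)         # outcome row
))
rownames(mock) <- c(paste0("covariate", seq_len(cov_num)),
                    "assignment", "outcome")

# Split the rows by their position
profiles   <- mock[seq_len(cov_num), ]
assignment <- unlist(mock[cov_num + 1, ])
outcome    <- unlist(mock[cov_num + 2, ])

# One row per patient, one column per variable
by_patient <- as.data.frame(t(mock))
```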


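# Load the package that provides getData
library(carat)

# Parameter settings
n <- 1000
cov_num <- 5
level_num <- c(2, 2, 2, 2, 2)   # five binary covariates
beta <- c(1, 4, 3, 2, 5)
mu1 <- 0
mu2 <- 0
sigma <- 1
type <- "linear"
p <- 0.85
# Weights: overall, within-stratum, then one per covariate margin
omega <- c(0.1, 0.1, rep(0.8 / 5, times = 5))
pr <- rep(0.5, 10)              # two equally likely levels per covariate

# Data generation
dataH <- getData(n, cov_num, level_num, pr, type, beta,
                 mu1, mu2, sigma, "HuHuCAR", omega, p)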

[Package carat version 2.0.2]