getData {carat}    R Documentation

Data Generation

Description

Generates continuous or binary outcomes given patients' covariates, the underlying model and the randomization procedure.

Usage

  getData(n, cov_num, level_num, pr, type, beta, 
          mu1, mu2, sigma = 1, method = "HuHuCAR", ...)

Arguments

n

the number of patients.

cov_num

the number of covariates.

level_num

a vector giving the number of levels of each covariate. Hence the length of level_num should be equal to the number of covariates.

pr

a vector of probabilities. Under the assumption of independence between covariates, pr is a vector containing probabilities for each level of each covariate. The length of pr should correspond to the number of all levels, and the sum of the probabilities for each margin should be 1.

type

the data-generating model. Accepted values: "linear" or "logit".

beta

a vector of coefficients of covariates. The length of beta must correspond to the sum of all covariates' levels.

mu1, mu2

main effects of treatment 1 and treatment 2.

sigma

the error variance for the linear model. The default is 1. This should be a positive value and is only used when type = "linear".

method

the randomization procedure to be used for generating randomization sequences. This package provides data-generating functions for "HuHuCAR", "PocSimMIN", "StrBCD", "StrPBR", "AdjBCD", and "DoptBCD".

...

arguments to be passed to method. These arguments depend on the randomization method used and the following arguments are accepted:

omega

a vector of weights at the overall, within-stratum, and within-covariate-margin levels. It is required that at least one element is larger than 0. Note that omega is only needed when HuHuCAR is to be used.

weight

a vector of weights for within-covariate-margin imbalances. It is required that at least one element is larger than 0. Note that weight is only needed when PocSimMIN is to be used.

p

the biased coin probability. p should be larger than 1/2 and less than 1. Note that p is only needed when "HuHuCAR", "PocSimMIN" or "StrBCD" is to be used.

a

a design parameter governing the degree of randomness. Note that a is only needed when "AdjBCD" is to be used.

bsize

the block size for stratified randomization. It is required to be a multiple of 2. Note that bsize is only needed when "StrPBR" is to be used.
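
The length and sum constraints above can be checked before calling getData. The following is a minimal sketch in base R. The names check_getData_args and method_extras are hypothetical, introduced here for illustration, and are not part of carat. The table of method-specific arguments only restates the notes above, and the sketch assumes that pr and beta list the levels of covariate 1 first, then those of covariate 2, and so on.

```r
# Hypothetical helper (not part of carat): checks the documented constraints
# on level_num, pr, and beta, and lists the extra arguments each method needs.
method_extras <- list(
  HuHuCAR   = c("omega", "p"),
  PocSimMIN = c("weight", "p"),
  StrBCD    = "p",
  StrPBR    = "bsize",
  AdjBCD    = "a",
  DoptBCD   = character(0)
)

check_getData_args <- function(cov_num, level_num, pr, beta, method, ...) {
  extras <- list(...)
  stopifnot(length(level_num) == cov_num)
  stopifnot(length(pr) == sum(level_num))
  stopifnot(length(beta) == sum(level_num))
  # Split pr by covariate (assumed ordering) and check each margin sums to 1.
  margin <- rep(seq_len(cov_num), times = level_num)
  margin_sums <- tapply(pr, margin, sum)
  stopifnot(all(abs(margin_sums - 1) < 1e-8))
  # Report method-specific arguments that were not supplied by name.
  stopifnot(method %in% names(method_extras))
  missing_args <- setdiff(method_extras[[method]], names(extras))
  if (length(missing_args) > 0) {
    stop("missing argument(s) for ", method, ": ",
         paste(missing_args, collapse = ", "))
  }
  invisible(TRUE)
}

# Two covariates with 2 and 3 levels.
ok <- check_getData_args(
  cov_num = 2, level_num = c(2, 3),
  pr = c(0.5, 0.5, 0.2, 0.3, 0.5),
  beta = c(1, 2, 0.5, 1, 1.5),
  method = "HuHuCAR", omega = c(0.1, 0.1, 0.4, 0.4), p = 0.85
)
```

The helper stops with an error on the first violated constraint, so a successful call indicates only that the inputs are mutually consistent, not that getData will accept them.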

Details

To generate continuous outcomes, we use the linear model:

y_i = \mu_j+x_i^T\beta+\epsilon_i,

to generate binary outcomes, we use the logit link function:

P(y_i=1) = \frac{\exp\{\mu_j+x_i^T\beta\}}{1+\exp\{\mu_j+x_i^T\beta\}},

where j denotes the treatment to which patient i is assigned.
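
The two models can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: treatments are assigned by a fair coin rather than by a covariate-adaptive procedure, x_i is assumed to hold one indicator per covariate level (consistent with beta having one entry per level), the errors are assumed to be normal, and sigma is read as a variance. All variable names are chosen here for illustration.

```r
# Illustrative only: simple randomization stands in for the covariate-adaptive
# procedures that getData uses.
set.seed(1)
n         <- 200
level_num <- c(2, 3)
pr        <- c(0.5, 0.5, 0.2, 0.3, 0.5)
beta      <- c(1, 2, 0.5, 1, 1.5)
mu        <- c(0, 0.5)   # mu[1] plays the role of mu1, mu[2] of mu2
sigma     <- 1

# Draw each covariate independently from its marginal distribution.
ends   <- cumsum(level_num)
starts <- ends - level_num + 1
covs <- sapply(seq_along(level_num), function(k) {
  sample(level_num[k], n, replace = TRUE, prob = pr[starts[k]:ends[k]])
})

# Assumed encoding: x_i holds one indicator per level, so x_i^T beta adds up
# the coefficient of the observed level of each covariate.
offsets <- starts - 1
xb <- rowSums(sapply(seq_along(level_num),
                     function(k) beta[offsets[k] + covs[, k]]))

# Treatment j in {1, 2}, assigned here by a fair coin (an assumption).
trt <- sample(1:2, n, replace = TRUE)
eta <- mu[trt] + xb

# Linear model: y_i = mu_j + x_i^T beta + eps_i, with normal errors assumed.
y_linear <- eta + rnorm(n, mean = 0, sd = sqrt(sigma))

# Logit model: P(y_i = 1) = exp(eta_i) / (1 + exp(eta_i)) = plogis(eta_i).
y_binary <- rbinom(n, size = 1, prob = plogis(eta))
```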

Value

getData returns a data frame with cov_num + 2 rows and n columns, one column per patient. The first cov_num rows contain the patients' covariate profiles, the next row contains the patients' treatment assignments, and the final row contains the generated outcomes.
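
The layout described above can be unpacked as in the following sketch. It uses a small stand-in data frame with made-up values rather than an actual getData result, so the coding of the assignment row and the row and column names are assumptions. The names res, long, and the column labels are chosen here for illustration.

```r
# Stand-in for a getData result with cov_num = 2 and n = 4 (values made up
# for illustration): one column per patient, rows as described above.
cov_num <- 2
res <- data.frame(
  rbind(c(1, 2, 2, 1),           # covariate 1
        c(3, 1, 2, 2),           # covariate 2
        c(1, 2, 2, 1),           # assignment
        c(0.4, 1.7, 2.1, -0.3))  # outcome
)

covariates  <- res[seq_len(cov_num), , drop = FALSE]
assignments <- unlist(res[cov_num + 1, ])
outcomes    <- unlist(res[cov_num + 2, ])

# One row per patient is often more convenient for modelling.
long <- as.data.frame(t(res))
names(long) <- c(paste0("cov", seq_len(cov_num)), "assignment", "outcome")
```

Transposing gives the usual one-row-per-patient shape expected by most modelling functions.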

Examples

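# Load the package
library(carat)

# Parameter settings
set.seed(100)
n <- 1000                                # number of patients
cov_num <- 5                             # number of covariates
level_num <- c(2, 2, 2, 2, 2)            # two levels per covariate
beta <- c(1, 4, 3, 2, 5, 5, 4, 3, 2, 1)  # one coefficient per level
mu1 <- 0
mu2 <- 0
sigma <- 1
type <- "linear"
p <- 0.85                                # biased coin probability
omega <- c(0.1, 0.1, rep(0.8 / 5, times = 5))
pr <- rep(0.5, 10)                       # each margin sums to 1

# Data generation
dataH <- getData(n, cov_num, level_num, pr, type, beta,
                 mu1, mu2, sigma, "HuHuCAR", omega, p)
# Show all rows for the first five patients (columns)
dataH[1:(cov_num + 2), 1:5]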

[Package carat version 2.2.1 Index]