Data_Gen {CHEMIST}  R Documentation 
Generation of Artificial Data
Description
This function demonstrates data generation under some specific, commonly used settings: potential outcomes from an exponential family distribution, error-prone treatments, and error-prone covariates. Users can specify the magnitude of the measurement error and the relationships among the outcome, treatment, and covariates.
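The covariate part can be illustrated with a minimal sketch that assumes the classical additive measurement error model. Under this model, W = X + e is recorded instead of the true X, and the sketch uses independent normal noise with a common standard deviation `sigma` on every covariate. The helper `add_classical_error` and the value of `sigma` are hypothetical. They are not Data_Gen's internals, and this page does not state how the package maps sigma_e onto the noise covariance.

```r
# Hypothetical helper, not part of CHEMIST: classical additive measurement
# error W = X + e with e ~ N(0, sigma^2 * I). `sigma` is an assumed value.
add_classical_error <- function(X, sigma) {
  n <- nrow(X)
  p <- ncol(X)
  cov_e <- diag(sigma^2, p)                              # p x p diagonal noise covariance
  E <- matrix(rnorm(n * p, mean = 0, sd = sigma), n, p)  # independent N(0, sigma^2) noise
  list(W = X + E, cov_e = cov_e)
}

set.seed(1)
X <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)  # true covariates
out <- add_classical_error(X, sigma = 0.75)
dim(out$W)                 # 200 3
var(as.vector(out$W - X))  # close to 0.75^2 = 0.5625
```

A larger `sigma` makes W a noisier version of X. This is one simple way to vary the magnitude of measurement error.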
Usage
Data_Gen(
X,
alpha,
beta,
theta,
a,
sigma_e,
e_distr = "normal",
num_pi,
delta,
linearY,
typeY
)
Arguments
X 
An n x p matrix of true covariates, where n is the sample size and p is the number of covariates. Users can customize its data structure and distribution. 
alpha 
A vector of the parameters that reflects the relationship between
treatment model and covariates. The dimension of 
beta 
A vector of the parameters that reflects the relationship between
outcome and covariates. The dimension of 
theta 
A scalar parameter linking the outcome and the treatment. 
a 
A weight of 
sigma_e 

e_distr 
Distribution of the noise term in the classical measurement
error model. The input "normal" refers to the normal distribution with mean
zero and covariance matrix with diagonal entries 
num_pi 
The setting of the misclassification probabilities: option 1 or 2.

delta 
The parameter that determines the number of treatments with measurement error. 
linearY 
A logical value that determines the relationship between the outcome and the covariates: TRUE gives a linear relationship and FALSE a nonlinear one. 
typeY 
The type of outcome variable, drawn from an exponential family distribution: "binary", "pois" (Poisson), or "cont" (continuous). 
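To make the typeY options concrete, the sketch below draws an outcome of each type from a linear predictor of the form theta * A + X %*% beta. This is one plausible reading of how theta and beta enter when linearY = TRUE. The sketch assumes canonical links: logit for "binary", log for "pois", and identity with unit variance for "cont". The helper `draw_outcome` and these link choices are assumptions and are not taken from CHEMIST.

```r
# Hypothetical helper, not part of CHEMIST: draw an exponential-family outcome
# from a linear predictor `eta`, assuming canonical links.
draw_outcome <- function(eta, typeY = c("binary", "pois", "cont"), sd = 1) {
  typeY <- match.arg(typeY)
  n <- length(eta)
  switch(typeY,
    binary = rbinom(n, size = 1, prob = plogis(eta)),  # Bernoulli, logit link
    pois   = rpois(n, lambda = exp(eta)),              # Poisson, log link
    cont   = rnorm(n, mean = eta, sd = sd)             # normal, identity link
  )
}

set.seed(2)
n <- 100
X <- matrix(rnorm(n * 2), nrow = n, ncol = 2)  # two illustrative covariates
A <- rbinom(n, size = 1, prob = 0.5)           # binary treatment
theta <- 1
beta <- c(0.5, -0.5)
eta <- drop(theta * A + X %*% beta)            # linear predictor as a plain vector

y_bin  <- draw_outcome(eta, "binary")
y_pois <- draw_outcome(eta, "pois")
y_cont <- draw_outcome(eta, "cont")
```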
Value
Data 
An n x (p+2) matrix of the original data without measurement error, where n is the sample size. The first p columns are the covariates, ordered as Xc (covariates associated with both treatment and outcome), Xp (covariates associated with the outcome only), Xi (covariates associated with the treatment only), and Xs (covariates independent of both outcome and treatment). The second-to-last column is the treatment, and the last column is the outcome. 
Error_Data 
An n x (p+2) matrix of the data with measurement error in the covariates and treatment, where n is the sample size. The columns follow the same order as in Data: the first p columns are the covariates (Xc, Xp, Xi, Xs), the second-to-last column is the treatment, and the last column is the outcome. 
Pi 
An n x 2 matrix whose columns contain the two misclassification probabilities pi_10 = P(Observed Treatment = 1 | Actual Treatment = 0) and pi_01 = P(Observed Treatment = 0 | Actual Treatment = 1). 
cov_e 
A covariance matrix of the measurement error model. 
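The role of the two Pi columns can be illustrated with a short sketch that converts a true binary treatment into an observed one. A subject with actual treatment 0 is recorded as 1 with probability pi_10, and a subject with actual treatment 1 is recorded as 0 with probability pi_01. The helper `misclassify` is hypothetical. It follows these definitions, but it is not claimed to be how Data_Gen produces the treatment column of Error_Data.

```r
# Hypothetical helper, not part of CHEMIST: misclassify a binary treatment using
# an n x 2 matrix Pi with columns pi_10 = P(obs = 1 | true = 0) and
# pi_01 = P(obs = 0 | true = 1).
misclassify <- function(A, Pi) {
  flip_prob <- ifelse(A == 0, Pi[, 1], Pi[, 2])  # chance of recording the wrong value
  flip <- runif(length(A)) < flip_prob
  ifelse(flip, 1 - A, A)
}

set.seed(3)
n <- 1000
A <- rbinom(n, size = 1, prob = 0.5)  # actual treatment
Pi <- cbind(pi_10 = rep(0.1, n), pi_01 = rep(0.2, n))
A_obs <- misclassify(A, Pi)
mean(A_obs[A == 0] == 1)  # roughly 0.1
mean(A_obs[A == 1] == 0)  # roughly 0.2
```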
Examples
##### Example 1: A multivariate normal continuous X with linear normal Y #####
## Generate a multivariate normal X matrix (mvrnorm() comes from the MASS package)
library(MASS)
mean_x = 0; sig_x = 1; rho = 0
Sigma_x = matrix( rho*sig_x^2,nrow=120 ,ncol=120 )
diag(Sigma_x) = sig_x^2
Mean_x = rep( mean_x, 120 )
X = as.matrix( mvrnorm(n = 60,mu = Mean_x,Sigma = Sigma_x,empirical = FALSE) )
## Data generation setting
## alpha: coefficients 0.2, 0.2 on Xc and 0.3, 0.3 on Xi,
## i.e. there are two Xc and two Xi covariates
## beta: coefficients 2, 2 on Xc and 2, 2 on Xp,
## i.e. there are two Xc and two Xp covariates
## the remaining settings are as follows
Data_fun <- Data_Gen(X, alpha = c(0.2, 0.2, 0, 0, 0.3, 0.3), beta = c(2, 2, 2, 2, 0, 0),
                     theta = 2, a = 2, sigma_e = 0.75, e_distr = 10, num_pi = 1, delta = 0.8,
                     linearY = TRUE, typeY = "cont")
##### Example 2: A uniform X with nonlinear binary Y #####
## Generate a uniform X matrix
n = 50; p = 120
X = matrix(NA,n,p)
for (i in 1:p) { X[, i] = sample(runif(n, -1, 1), n, replace = TRUE) }
X = scale(X)
## Data generation setting
## alpha: coefficient 0.1 on Xc and 0.3 on Xi,
## i.e. there is one Xc and one Xi covariate
## beta: coefficient 2 on Xc and 3 on Xp,
## i.e. there is one Xc and one Xp covariate
## the remaining settings are as follows
Data_fun <- Data_Gen(X, alpha = c(0.1, 0, 0.3), beta = c(2, 3, 0),
                     theta = 1, a = 2, sigma_e = 0.5, e_distr = "normal", num_pi = 2, delta = 0.5,
                     linearY = FALSE, typeY = "binary")
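The comments in both examples read alpha and beta in the same Xc, Xp, Xi column order used under Value. Xp has a zero entry in alpha because it affects the outcome only, and Xi has a zero entry in beta because it affects the treatment only. The hypothetical helper `make_coefs` below builds such vectors from group sizes and reproduces the settings of both examples. It is a convenience sketch and not part of CHEMIST. This page does not state how Data_Gen treats the remaining columns of X, which are presumably Xs.

```r
# Hypothetical helper, not part of CHEMIST: build alpha and beta in Xc, Xp, Xi order.
make_coefs <- function(n_c, n_p, n_i, a_c, a_i, b_c, b_p) {
  alpha <- c(rep(a_c, n_c), rep(0, n_p), rep(a_i, n_i))  # Xp does not enter the treatment model
  beta  <- c(rep(b_c, n_c), rep(b_p, n_p), rep(0, n_i))  # Xi does not enter the outcome model
  list(alpha = alpha, beta = beta)
}

ex1 <- make_coefs(n_c = 2, n_p = 2, n_i = 2, a_c = 0.2, a_i = 0.3, b_c = 2, b_p = 2)
ex1$alpha  # 0.2 0.2 0.0 0.0 0.3 0.3
ex1$beta   # 2 2 2 2 0 0

ex2 <- make_coefs(n_c = 1, n_p = 1, n_i = 1, a_c = 0.1, a_i = 0.3, b_c = 2, b_p = 3)
ex2$alpha  # 0.1 0.0 0.3
ex2$beta   # 2 3 0
```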