Create.Synthetic {SynDI} | R Documentation |
Create the synthetic data
Description
Creates a synthetic data set from internal data and external models.
Usage
Create.Synthetic(
datan,
nrep,
Y,
XB,
Ytype = "binary",
parametric,
betaHatExt_list,
sigmaHatExt_list = NULL
)
Arguments
datan |
internal data only |
nrep |
number of replications used when creating the synthetic data |
Y |
outcome name, e.g. Y='Y' |
XB |
all covariate names for both X and B in the target model, e.g. XB=c('X1','X2','X3','X4','B1','B2') |
Ytype |
the type of outcome Y, either 'binary' or 'continuous'. |
parametric |
choice of "Yes" or "No" for each external model, specifying whether that external model is parametric, e.g. parametric=c('Yes','No') |
betaHatExt_list |
a list of parameter estimates from the external models. Within each model, the estimates must follow the same order as listed in XB, and variable names are required. See the Examples section for details. |
sigmaHatExt_list |
a list of sigma^2 estimates for a continuous outcome, obtained from the fitted linear regressions. If not available, or if the outcome type is binary, set sigmaHatExt_list=NULL |
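As a rough illustration of the ordering and naming requirement for betaHatExt_list, the sketch below fits a logistic regression to a simulated external data set and reorders the named coefficients to follow XB. The simulated data, the object names ext, fit and beta_ext1, the list name Ext1, and the "(Intercept)" label are assumptions made for illustration only; the element format that Create.Synthetic actually expects should be checked against create_synthetic_example$betaHatExt_list.

```r
set.seed(1)

# Hypothetical external study that observed only X1 and X2
ext <- data.frame(X1 = rnorm(200), X2 = rnorm(200))
ext$Y <- rbinom(200, 1, plogis(-0.5 + 0.8 * ext$X1 - 0.4 * ext$X2))

# Fit the external model; terms are deliberately written out of order
fit <- glm(Y ~ X2 + X1, data = ext, family = binomial())

# Covariate order of the target model, as passed to the XB argument
XB <- c("X1", "X2", "X3", "X4", "B1", "B2")

# Keep the names and reorder the covariate estimates to follow XB
beta <- coef(fit)
used <- XB[XB %in% names(beta)]
beta_ext1 <- beta[c("(Intercept)", used)]

# One element per external model (a single hypothetical model here)
betaHatExt_list <- list(Ext1 = beta_ext1)
print(names(beta_ext1))
```

Indexing the named coefficient vector by XB keeps each estimate attached to its variable name, so the order no longer depends on how the model formula was written.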
Value
A data.frame: the combined dataset of the internal data (of size n) and the synthetic
data for the given external model (of size n * nrep). This combined dataset
contains a total of n*(1+nrep) rows, one intercept column (Int), one outcome
column (Y), one indicator column (S), and all the predictors in the internal
data. S is the indicator variable, where the internal data is indicated as S=0,
and the synthetic data is indicated as S=1. The internal data part is a complete
dataset without any missingness. The synthetic data part may contain missingness
for certain predictors that were not used in the external model.
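The layout described above can be mimicked in base R to make the row count and the role of S concrete. This is only a mock of the output structure, not the SynDI implementation: the sample size, the columns X1 and B1, and the coefficients used to draw the synthetic outcome are all hypothetical.

```r
set.seed(42)
n <- 5      # internal sample size (hypothetical)
nrep <- 3   # number of replications (hypothetical)

# Internal data: complete, flagged with S = 0
internal <- data.frame(Int = 1,
                       Y = rbinom(n, 1, 0.5),
                       X1 = rnorm(n),
                       B1 = rnorm(n))
internal$S <- 0

# Synthetic part: nrep stacked copies of the internal rows, flagged with S = 1
synthetic <- internal[rep(seq_len(n), times = nrep), ]
synthetic$S <- 1

# B1 is assumed not to be used by the external model, so it is set to missing
synthetic$B1 <- NA_real_

# Outcome drawn from a made-up external logistic model in X1 only
synthetic$Y <- rbinom(nrow(synthetic), 1, plogis(0.2 + 0.5 * synthetic$X1))

combined <- rbind(internal, synthetic)
rownames(combined) <- NULL

# n * (1 + nrep) rows in total: n with S = 0 and n * nrep with S = 1
print(nrow(combined) == n * (1 + nrep))
print(table(combined$S))
```

Splitting on S, for example combined[combined$S == 0, ], recovers the complete internal rows, while the S = 1 rows carry the missing values for predictors the external model did not use.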
References
Gu, T., Taylor, J.M.G. and Mukherjee, B. (2021). Regression inference for multiple populations by integrating summary-level data using stacked imputations. https://arxiv.org/abs/2106.06835
Examples
# Load the example inputs provided as a package dataset
data(create_synthetic_example)
nrep = create_synthetic_example$nrep
datan = create_synthetic_example$datan
betaHatExt_list = create_synthetic_example$betaHatExt_list

# Combine the internal data with the synthetic data
data.combined = Create.Synthetic(nrep = nrep, datan = datan, Y = 'Y',
  XB = c('X1', 'X2', 'X3', 'X4', 'B1', 'B2'), Ytype = 'binary',
  parametric = c('Yes', 'No'), betaHatExt_list = betaHatExt_list,
  sigmaHatExt_list = NULL)