Create.Synthetic {SynDI}    R Documentation

Create the synthetic data

Description

Creates a synthetic data set from internal data and external models.

Usage

Create.Synthetic(
  datan,
  nrep,
  Y,
  XB,
  Ytype = "binary",
  parametric,
  betaHatExt_list,
  sigmaHatExt_list = NULL
)

Arguments

datan

A data.frame containing the internal data only.

nrep

Number of replications used when creating the synthetic data.

Y

Name of the outcome variable, e.g. Y = 'Y'.

XB

Names of all covariates, both X and B, in the target model, e.g. XB = c('X1', 'X2', 'X3', 'X4', 'B1', 'B2').

Ytype

Type of the outcome Y, either 'binary' or 'continuous'. Defaults to 'binary'.

parametric

A character vector with one entry, "Yes" or "No", per external model, specifying whether that external model is parametric, e.g. parametric = c('Yes', 'No').

betaHatExt_list

A list of parameter estimates of the external models, one element per external model. Within each element, the estimates must follow the same order as the variables listed in XB, and each estimate must be named by its variable. See the example for details.
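As a rough illustration of the "named, ordered estimates" requirement, the sketch below fits two logistic regressions with glm() on simulated stand-in data and collects coef() from each into a list. The data sets, the element names Ext1 and Ext2, and the covariates are hypothetical. coef() names the intercept "(Intercept)"; whether Create.Synthetic expects that name or a different one (for example to match the Int column in its output) is an assumption here, so compare the result against create_synthetic_example$betaHatExt_list and rename if needed.

```r
# Hypothetical external data sets, simulated only for illustration; in
# practice only the fitted coefficients would usually be available.
set.seed(1)
n_ext <- 500
ext1 <- data.frame(X1 = rnorm(n_ext), X2 = rnorm(n_ext))
ext1$Y <- rbinom(n_ext, 1, plogis(-1 + 0.5 * ext1$X1 - 0.3 * ext1$X2))

ext2 <- data.frame(X1 = rnorm(n_ext), X2 = rnorm(n_ext), X3 = rnorm(n_ext))
ext2$Y <- rbinom(n_ext, 1, plogis(-1 + 0.5 * ext2$X1 + 0.2 * ext2$X3))

# Logistic regression for a binary outcome. coef() returns a named numeric
# vector, with names "(Intercept)", "X1", "X2", ... in formula order.
fit1 <- glm(Y ~ X1 + X2, family = binomial, data = ext1)
fit2 <- glm(Y ~ X1 + X2 + X3, family = binomial, data = ext2)

# One named vector per external model; the covariates are listed in the
# same order as they appear in XB.
betaHatExt_list <- list(Ext1 = coef(fit1), Ext2 = coef(fit2))
str(betaHatExt_list)
```

Writing the formulas with covariates in XB order keeps the names and the ordering aligned without any manual reindexing.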

sigmaHatExt_list

A list of the residual variances (sigma^2) of the external models, for a continuous outcome fitted by linear regression. If these are not available, or if the outcome is binary, set sigmaHatExt_list = NULL (the default).
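For a continuous outcome, the sketch below shows one way to obtain sigma^2 from a linear regression fitted with lm(). The simulated data and the element name Ext1 are hypothetical. Note that summary.lm() reports sigma as the residual standard error, so it has to be squared to give the variance this argument asks for.

```r
# Hypothetical external data with a continuous outcome (true error sd = 2).
set.seed(2)
n_ext <- 500
ext <- data.frame(X1 = rnorm(n_ext), X2 = rnorm(n_ext))
ext$Y <- 1 + 0.5 * ext$X1 - 0.3 * ext$X2 + rnorm(n_ext, sd = 2)

fit <- lm(Y ~ X1 + X2, data = ext)

# summary(fit)$sigma is the residual standard error; square it for sigma^2.
sigma2_hat <- summary(fit)$sigma^2

# Parallel lists: coefficients and residual variance for the same model.
betaHatExt_list  <- list(Ext1 = coef(fit))
sigmaHatExt_list <- list(Ext1 = sigma2_hat)
```

The squared residual standard error equals the residual sum of squares divided by the residual degrees of freedom, which is the usual unbiased estimate of sigma^2.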

Value

A data.frame combining the internal data (n rows) with the synthetic data generated from the given external model (n * nrep rows), for a total of n * (1 + nrep) rows. It contains one intercept column (Int), one outcome column (Y), one indicator column (S), and all predictors in the internal data. The indicator S equals 0 for rows from the internal data and 1 for synthetic rows. The internal part is complete, with no missing values; the synthetic part may contain missing values for predictors that were not used in the external model.
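The toy sketch below reproduces only the layout of the returned data.frame, not how SynDI generates synthetic rows. It assumes, for illustration, that the synthetic block replicates the internal predictors nrep times, leaves Y as NA where SynDI would supply imputed values, and sets a predictor B2 (hypothetically unused by the external model) to NA. The column names Int, Y and S follow the description above; everything else is made up.

```r
# Toy internal data: n = 4 complete rows with an intercept column Int.
n <- 4
nrep <- 2
internal <- data.frame(Int = 1, Y = c(0, 1, 1, 0),
                       X1 = c(0.2, -1.1, 0.5, 1.3),
                       B2 = c(1, 0, 0, 1),
                       S = 0)

# Synthetic block of n * nrep rows, built by replicating the internal rows.
# Y is a placeholder NA here; SynDI would fill it from the external model.
synthetic <- internal[rep(seq_len(n), times = nrep), ]
synthetic$Y <- NA
synthetic$B2 <- NA   # hypothetical predictor not used by the external model
synthetic$S <- 1

# Stack internal (S = 0) on top of synthetic (S = 1).
combined <- rbind(internal, synthetic)
rownames(combined) <- NULL

nrow(combined)       # n * (1 + nrep) = 12
table(combined$S)    # 4 rows with S = 0, 8 rows with S = 1
```

Filtering on S (for example combined[combined$S == 0, ]) recovers the complete internal part.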

References

Gu, T., Taylor, J.M.G. and Mukherjee, B. (2021). Regression inference for multiple populations by integrating summary-level data using stacked imputations. https://arxiv.org/abs/2106.06835.

Examples

data(create_synthetic_example)

nrep = create_synthetic_example$nrep
datan = create_synthetic_example$datan
betaHatExt_list = create_synthetic_example$betaHatExt_list

data.combined = Create.Synthetic(nrep = nrep, datan = datan, Y = 'Y', 
    XB = c('X1', 'X2', 'X3', 'X4', 'B1', 'B2'), Ytype = 'binary', 
    parametric = c('Yes', 'No'), betaHatExt_list = betaHatExt_list, 
    sigmaHatExt_list = NULL)


[Package SynDI version 0.1.0 Index]