simSynthData {PSinference}    R Documentation

Plug-in Sampling Single Synthetic Dataset Generation

Description

Generates a single synthetic version of the original data via Plug-in Sampling.

Usage

simSynthData(X, n_imp = dim(X)[1])

Arguments

X

matrix or data frame containing the original data, with one observation per row

n_imp

sample size of the synthetic dataset; defaults to the number of rows of X

Details

Let \mathbf{X}=\left(\mathbf{x}_1, \dots, \mathbf{x}_n\right) be the original data, assumed to be normally distributed. We compute \bar{\mathbf{x}} as the sample mean and \hat{\boldsymbol{\Sigma}}=\mathbf{S}/(n-1) as the sample covariance matrix, where \mathbf{S}=\sum_{i=1}^{n}(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^\top is the sample Wishart matrix. We then generate \mathbf{V}=\left(\mathbf{v}_1, \dots, \mathbf{v}_n\right) by drawing

\mathbf{v}_i\stackrel{i.i.d.}{\sim}N_p(\bar{\mathbf{x}},\hat{\boldsymbol{\Sigma}}).
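
The procedure above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's actual source: the helper name plug_in_sample is hypothetical, and the draw uses MASS::mvrnorm with the plug-in estimates as parameters.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper (not part of PSinference): a sketch of Plug-in Sampling.
plug_in_sample <- function(X, n_imp = nrow(X)) {
  X <- as.matrix(X)
  x_bar <- colMeans(X)     # sample mean
  Sigma_hat <- cov(X)      # sample covariance, S / (n - 1)
  # Draw n_imp i.i.d. rows from N_p(x_bar, Sigma_hat)
  mvrnorm(n_imp, mu = x_bar, Sigma = Sigma_hat)
}

set.seed(1)
X <- mvrnorm(200, mu = c(0, 0, 0), Sigma = diag(3))  # simulated original data
V <- plug_in_sample(X)                               # synthetic data
```

The original data enter the draw only through \bar{\mathbf{x}} and \hat{\boldsymbol{\Sigma}}, which is why the sketch needs nothing beyond colMeans and cov before sampling.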

Value

a matrix containing the generated synthetic dataset

References

Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.

Examples

library(MASS)
library(PSinference)
n_sample = 1000
mu=c(0,0,0,0)
Sigma=diag(1,4,4)
# Create original simulated dataset
df_o = mvrnorm(n_sample, mu, Sigma)
# Create singly imputed synthetic dataset
df_s = simSynthData(df_o)
# Estimators from the synthetic data
mean_s <- colMeans(df_s)
S_s <- (t(df_s) - mean_s) %*% t(t(df_s) - mean_s)
# Careful with this computation: mean_s is treated as a column vector, so it
# is subtracted from the p x n matrix t(df_s), not from the n x p matrix df_s.
# Keep this in mind if you think in terms of n x p matrices and row vectors.
print(mean_s)
print(S_s/(dim(df_s)[1]-1))

[Package PSinference version 0.1.0 Index]