simSynthData {PSinference} | R Documentation
Plug-in Sampling Single Synthetic Dataset Generation
Description
This function is used to generate a single synthetic version of the original data via Plug-in Sampling.
Usage
simSynthData(X, n_imp = dim(X)[1])
Arguments
X | matrix or data frame containing the original data
n_imp | sample size of the synthetic dataset; defaults to the number of rows of X
Details
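Assume that $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_n)$ is the original data, assumed to be normally distributed.
We compute $\bar{\mathbf{x}}$ as the sample mean and $\hat{\boldsymbol{\Sigma}} = \mathbf{S}/(n-1)$ as the sample covariance matrix, where $\mathbf{S}$ is the sample Wishart matrix.
We generate $\mathbf{V} = (\mathbf{v}_1, \dots, \mathbf{v}_n)$ by drawing
$$\mathbf{v}_i \stackrel{i.i.d.}{\sim} N_p(\bar{\mathbf{x}}, \hat{\boldsymbol{\Sigma}}), \quad i = 1, \dots, n.$$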
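The plug-in sampling step described above (estimate the mean and covariance from the original data, then draw from the fitted multivariate normal) can be sketched in a few lines of R. This is only an illustration under stated assumptions: `plug_in_sketch` is a hypothetical helper written for this note, not the source code of `simSynthData`, and it relies on `MASS::mvrnorm` for the normal draws.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper illustrating plug-in sampling.
# Assumption: this mirrors the description above, not the package internals.
plug_in_sketch <- function(X, n_imp = nrow(X)) {
  X <- as.matrix(X)
  x_bar <- colMeans(X)      # sample mean
  Sigma_hat <- cov(X)       # sample covariance, i.e. S / (n - 1)
  # Draw n_imp synthetic observations from N_p(x_bar, Sigma_hat)
  mvrnorm(n_imp, mu = x_bar, Sigma = Sigma_hat)
}

set.seed(123)
X <- mvrnorm(500, mu = rep(0, 3), Sigma = diag(3))
V <- plug_in_sketch(X)
dim(V)
```

Because the estimates are plugged in as if they were the true parameters, the synthetic rows are independent draws given the original data; the returned matrix has one row per synthetic observation.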
Value
A matrix containing the generated synthetic dataset.
References
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
Examples
library(MASS)
library(PSinference)
n_sample <- 1000
mu <- c(0, 0, 0, 0)
Sigma <- diag(1, 4, 4)
# Create original simulated dataset
df_o <- mvrnorm(n_sample, mu, Sigma)
# Create singly imputed synthetic dataset
df_s <- simSynthData(df_o)
# Estimators computed from the synthetic data
mean_s <- colMeans(df_s)
# Note on this computation: t(df_s) is p x n, so mean_s is recycled down
# each column and every observation (column) is centered by the mean vector.
# If you think of the data as an n x p matrix with row-vector observations,
# keep this transposition in mind.
S_s <- (t(df_s) - mean_s) %*% t(t(df_s) - mean_s)
print(mean_s)
# Sample covariance matrix: S_s / (n - 1)
print(S_s / (dim(df_s)[1] - 1))
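As a sanity check on the manual cross-product computation used in the example, the following sketch compares it against base R's `cov()`. To stay self-contained it uses a stand-in random matrix in place of the output of `simSynthData`; the variable `agrees` is introduced here only for illustration.

```r
# Stand-in for a synthetic dataset: any numeric n x p matrix works here.
set.seed(1)
df_s <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)

mean_s <- colMeans(df_s)
# t(df_s) is p x n; subtracting the length-p vector recycles down each column,
# so every observation (column) is centered by the mean vector.
centered <- t(df_s) - mean_s
S_s <- centered %*% t(centered)

# The scaled cross-product matrix should match the built-in sample covariance.
agrees <- isTRUE(all.equal(S_s / (nrow(df_s) - 1), cov(df_s),
                           check.attributes = FALSE))
print(agrees)
```

Centering `df_s` directly (without the transpose) would recycle `mean_s` along the rows instead and give a wrong result, which is why the example transposes first.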
[Package PSinference version 0.1.0 Index]