GVdist {PSinference} | R Documentation |
Generalized Variance Empirical Distribution
Description
This function computes the empirical distribution of the pivotal random variable used to perform inference on the Generalized Variance from a released Single Synthetic dataset generated under Plug-in Sampling, assuming that the original data are multivariate normal.
Usage
GVdist(nsample, pvariates, iterations = 10000)
Arguments
nsample | Sample size. |
pvariates | Number of variables. |
iterations | Number of iterations for simulating values from the distribution and finding the quantiles. Default is 10000. |
Details
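We define
$$T^\star = (n-1)^p \frac{|\boldsymbol{S}^\star|}{|\boldsymbol{\Sigma}|},$$
where $\boldsymbol{S}^\star = \sum_{i=1}^{n} (\boldsymbol{v}_i - \bar{\boldsymbol{v}})(\boldsymbol{v}_i - \bar{\boldsymbol{v}})^\top$, $n$ is the sample size, $p$ is the number of variables, $\boldsymbol{\Sigma}$ is the population covariance matrix and $\boldsymbol{v}_i$ is the $i$th observation of the synthetic dataset.
Its distribution is stochastically equivalent to
$$\prod_{i=1}^{p} \chi^2_{n-i} \prod_{j=1}^{p} \tilde{\chi}^2_{n-j},$$
where $\chi^2_{n-i}$ and $\tilde{\chi}^2_{n-j}$ are all independent chi-square random variables with the degrees of freedom given in the subscript.
The $(1-\alpha)$ level confidence interval for $|\boldsymbol{\Sigma}|$ is given by
$$\left( \frac{(n-1)^p\,|\tilde{\boldsymbol{S}}^\star|}{t^\star_{1-\alpha/2}},\ \frac{(n-1)^p\,|\tilde{\boldsymbol{S}}^\star|}{t^\star_{\alpha/2}} \right),$$
where $\tilde{\boldsymbol{S}}^\star$ is the observed value of $\boldsymbol{S}^\star$ and $t^\star_{\gamma}$ is the $\gamma$th percentile of $T^\star$.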
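The chi-square representation above can be simulated directly in base R. The following is a minimal sketch, not the package's implementation: the function name `sim_gv_pivot` is hypothetical, and the degrees of freedom (two independent sets of chi-squares with n - 1, ..., n - p degrees of freedom) are an assumption taken from the standard Wishart determinant result applied to Plug-in Sampling.

```r
# Hypothetical helper (not part of PSinference): simulate the pivot as a
# product of 2p independent chi-square variables.
# Assumption: degrees of freedom are n - 1, ..., n - p, used twice
# (once for the original data, once for the synthetic data).
sim_gv_pivot <- function(nsample, pvariates, iterations = 10000) {
  stopifnot(nsample > pvariates)
  dof <- nsample - seq_len(pvariates)
  replicate(
    iterations,
    prod(rchisq(pvariates, df = dof), rchisq(pvariates, df = dof))
  )
}

set.seed(1)
pivot <- sim_gv_pivot(nsample = 100, pvariates = 4, iterations = 10000)
# Empirical percentiles used as the interval's denominators
quantile(pivot, c(0.025, 0.975))
```

Because the simulated distribution depends only on the sample size and the number of variables, the percentiles can be computed once and reused for any synthetic dataset of the same dimensions.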
Value
A vector of length iterations containing the simulated values of the empirical distribution.
References
Klein, M., Moura, R. and Sinha, B. (2021). Multivariate Normal Inference based on Singly Imputed Synthetic Data under Plug-in Sampling. Sankhya B 83, 273–287.
Examples
# Original data creation
library(MASS)         # mvrnorm()
library(PSinference)  # simSynthData(), GVdist()
mu <- c(1,2,3,4)
Sigma <- matrix(c(1, 0.5, 0.5, 0.5,
0.5, 1, 0.5, 0.5,
0.5, 0.5, 1, 0.5,
0.5, 0.5, 0.5, 1), nrow = 4, ncol = 4, byrow = TRUE)
seed <- 1
set.seed(seed)  # make the simulated data reproducible
n_sample <- 100
# Create original simulated dataset
df = mvrnorm(n_sample, mu = mu, Sigma = Sigma)
# Synthetic data created
df_s = simSynthData(df)
# Gather the 0.025 and 0.975 quantiles and construct the confidence interval
# for the generalized variance det(Sigma)
p <- dim(df_s)[2]
gv_dist <- GVdist(n_sample, p, 10000)
q975 <- quantile(gv_dist, 0.975)
q025 <- quantile(gv_dist, 0.025)
# cov(df_s) * (n_sample - 1) is the sum-of-squares matrix of the synthetic data
left <- (n_sample-1)^p * det(cov(df_s)*(n_sample-1))/q975
right <- (n_sample-1)^p * det(cov(df_s)*(n_sample-1))/q025
cat(left,right,'\n')
print(det(Sigma))
# The population generalized variance det(Sigma) is expected to lie inside
# the confidence interval (left, right)
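A single run only shows one interval. As a rough sanity check, the coverage of the interval can be estimated by repeating the whole procedure many times. The sketch below uses base R only and is an illustration under stated assumptions: `rmvn` is a hypothetical helper, Plug-in Sampling is assumed to mean drawing the synthetic data from a normal distribution with the original sample mean and sample covariance, and the pivot is simulated as a product of independent chi-squares rather than by calling `GVdist`.

```r
set.seed(123)
n <- 50; p <- 3; alpha <- 0.05; reps <- 500
mu <- seq_len(p)
Sigma <- 0.5 * diag(p) + 0.5  # unit variances, 0.5 correlations

# Hypothetical helper: draw n rows from N(mu, Sigma) via the Cholesky factor
rmvn <- function(n, mu, Sigma) {
  z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

# Simulated pivot (assumed chi-square product form); computed once because
# it depends only on n and p
dof <- n - seq_len(p)
pivot <- replicate(10000, prod(rchisq(p, df = dof), rchisq(p, df = dof)))
q <- unname(quantile(pivot, c(alpha / 2, 1 - alpha / 2)))

covered <- replicate(reps, {
  x <- rmvn(n, mu, Sigma)               # original data
  v <- rmvn(n, colMeans(x), cov(x))     # synthetic data (assumed plug-in step)
  stat <- (n - 1)^p * det((n - 1) * cov(v))
  lower <- stat / q[2]
  upper <- stat / q[1]
  lower <= det(Sigma) && det(Sigma) <= upper
})

coverage <- mean(covered)
cat("Estimated coverage:", coverage, "\n")
```

The estimated coverage should be close to the nominal level 1 - alpha, up to Monte Carlo error from both the number of replications and the simulated percentiles.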